Minireviews
Copyright ©The Author(s) 2025.
Artif Intell Med Imaging. Jun 8, 2025; 6(1): 107069
Published online Jun 8, 2025. doi: 10.35711/aimi.v6.i1.107069
Table 3 Challenges and proposed solutions in visual language model-based ultrasound report generation
Challenge | Proposed solution
Poor accuracy in generating text about measurement results | Extract numerical values from ultrasound images with tools such as TrOCR[15] and insert them into the report
Suboptimal handling of the correspondence between text and images | Annotate the correspondence between text and images and design mechanisms that learn these relationships
Ineffective use of report templates | Use report templates as input, treat template prediction as an intermediate task, or have the model learn to modify templates
Limited volume of training data | Split existing reports into text-image pairs and reassemble them into pseudo-cases for training
Ineffective use of historical reports | Use historical reports together with the current ultrasound images as input
Neglect of the image selection task | Explicitly model the image selection process so that representative images are chosen for the report
Lack of use of ultrasound-specific expertise | Fine-tune LLMs so that they learn this domain expertise as prior knowledge
Lack of exploration of predictive tasks | Study ultrasound examination scenarios in depth to define effective predictive tasks
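The first row of Table 3 proposes reading measurement values directly from the image instead of letting the language model generate them. The Python sketch below illustrates only the insertion step, under stated assumptions: an OCR stage (for example, TrOCR) has already returned the on-screen measurement text as plain strings, and the generated draft marks each measurement with a placeholder such as {left lobe}. The regular expression, the supported units, the placeholder convention, and the function names are all hypothetical choices for illustration, not the method of the reviewed studies.

```python
import re

# Hypothetical format of OCR output read from the measurement overlay of an
# ultrasound frame: "<name> [:|=] <number> <unit>", e.g. "Left lobe 1.5 cm".
# The OCR step itself is assumed to have run already.
MEASUREMENT_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z ]*?)\s*[:=]?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mm|cm|ml)$"
)

# Hypothetical placeholder convention in the model's draft report: {name}.
PLACEHOLDER_PATTERN = re.compile(r"\{([A-Za-z ]+)\}")


def parse_measurements(ocr_lines):
    """Turn OCR lines into {lower-cased name: (value, unit)}."""
    measurements = {}
    for line in ocr_lines:
        match = MEASUREMENT_PATTERN.match(line.strip())
        if match:  # lines that are not name/value/unit triples are skipped
            name = match.group("name").strip().lower()
            measurements[name] = (float(match.group("value")), match.group("unit"))
    return measurements


def fill_report(draft, measurements):
    """Replace {name} placeholders in the draft with OCR-derived values."""
    def substitute(match):
        key = match.group(1).strip().lower()
        if key not in measurements:
            return match.group(0)  # leave unresolved placeholders visible
        value, unit = measurements[key]
        return f"{value:g} {unit}"

    return PLACEHOLDER_PATTERN.sub(substitute, draft)


if __name__ == "__main__":
    ocr = ["Left lobe 1.5 cm", "Isthmus: 3 mm", "Patient ID 12345"]
    values = parse_measurements(ocr)
    print(fill_report("Left lobe {left lobe}; isthmus {Isthmus}.", values))
```

Keeping unresolved placeholders visible, rather than letting the model guess a number, makes missing measurements easy to spot during review.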
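The data-volume row of Table 3 suggests splitting reports into text-image pairs and reassembling them into pseudo-cases. A minimal Python sketch of that idea follows. It assumes that each real case has already been split into aligned (image ID, sentence, section) triples, and that drawing one pair per section in a fixed template order yields a usable pseudo-case. The data layout, the section labels, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def build_pair_pool(cases):
    """Group aligned text-image pairs from all real cases by report section.

    cases: {case_id: [(image_id, sentence, section), ...]} (hypothetical layout)
    """
    pool = defaultdict(list)
    for pairs in cases.values():
        for image_id, sentence, section in pairs:
            pool[section].append((image_id, sentence))
    return pool


def make_pseudo_cases(cases, section_order, n, seed=0):
    """Reassemble n pseudo-cases by drawing one text-image pair per section."""
    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    pool = build_pair_pool(cases)
    pseudo_cases = []
    for _ in range(n):
        images, sentences = [], []
        for section in section_order:
            candidates = pool.get(section, [])
            if not candidates:
                continue  # no real case covered this section
            image_id, sentence = rng.choice(candidates)
            images.append(image_id)
            sentences.append(sentence)
        pseudo_cases.append({"images": images, "report": " ".join(sentences)})
    return pseudo_cases


if __name__ == "__main__":
    real_cases = {
        "c1": [("c1_a", "Liver normal.", "liver"), ("c1_b", "Gallbladder normal.", "gallbladder")],
        "c2": [("c2_a", "Liver steatosis.", "liver"), ("c2_b", "Gallbladder stone.", "gallbladder")],
    }
    for case in make_pseudo_cases(real_cases, ["liver", "gallbladder"], n=3):
        print(case)
```

Sampling per section keeps each pseudo-report in template order while mixing findings from different patients; each sentence still travels with the image it was originally paired with.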