Minireviews
Copyright © The Author(s) 2025.
Artif Intell Med Imaging. Jun 8, 2025; 6(1): 107069
Published online Jun 8, 2025. doi: 10.35711/aimi.v6.i1.107069
Table 1 Comparison of artificial intelligence models for ultrasound report generation
Method | Architectural features | Clinical relevance
CNN-LSTM | Combines CNN and LSTM; suitable for processing sequential data | Performs well in handling image and sequence information; applicable to ultrasound image analysis
Transformer-based models | Based on self-attention mechanisms; capable of capturing long-range dependencies; suitable for parallel processing | Excel at generating natural language reports; suitable for complex ultrasound report generation
VLMs | Integrate visual and linguistic information; capable of understanding image content and generating related text | Outstanding performance in multimodal learning; enhance the accuracy and clinical relevance of ultrasound reports
Table 2 Key concepts in ultrasound report generation
Concept | Description | Significance
AI-assisted ultrasound report generation | Technology using AI to convert ultrasound imaging into structured diagnostic reports | Enhances efficiency, accuracy, and consistency of diagnosis
VLMs | AI models that integrate visual (images) and linguistic (text) information | Enable understanding of image content and generation of descriptive text
Image encoder | A component of VLMs that encodes image information | Transforms images into a format that the model can process
Text encoder | A component of VLMs that encodes text information | Transforms text into a format that the model can process
Attention mechanism | A technique that allows the model to focus on specific parts of the input (image or text) | Improves the model's ability to focus on important image regions and text
LLMs | Transformer-based models pre-trained on large text corpora | Enhance the quality and fluency of generated text
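The attention mechanism entry in Table 2 can be illustrated with a minimal sketch. It uses scaled dot-product attention, one common formulation; the table does not specify which variant a given model uses, and the function names, vector sizes, and toy values here are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query  : list of floats (e.g. a text-side representation)
    keys   : list of vectors, one per input element (e.g. image regions)
    values : list of vectors, same length as keys
    Returns the attention weights and the weighted sum of the values.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    context = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return weights, context

# Toy example (made-up numbers): three image-region features, one text query.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, regions, regions)
```

In this toy case the first and third regions align with the query and receive equal, larger weights than the second region, which is the sense in which the model "focuses" on parts of the input.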
Table 3 Challenges and proposed solutions in visual language model-based ultrasound report generation
Challenge | Proposed solution
Poor accuracy in generated text that reports measurement results | Extract numerical values from ultrasound images using tools like TrOCR[15] and insert them into the report
Suboptimal handling of correspondence between text and images | Annotate the correspondence between text and images and design mechanisms to learn these relationships
Ineffective utilization of report templates | Use report templates as input, treat template prediction as an intermediate task, or have the model learn to modify templates
Insufficient training data volume | Split existing reports into text-image pairs and reassemble them to create pseudo-cases for training
Ineffective utilization of historical reports | Use historical reports along with current ultrasound images as input
Neglect of the image selection task | Explicitly model the image selection process to choose representative images for the report
Lack of utilization of ultrasound-related expertise | Fine-tune LLMs to learn this prior knowledge
Lack of exploration of predictive tasks | Conduct in-depth research on ultrasound examination scenarios to define effective predictive tasks
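The pseudo-case idea listed in Table 3 for limited training data can be sketched as follows. This is only an illustration under stated assumptions: each report sentence is assumed to align with exactly one image, the `Finding` class and both function names are hypothetical, and the example sentences are made-up placeholders rather than real report text.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    image_id: str  # hypothetical identifier of one ultrasound image
    sentence: str  # report sentence describing that image

def split_report(image_ids, sentences):
    """Split one report into image-sentence pairs (assumes 1:1 alignment)."""
    if len(image_ids) != len(sentences):
        raise ValueError("expected exactly one sentence per image")
    return [Finding(i, s) for i, s in zip(image_ids, sentences)]

def make_pseudo_cases(pool, n_cases, findings_per_case, seed=0):
    """Reassemble pairs drawn from the pool into synthetic training cases."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    cases = []
    for _ in range(n_cases):
        picked = rng.sample(pool, findings_per_case)  # no repeats within a case
        cases.append({
            "images": [f.image_id for f in picked],
            "report": " ".join(f.sentence for f in picked),
        })
    return cases

# Toy pool built from two placeholder reports.
pool = split_report(
    ["a1", "a2"],
    ["Liver size is normal.", "Gallbladder wall is smooth."],
) + split_report(
    ["b1", "b2"],
    ["Spleen is not enlarged.", "No free fluid is seen."],
)
cases = make_pseudo_cases(pool, n_cases=3, findings_per_case=2)
```

Uniform random sampling is the simplest possible choice here. A practical pipeline would likely restrict recombination, for example to findings from the same examination type, so that the reassembled cases remain clinically plausible.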