Copyright © The Author(s) 2025.
Artif Intell Med Imaging. Jun 8, 2025; 6(1): 107069
Published online Jun 8, 2025. doi: 10.35711/aimi.v6.i1.107069
Table 1 Comparison of artificial intelligence models for ultrasound report generation
Method | Architectural features | Clinical relevance
CNN-LSTM | Combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network; suited to processing sequential data | Handles image and sequence information well; applicable to ultrasound image analysis
Transformer-based models | Built on self-attention mechanisms; capture long-range dependencies and support parallel processing | Excel at generating natural-language reports; suited to complex ultrasound report generation
Visual language models (VLMs) | Integrate visual and linguistic information to understand image content and generate related text | Perform strongly in multimodal learning; improve the accuracy and clinical relevance of ultrasound reports
Table 2 Key concepts in ultrasound report generation
Concept | Description | Significance
AI-assisted ultrasound report generation | Technology using AI to convert ultrasound imaging into structured diagnostic reports | Enhances efficiency, accuracy, and consistency of diagnosis |
VLMs | AI models that integrate visual (images) and linguistic (text) information | Enable understanding of image content and generation of descriptive text |
Image encoder | A component of VLMs that encodes image information | Transforms images into a format that the model can process |
Text encoder | A component of VLMs that encodes text information | Transforms text into a format that the model can process |
Attention mechanism | A technique that allows the model to focus on specific parts of the input (image or text) | Improves the model's ability to focus on important image regions and text |
Large language models (LLMs) | Transformer-based models pre-trained on large text corpora | Enhance the quality and fluency of generated text
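
The attention mechanism listed in Table 2 can be illustrated with a small sketch. It uses scaled dot-product attention, the form common in Transformer models; whether the models surveyed here use exactly this variant is an assumption. The region features, the query vector, and the function names are hypothetical toy values chosen for illustration, not real ultrasound data.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights and the weighted sum of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Hypothetical toy features for three image regions (assumed, not real data).
region_keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
region_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
text_query = [2.0, 0.0]  # stands in for the current state of the text decoder

weights, context = attend(text_query, region_keys, region_values)
```

The weights sum to 1, and the region whose key aligns best with the query receives the largest share, which is the sense in which the model "focuses" on part of the input.
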
Table 3 Challenges and proposed solutions in visual language model-based ultrasound report generation
Challenge | Proposed solution
Poor accuracy in text generation related to measurement results | Extract numerical values from ultrasound images using tools like TrOCR[15] and insert them into the report |
Suboptimal handling of correspondence between text and images | Annotate the correspondence between text and images and design mechanisms to learn these relationships |
Ineffective utilization of report templates | Use report templates as input, treat template prediction as an intermediate task, or have the model learn to modify templates |
Insufficient training data volume | Split existing reports into text-image pairs and reassemble them to create pseudo-cases for training
Ineffective utilization of historical reports | Use historical reports along with current ultrasound images as input |
Neglect of the image selection task | Explicitly model the image selection process to choose representative images for the report
Lack of utilization of ultrasound-related expertise | Fine-tune LLMs to learn this prior knowledge
Lack of exploration of predictive tasks | Conduct in-depth research on ultrasound examination scenarios to define effective predictive tasks |
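
The first proposed solution in Table 3 (extract measurement values and insert them into the report) can be sketched as follows. This is a minimal illustration, assuming the OCR step has already turned the on-image annotations into text lines; it does not call TrOCR itself. The label format, the regular expression, the template placeholders, and the function names are all hypothetical.

```python
import re

# Assumed annotation format: a label, a number, and a unit,
# for example "LA 3.5 cm" or "IVS: 11 mm".
MEASUREMENT = re.compile(
    r"(?P<label>[A-Za-z]+)\s*:?\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mm|cm)"
)


def parse_measurements(ocr_lines):
    """Collect {label: 'value unit'} from text already recognized by OCR."""
    found = {}
    for line in ocr_lines:
        for match in MEASUREMENT.finditer(line):
            found[match.group("label")] = (
                f"{match.group('value')} {match.group('unit')}"
            )
    return found


def fill_template(template, measurements, missing="[not measured]"):
    """Replace {LABEL} placeholders; labels without a value get a visible marker."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: measurements.get(m.group(1), missing),
        template,
    )


# Hypothetical OCR output and report template, for illustration only.
ocr_lines = ["LA 3.5 cm", "IVS: 11 mm"]
template = "Left atrium diameter {LA}. Septal thickness {IVS}. Aortic root {AO}."

measurements = parse_measurements(ocr_lines)
report = fill_template(template, measurements)
```

Copying the values verbatim from the recognized text, rather than asking a language model to generate them, keeps numbers out of the free-text generation path. Marking missing values explicitly avoids silently inventing a measurement.
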
- Citation: Zeng JH, Zhao KK, Zhao NB. Artificial intelligence assisted ultrasound report generation. Artif Intell Med Imaging 2025; 6(1): 107069
- URL: https://www.wjgnet.com/2644-3260/full/v6/i1/107069.htm
- DOI: https://dx.doi.org/10.35711/aimi.v6.i1.107069