Published online Aug 21, 2025. doi: 10.3748/wjg.v31.i31.109948
Revised: June 28, 2025
Accepted: July 25, 2025
Processing time: 83 Days and 18.9 Hours
Gastrointestinal diseases have complex etiologies and clinical presentations. Accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, so their recommendations are often incomplete and may carry clinical or legal risks.
To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases.
In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially accomplishes multimodal infor
The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing the capabilities of single-modal LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach resulted in increased input and output token counts, leading to higher computational costs and extended diagnostic times.
The DeepGut framework successfully integrates multimodal diagnostic data, and the performance gains from multimodal LLM collaboration point to new possibilities for the clinical application of artificial intelligence-assisted technology.
Core Tip: This study introduces DeepGut, a collaborative multimodal large language model (LLM) framework that assists diagnosis by using multiple LLMs to extract and fuse multimodal clinical data such as medical history, laboratory tests, and imaging results. According to expert validation, DeepGut significantly improves the accuracy and comprehensiveness of gastrointestinal disease diagnosis compared with single-modal tools. However, the framework’s higher LLM token consumption increases operational costs, highlighting a key area for future optimization.