Copyright
©The Author(s) 2025.
World J Gastrointest Surg. Aug 27, 2025; 17(8): 109463
Published online Aug 27, 2025. doi: 10.4240/wjgs.v17.i8.109463
Table 5 Artificial intelligence applications in endoscopic diagnosis and quality control for gastrointestinal cancers
Ref. | Cancer type/lesion | Endoscopic modality | AI method/model | Task/objective | Dataset size/source | Performance metrics | Clinical relevance/impact |
Messmann et al[35], 2022 | BERN | Upper GI endoscopy | AI-assisted deep learning systems | Real-time detection and localization of Barrett's neoplasia | Meta-analysis and real-time studies (n > 1000 images) | Sensitivity: 83.7%-95.4%, accuracy: 88%-96% | Improves detection of subtle lesions; supports targeted biopsies over the Seattle protocol |
Choi et al[36], 2022 | Not applicable (quality control) | EGD | CNN (squeeze-and-excitation network) | Classification of anatomical landmarks and completeness of photodocumentation | 2599 images from 250 EGD procedures (Korea University Hospital) | Landmark classification: Accuracy 97.58%, sensitivity 97.42%, specificity 99.66%; completeness detection: Accuracy 89.20%, specificity 100% | Enhances quality control in EGD by automatically verifying complete anatomical documentation |
Inaba et al[37], 2024 | Not applicable (bowel preparation) | Colonoscopy (preparation phase) | MobileNetV3-based CNN (smartphone app) | AI-based stool image classification to assess bowel preparation quality | 1689 images from 121 patients; 106-patient prospective validation | Accuracy: 90.2% (grade 1), 65.0% (grade 2), 89.3% (grade 3); BBPS ≥ 6 in 99.0% of app users | Improved bowel preparation monitoring; 100% cecal intubation; reduced burden on patients and nurses |
Zhang et al[14], 2023 | Suspected choledocholithiasis | Not applicable (pre-endoscopy prediction) | ModelArts AI platform (Huawei); 7 machine learning models also tested | Predictive classification of CBD stones before cholecystectomy | 1199 patients with symptomatic gallstones; retrospective, single-center | ModelArts AI: Accuracy 0.97, recall 0.97, precision 0.971, F1 score 0.97 | May outperform guideline-based risk stratification; reduces unnecessary ERCP |
Wu et al[38], 2021 | EGC | EGD | ENDOANGEL system (CNNs + deep reinforcement learning) | Real-time monitoring of blind spots and detection of EGC | 1050 patients in multicenter RCT; 196 gastric lesions biopsied | Accuracy: 84.7%, sensitivity: 100%, specificity: 84.3% | Reduced blind spots, improved EGD quality, potential for real-time EGC detection in the clinical setting |
Rondonotti et al[39], 2023 | DRSPs ≤ 5 mm | Colonoscopy with blue-light imaging | CAD EYE (Fujifilm, Tokyo, Japan), CNN-based real-time system | Optical diagnosis to support "resect and discard" strategy | 596 DRSPs in 389 patients; 4-center prospective study (Italy) | NPV: 91.0%, sensitivity: 88.6%, specificity: 88.1%, accuracy: 88.4% | Meets ASGE PIVI thresholds; may enable safe omission of histology in DRSPs, especially beneficial for nonexperts |
Koh et al[40], 2023 | Colonic adenomas including SSA | Colonoscopy | GI Genius™ (CADe system, Medtronic, MN, United States) | Real-time detection of colonic polyps and ADR improvement | 298 colonoscopies; 487 AI "hits"; 250 polyps removed | Post-AI ADR: 30.4% vs baseline 24.3% (P = 0.02); SSA rate: 5.6% | Enhanced ADR even in experienced endoscopists; improved SSA detection; supports AI use in routine colonoscopy |
Yuan et al[41], 2022 | Gastric lesions (EGC, AGC, SMT, polyp, PU, erosion) | White-light endoscopy | YOLO-based DCNN model | Multiclass diagnosis of six gastric lesions + lesion-free mucosa | 31388 images (29809 train/1579 test) from 9443 patients | Overall accuracy: 85.7%; EGC: Sensitivity 59.2%, specificity 99.3%; AGC: Sensitivity 100%, specificity 98.1% | Comparable to senior endoscopists; improved diagnostic accuracy and efficiency; potential for real-time support in diverse gastric lesion detection |
Munir et al[42], 2024 | Not applicable (survey-based assessment) | Not applicable | ChatGPT | Evaluation of AI responses to perioperative GI surgery questions | 1080 responses assessed by 45 surgeons | Majority graded "fair" or "good" (57.6%); highest "very good/excellent" rate for cholecystectomy (45.3%) | ChatGPT may aid in patient education, but only 20% deemed it accurate; limited utility in reducing message load |
Sudarevic et al[43], 2023 | Colorectal polyps | Colonoscopy | Poseidon system (EndoMind + waterjet-based AI) | AI-based in situ measurement of polyp size using waterjet as reference | 28 polyps in silicone model + 29 polyps in routine colonoscopies | Median error: Poseidon 7.4% (model), 7.7% (clinical); visual: 25.1%/22.1%; forceps: 20.0% | Significantly improved sizing accuracy; does not require additional tools; useful for clinical polyp surveillance and resection decisions |
Tsuboi et al[44], 2020 | Small bowel angioectasia | Capsule endoscopy (PillCam SB2/SB3, Medtronic, MN, United States) | CNN (single-shot multibox detector) | Automatic detection of angioectasia in CE images | 2237 training images; 10488 validation images (488 angioectasia, 10000 normal) | AUC: 0.998; sensitivity: 98.8%, specificity: 98.4%, PPV: 75.4%, NPV: 99.9% | Enables high-accuracy detection of angioectasia; may reduce oversight and physician workload during capsule reading |
Chang et al[45], 2022 | Not applicable (quality control) | Upper GI endoscopy (EGD) | ResNeSt deep learning model | Evaluation of photodocumentation completeness via anatomical classification | 15305 training images; 15723 test images from 472 EGD cases | Accuracy: 96.64% (deep learning model); photodocumentation rate: 78% (esophagus to duodenum), 53.8% (pharynx to duodenum) | Enables automated auditing of image completeness; higher completeness linked to higher ADR; applicable for routine EGD quality control |
Hwang et al[46], 2021 | Small bowel hemorrhagic and ulcerative lesions | CE | VGGNet-based CNN + Grad-CAM | Classification and localization of hemorrhagic vs ulcerative lesions | 30224 abnormal + 30224 normal images (train); 5760 images (validation) | Combined model: Accuracy 96.83%, sensitivity 97.61%, specificity 96.04%, AUROC approximately 0.996 | Enhanced lesion localization without manual annotation; Grad-CAM improves interpretability; supports efficient clinical CE analysis |
Jazi et al[47], 2023 | Not applicable (survey + clinical scenarios) | Not applicable | ChatGPT 4 (LLM by OpenAI) | Assessment of ChatGPT 4 alignment with expert opinions on bariatric surgery suitability and recommendations | 10 patient scenarios; 30 international bariatric surgeons | Expert match: 30%; ChatGPT 4 inconsistency: 40%; recommended surgery in 60% vs experts 90% | ChatGPT 4 showed limited alignment and inconsistency; suitable for education, but not yet reliable for clinical decision making |
Meinikheim et al[48], 2024 | BERN | Upper GI endoscopy (video-based) | DeepLabV3+ with ResNet50 backbone (clinical decision support system) | Evaluation of the add-on effect of AI on endoscopist performance in BERN detection | 96 videos from 72 patients; 51273 images (train); 22 endoscopists from 12 centers | AI alone: Sensitivity 92.2%, specificity 68.9%, accuracy 81.3%; nonexperts with AI: Sensitivity up from 69.8% to 78.0%, specificity up from 67.3% to 72.7% | AI significantly improved nonexperts' diagnostic performance and confidence; comparable accuracy to experts; highlights human-AI interaction dynamics |
Ahmad et al[49], 2021 | Not applicable (Delphi consensus) | Colonoscopy | Not applicable (modified Delphi process) | Identification of top research priorities for AI implementation in colonoscopy | 15 international experts from 9 countries; 3 Delphi rounds | Not performance focused; methodology scores used for consensus | Provides a structured framework to guide future AI implementation research in colonoscopy; emphasizes clinical trial design, data annotation, integration, and regulation |
Lazaridis et al[50], 2021 | Not applicable (survey-based assessment) | CE | Not applicable (ESGE survey) | Assessment of adherence to ESGE guidelines and future perspectives on CE use | 217 respondents from 47 countries via ESGE survey | Not model based; survey: 91% performed CE with appropriate indication; 84.1% classified findings as relevant/irrelevant | Highlights variation in guideline adherence; AI identified as top development priority (56.2%); suggests need for standardization and formal CE training |
Tian et al[51], 2024 | Not applicable (anatomical identification) | EUS | CNN with attention module | Automatic identification of 14 standard BPS anatomical sites on EUS | 6230 training images (1812 patients); internal: 1569 images (47 patients); external: 85322 images (131 patients from 16 centers) | Sensitivity: 89.45%-99.92%, specificity: 93.35%-99.79%, accuracy (internal): 92.1%-100%, kappa: 0.84-0.98 | Outperforms beginners, comparable to experts; enables efficient, high-quality anatomical identification in EUS; potential for training and standardization |
He et al[52], 2020 | Not applicable (quality control) | Upper GI endoscopy (EGD) | CNN models (DenseNet-121, ResNet-50, VGG, etc.) | Automated classification of 11 anatomical sites for quality control and reporting | 3704 images from 211 routine EGD cases (Tianjin Medical University Hospital, Tianjin, China) | DenseNet-121: Accuracy approximately 91.11%, F1 scores up to 94.92% for specific sites | Supports automated quality assurance in EGD via accurate site classification; aids report generation and completeness verification |
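The performance columns above report sensitivity, specificity, accuracy, precision (PPV), NPV, and F1 score. As a minimal sketch, not drawn from any of the cited studies, the snippet below shows how these standard diagnostic metrics are derived from binary confusion-matrix counts; the counts used in the example are purely illustrative.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard diagnostic metrics from confusion-matrix counts
    (true/false positives and negatives)."""
    sensitivity = tp / (tp + fn)                # recall / true-positive rate
    specificity = tn / (tn + fp)                # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # overall correct fraction
    ppv = tp / (tp + fp)                        # precision / positive predictive value
    npv = tn / (tn + fn)                        # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# Illustrative counts only (not from any study in Table 5):
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
print({k: round(v, 3) for k, v in m.items()})
```

Note that accuracy alone can mislead when lesion prevalence is low, which is why the capsule-endoscopy and polyp studies above also report PPV and NPV.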
- Citation: Tasci B, Dogan S, Tuncer T. Artificial intelligence in gastrointestinal surgery: A systematic review. World J Gastrointest Surg 2025; 17(8): 109463
- URL: https://www.wjgnet.com/1948-9366/full/v17/i8/109463.htm
- DOI: https://dx.doi.org/10.4240/wjgs.v17.i8.109463