Systematic Reviews
Copyright ©The Author(s) 2025.
World J Gastrointest Surg. Aug 27, 2025; 17(8): 109463
Published online Aug 27, 2025. doi: 10.4240/wjgs.v17.i8.109463
Table 5 Artificial intelligence applications in endoscopic diagnosis and quality control for gastrointestinal cancers
Ref. | Cancer type/lesion | Endoscopic modality | AI method/model | Task/objective | Dataset size/source | Performance metrics | Clinical relevance/impact
Messmann et al[35], 2022 | BERN | Upper GI endoscopy | AI-assisted deep learning systems | Real-time detection and localization of Barrett's neoplasia | Meta-analyses and real-time studies (n > 1000 images) | Sensitivity: 83.7%-95.4%; accuracy: 88%-96% | Improves detection of subtle lesions; supports targeted biopsies over the Seattle protocol
Choi et al[36], 2022 | Not applicable (quality control) | EGD | CNN (squeeze-and-excitation network) | Classification of anatomical landmarks and completeness of photodocumentation | 2599 images from 250 EGD procedures (Korea University Hospital) | Landmark classification: accuracy 97.58%, sensitivity 97.42%, specificity 99.66%; completeness detection: accuracy 89.20%, specificity 100% | Enhances quality control in EGD by automatically verifying complete anatomical documentation
Inaba et al[37], 2024 | Not applicable (bowel preparation) | Colonoscopy (preparation phase) | MobileNetV3-based CNN (smartphone app) | AI-based stool image classification to assess bowel preparation quality | 1689 images from 121 patients; prospective validation in 106 patients | Accuracy: 90.2% (grade 1), 65.0% (grade 2), 89.3% (grade 3); BBPS ≥ 6 in 99.0% of app users | Improved bowel preparation monitoring; 100% cecal intubation; reduced burden on patients and nurses
Zhang et al[14], 2023 | Suspected choledocholithiasis | Not applicable (pre-endoscopy prediction) | ModelArts AI platform (Huawei); 7 machine learning models also tested | Predictive classification of CBD stones before cholecystectomy | 1199 patients with symptomatic gallstones; retrospective, single center | ModelArts AI: accuracy 0.97, recall 0.97, precision 0.971, F1 score 0.97 | May outperform guideline-based risk stratification; reduces unnecessary ERCP
Wu et al[38], 2021 | EGC | EGD | ENDOANGEL system (CNNs + deep reinforcement learning) | Real-time monitoring of blind spots and detection of EGC | 1050 patients in multicenter RCT; 196 gastric lesions biopsied | Accuracy: 84.7%; sensitivity: 100%; specificity: 84.3% | Reduced blind spots; improved EGD quality; potential for real-time EGC detection in clinical settings
Rondonotti et al[39], 2023 | DRSPs ≤ 5 mm | Colonoscopy with blue-light imaging | CAD EYE (Fujifilm, Tokyo, Japan), CNN-based real-time system | Optical diagnosis to support the "resect and discard" strategy | 596 DRSPs in 389 patients; 4-center prospective study (Italy) | NPV: 91.0%; sensitivity: 88.6%; specificity: 88.1%; accuracy: 88.4% | Meets ASGE PIVI thresholds; may enable safe omission of histology in DRSPs, especially beneficial for nonexperts
Koh et al[40], 2023 | Colonic adenomas including SSA | Colonoscopy | GI Genius (CADe system, Medtronic, MN, United States) | Real-time detection of colonic polyps and ADR improvement | 298 colonoscopies; 487 AI "hits"; 250 polyps removed | Post-AI ADR: 30.4% vs baseline 24.3% (P = 0.02); SSA rate: 5.6% | Enhanced ADR even in experienced endoscopists; improved SSA detection; supports AI use in routine colonoscopy
Yuan et al[41], 2022 | Gastric lesions (EGC, AGC, SMT, polyp, PU, erosion) | White light endoscopy | YOLO-based DCNN model | Multiclass diagnosis of six gastric lesions plus lesion-free mucosa | 31388 images (29809 training/1579 test) from 9443 patients | Overall accuracy: 85.7%; EGC: sensitivity 59.2%, specificity 99.3%; AGC: sensitivity 100%, specificity 98.1% | Comparable to senior endoscopists; improved diagnostic accuracy and efficiency; potential for real-time support in diverse gastric lesion detection
Munir et al[42], 2024 | Not applicable (survey-based assessment) | Not applicable | ChatGPT | Evaluation of AI responses to perioperative GI surgery questions | 1080 responses assessed by 45 surgeons | Majority graded "fair" or "good" (57.6%); highest "very good/excellent" rate for cholecystectomy (45.3%) | ChatGPT may aid patient education, but only 20% deemed it accurate; limited utility in reducing message load
Sudarevic et al[43], 2023 | Colorectal polyps | Colonoscopy | Poseidon system (EndoMind + waterjet-based AI) | AI-based in situ measurement of polyp size using the waterjet as a reference | 28 polyps in a silicone model + 29 polyps in routine colonoscopies | Median error: Poseidon 7.4% (model), 7.7% (clinical); visual estimation: 25.1%/22.1%; forceps: 20.0% | Significantly improved sizing accuracy without additional tools; useful for clinical polyp surveillance and resection decisions
Tsuboi et al[44], 2020 | Small bowel angioectasia | Capsule endoscopy (PillCam SB2/SB3, Medtronic, MN, United States) | CNN (single-shot multibox detector) | Automatic detection of angioectasia in CE images | 2237 training images; 10488 validation images (488 angioectasia, 10000 normal) | AUC: 0.998; sensitivity: 98.8%; specificity: 98.4%; PPV: 75.4%; NPV: 99.9% | Enables high-accuracy detection of angioectasia; may reduce oversight and physician workload during capsule reading
Chang et al[45], 2022 | Not applicable (quality control) | Upper GI endoscopy (EGD) | ResNeSt deep learning model | Evaluation of photodocumentation completeness via anatomical classification | 15305 training images; 15723 test images from 472 EGD cases | Accuracy: 96.64% (deep learning model); photodocumentation rate: 78% (esophagus to duodenum), 53.8% (pharynx to duodenum) | Enables automated auditing of image completeness; higher completeness linked to higher ADR; applicable for routine EGD quality control
Hwang et al[46], 2021 | Small bowel hemorrhagic and ulcerative lesions | CE | VGGNet-based CNN + Grad-CAM | Classification and localization of hemorrhagic vs ulcerative lesions | 30224 abnormal + 30224 normal images (training); 5760 images (validation) | Combined model: accuracy 96.83%, sensitivity 97.61%, specificity 96.04%, AUROC approximately 0.996 | Enhanced lesion localization without manual annotation; Grad-CAM improves interpretability; supports efficient clinical CE analysis
Jazi et al[47], 2023 | Not applicable (survey + clinical scenarios) | Not applicable | ChatGPT-4 (LLM by OpenAI) | Assessment of ChatGPT-4's alignment with expert opinion on bariatric surgery suitability and recommendations | 10 patient scenarios; 30 international bariatric surgeons | Expert match: 30%; ChatGPT-4 inconsistency: 40%; recommended surgery in 60% vs 90% by experts | ChatGPT-4 showed limited alignment and inconsistency; suitable for education but not yet reliable for clinical decision making
Meinikheim et al[48], 2024 | BERN | Upper GI endoscopy (video based) | DeepLabV3+ with ResNet50 backbone (clinical decision support system) | Evaluation of the add-on effect of AI on endoscopist performance in BERN detection | 96 videos from 72 patients; 51273 training images; 22 endoscopists from 12 centers | AI alone: sensitivity 92.2%, specificity 68.9%, accuracy 81.3%; nonexperts with AI: sensitivity up from 69.8% to 78.0%, specificity up from 67.3% to 72.7% | AI significantly improved nonexperts' diagnostic performance and confidence; accuracy comparable to experts; highlights human-AI interaction dynamics
Ahmad et al[49], 2021 | Not applicable (research prioritization) | Colonoscopy | Not applicable (Delphi consensus) | Identification of top research priorities for AI implementation in colonoscopy | 15 international experts from 9 countries; 3 Delphi rounds | Not performance focused; methodology scores used for consensus | Provides a structured framework to guide future AI implementation research in colonoscopy; emphasizes clinical trial design, data annotation, integration, and regulation
Lazaridis et al[50], 2021 | Not applicable (practice survey) | CE | Not applicable (ESGE survey) | Assessment of adherence to ESGE guidelines and future perspectives on CE use | 217 respondents from 47 countries via ESGE survey | Not model based; survey: 91% performed CE with appropriate indication; 84.1% classified findings as relevant/irrelevant | Highlights variation in guideline adherence; AI identified as top development priority (56.2%); suggests need for standardization and formal CE training
Tian et al[51], 2024 | Not applicable (anatomical identification) | EUS | CNN with attention module | Automatic identification of 14 standard BPS anatomical sites on EUS | 6230 training images (1812 patients); internal validation: 1569 images (47 patients); external validation: 85322 images (131 patients from 16 centers) | Sensitivity: 89.45%-99.92%; specificity: 93.35%-99.79%; accuracy (internal): 92.1%-100%; kappa: 0.84-0.98 | Outperforms beginners, comparable to experts; enables efficient, high-quality anatomical identification in EUS; potential for training and standardization
He et al[52], 2020 | Not applicable (quality control) | Upper GI endoscopy (EGD) | CNN models (DenseNet-121, ResNet-50, VGG, etc.) | Automated classification of 11 anatomical sites for quality control and reporting | 3704 images from 211 routine EGD cases (Tianjin Medical University Hospital, Tianjin, China) | DenseNet-121: accuracy approximately 91.11%; F1 scores up to 94.92% for specific sites | Supports automated quality assurance in EGD via accurate site classification; aids report generation and completeness verification
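Most studies in the table report overlapping confusion-matrix statistics (accuracy, sensitivity/recall, specificity, precision, F1 score), e.g., the accuracy/recall/precision/F1 quadruple reported for the ModelArts platform by Zhang et al[14]. As a quick reference for how these quantities relate, here is a minimal sketch in Python (function and variable names are our own, not taken from any cited system):

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive the metrics reported in the table from raw confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # a.k.a. recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)                    # a.k.a. positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}
```

Because F1 combines precision and recall, a model such as Zhang et al's[14] with recall and precision both near 0.97 necessarily has an F1 score near 0.97 as well.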
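Predictive values depend on lesion prevalence as well as on sensitivity and specificity, which is why Tsuboi et al[44] report an NPV of 99.9% but a PPV of only 75.4% despite near-perfect sensitivity and specificity: angioectasia frames were rare (488 of 10488 validation images). A short illustrative calculation (our own sketch, using the rounded figures from the table):

```python
def predictive_values(sensitivity, specificity, n_pos, n_neg):
    """PPV and NPV from sensitivity, specificity, and the class sizes."""
    tp = sensitivity * n_pos          # true positives among diseased images
    fn = n_pos - tp
    tn = specificity * n_neg          # true negatives among normal images
    fp = n_neg - tn
    return tp / (tp + fp), tn / (tn + fn)

# Rounded inputs from the Tsuboi et al[44] row: 488 angioectasia vs 10000 normal images.
ppv, npv = predictive_values(0.988, 0.984, 488, 10000)
```

This yields PPV ≈ 0.75 and NPV ≈ 0.999, consistent with the reported 75.4%/99.9% (small discrepancies arise from rounding of the published sensitivity and specificity).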
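Tian et al[51] quantify agreement with expert labels using Cohen's kappa (0.84-0.98), which corrects raw agreement for the agreement expected by chance. A self-contained sketch of the statistic (the example labels are hypothetical, not the study's data):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - chance) / (1 - chance)
```

A kappa of 0.84-0.98, as in the Tian et al[51] row, is conventionally read as near-perfect agreement, well above what prevalence alone would produce.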
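The polyp-sizing errors in the Sudarevic et al[43] row are median relative errors against a reference size, a robust summary that is not skewed by a few badly mis-sized polyps. A minimal sketch of the metric (the example sizes are invented for illustration):

```python
import statistics

def median_relative_error(estimates, references):
    """Median of |estimate - reference| / reference, as a percentage."""
    errors = [abs(e - r) / r * 100 for e, r in zip(estimates, references)]
    return statistics.median(errors)

# Hypothetical polyp sizes in mm (not the study's data):
err = median_relative_error([4.6, 5.2, 3.1], [5.0, 5.0, 3.0])
```

Under this metric, the table's comparison (Poseidon approximately 7%-8% vs 20%-25% for visual or forceps-based estimation) directly reflects typical per-polyp sizing error.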