Copyright
©The Author(s) 2025.
World J Gastrointest Surg. Aug 27, 2025; 17(8): 109463
Published online Aug 27, 2025. doi: 10.4240/wjgs.v17.i8.109463
Table 5 Artificial intelligence applications in endoscopic diagnosis and quality control for gastrointestinal cancers
Ref. | Cancer type/lesion | Endoscopic modality | AI method/model | Task/objective | Dataset size/source | Performance metrics | Clinical relevance/impact |
Messmann et al[35], 2022 | BERN | Upper GI endoscopy | AI-assisted deep learning systems | Real-time detection and localization of Barrett's neoplasia | Meta-analysis and real-time studies (n > 1000 images) | Sensitivity: 83.7%-95.4%, accuracy: 88%-96% | Improves detection of subtle lesions; supports targeted biopsies over the Seattle protocol |
Choi et al[36], 2022 | Not applicable (quality control) | EGD | CNN (squeeze-and-excitation network) | Classification of anatomical landmarks and completeness of photodocumentation | 2599 images from 250 EGD procedures (Korea University Hospital) | Landmark classification: Accuracy 97.58%, sensitivity 97.42%, specificity 99.66%; completeness detection: Accuracy 89.20%, specificity 100% | Enhances quality control in EGD by automatically verifying complete anatomical documentation |
Inaba et al[37], 2024 | Not applicable (bowel preparation) | Colonoscopy (preparation phase) | MobileNetV3-based CNN (smartphone app) | AI-based stool image classification to assess bowel preparation quality | 1689 images from 121 patients; 106-patient prospective validation | Accuracy: 90.2% (grade 1), 65.0% (grade 2), 89.3% (grade 3); BBPS ≥ 6 in 99.0% of app users | Improved bowel preparation monitoring; 100% cecal intubation; reduced burden on patients and nurses |
Zhang et al[14], 2023 | Suspected choledocholithiasis | Not applicable (pre-endoscopy prediction) | ModelArts AI platform (Huawei); 7 machine learning models also tested | Predictive classification of CBD stones before cholecystectomy | 1199 patients with symptomatic gallstones; retrospective, single-center | ModelArts AI: Accuracy 0.97, recall 0.97, precision 0.971, F1 score 0.97 | May outperform guideline-based risk stratification; reduces unnecessary ERCP |
Wu et al[38], 2021 | EGC | EGD | ENDOANGEL system (CNNs + deep reinforcement learning) | Real-time monitoring of blind spots and detection of EGC | 1050 patients in multicenter RCT; 196 gastric lesions biopsied | Accuracy: 84.7%, sensitivity: 100%, specificity: 84.3% | Reduced blind spots, improved EGD quality, potential for real-time EGC detection in the clinical setting |
Rondonotti et al[39], 2023 | DRSPs ≤ 5 mm | Colonoscopy with blue-light imaging | CAD EYE (Fujifilm, Tokyo, Japan), CNN-based real-time system | Optical diagnosis to support "resect and discard" strategy | 596 DRSPs in 389 patients; 4-center prospective study (Italy) | NPV: 91.0%, sensitivity: 88.6%, specificity: 88.1%, accuracy: 88.4% | Meets ASGE PIVI thresholds; may enable safe omission of histology in DRSPs, especially beneficial for nonexperts |
Koh et al[40], 2023 | Colonic adenomas including SSA | Colonoscopy | GI Genius™ (CADe system, Medtronic, MN, United States) | Real-time detection of colonic polyps and ADR improvement | 298 colonoscopies; 487 AI "hits"; 250 polyps removed | Post-AI ADR: 30.4% vs baseline 24.3% (P = 0.02); SSA rate: 5.6% | Enhanced ADR even in experienced endoscopists; improved SSA detection; supports AI use in routine colonoscopy |
Yuan et al[41], 2022 | Gastric lesions (EGC, AGC, SMT, polyp, PU, erosion) | White-light endoscopy | YOLO-based DCNN model | Multiclass diagnosis of six gastric lesions + lesion-free mucosa | 31388 images (29809 train/1579 test) from 9443 patients | Overall accuracy: 85.7%; EGC: Sensitivity 59.2%, specificity 99.3%; AGC: Sensitivity 100%, specificity 98.1% | Comparable to senior endoscopists; improved diagnostic accuracy and efficiency; potential for real-time support in diverse gastric lesion detection |
Munir et al[42], 2024 | Not applicable (survey-based assessment) | Not applicable | ChatGPT | Evaluation of AI responses to perioperative GI surgery questions | 1080 responses assessed by 45 surgeons | Majority graded "fair" or "good" (57.6%); highest "very good/excellent" rate for cholecystectomy (45.3%) | ChatGPT may aid in patient education, but only 20% deemed it accurate; limited utility in reducing message load |
Sudarevic et al[43], 2023 | Colorectal polyps | Colonoscopy | Poseidon system (EndoMind + waterjet-based AI) | AI-based in situ measurement of polyp size using waterjet as reference | 28 polyps in silicone model + 29 polyps in routine colonoscopies | Median error: Poseidon 7.4% (model), 7.7% (clinical); visual: 25.1%/22.1%; forceps: 20.0% | Significantly improved sizing accuracy; does not require additional tools; useful for clinical polyp surveillance and resection decisions |
Tsuboi et al[44], 2020 | Small bowel angioectasia | Capsule endoscopy (PillCam SB2/SB3, Medtronic, MN, United States) | CNN (single-shot multibox detector) | Automatic detection of angioectasia in CE images | 2237 training images; 10488 validation images (488 angioectasia, 10000 normal) | AUC: 0.998; sensitivity: 98.8%, specificity: 98.4%, PPV: 75.4%, NPV: 99.9% | Enables high-accuracy detection of angioectasia; may reduce oversight and physician workload during capsule reading |
Chang et al[45], 2022 | Not applicable (quality control) | Upper GI endoscopy (EGD) | ResNeSt deep learning model | Evaluation of photodocumentation completeness via anatomical classification | 15305 training images; 15723 test images from 472 EGD cases | Accuracy: 96.64% (deep learning model); photodocumentation rate: 78% (esophagus to duodenum), 53.8% (pharynx to duodenum) | Enables automated auditing of image completeness; higher completeness linked to higher ADR; applicable for routine EGD quality control |
Hwang et al[46], 2021 | Small bowel hemorrhagic and ulcerative lesions | CE | VGGNet-based CNN + Grad-CAM | Classification and localization of hemorrhagic vs ulcerative lesions | 30224 abnormal + 30224 normal images (train); 5760 images (validation) | Combined model: Accuracy 96.83%, sensitivity 97.61%, specificity 96.04%, AUROC approximately 0.996 | Enhanced lesion localization without manual annotation; Grad-CAM improves interpretability; supports efficient clinical CE analysis |
Jazi et al[47], 2023 | Not applicable (survey + clinical scenarios) | Not applicable | ChatGPT 4 (LLM by OpenAI) | Assessment of ChatGPT 4 alignment with expert opinions on bariatric surgery suitability and recommendations | 10 patient scenarios; 30 international bariatric surgeons | Expert match: 30%; ChatGPT 4 inconsistency: 40%; recommended surgery in 60% vs experts 90% | ChatGPT 4 showed limited alignment and inconsistency; suitable for education, but not yet reliable for clinical decision making |
Meinikheim et al[48], 2024 | BERN | Upper GI endoscopy (video-based) | DeepLabV3+ with ResNet50 backbone (clinical decision support system) | Evaluation of the add-on effect of AI on endoscopist performance in BERN detection | 96 videos from 72 patients; 51273 images (train); 22 endoscopists from 12 centers | AI alone: Sensitivity 92.2%, specificity 68.9%, accuracy 81.3%; nonexperts with AI: Sensitivity up from 69.8% to 78.0%, specificity up from 67.3% to 72.7% | AI significantly improved nonexperts' diagnostic performance and confidence; comparable accuracy to experts; highlights human-AI interaction dynamics |
Ahmad et al[49], 2021 | Not applicable (Delphi consensus) | Colonoscopy | Not applicable (modified Delphi process) | Identification of top research priorities for AI implementation in colonoscopy | 15 international experts from 9 countries; 3 Delphi rounds | Not performance focused; methodology scores used for consensus | Provides a structured framework to guide future AI implementation research in colonoscopy; emphasizes clinical trial design, data annotation, integration, and regulation |
Lazaridis et al[50], 2021 | Not applicable (survey-based assessment) | CE | Not applicable (ESGE survey) | Assessment of adherence to ESGE guidelines and future perspectives on CE use | 217 respondents from 47 countries via ESGE survey | Not model based; survey: 91% performed CE with appropriate indication; 84.1% classified findings as relevant/irrelevant | Highlights variation in guideline adherence; AI identified as top development priority (56.2%); suggests need for standardization and formal CE training |
Tian et al[51], 2024 | Not applicable (anatomical identification) | EUS | CNN with attention module | Automatic identification of 14 standard BPS anatomical sites on EUS | 6230 training images (1812 patients); internal: 1569 images (47 patients); external: 85322 images (131 patients from 16 centers) | Sensitivity: 89.45%-99.92%, specificity: 93.35%-99.79%, accuracy (internal): 92.1%-100%, kappa: 0.84-0.98 | Outperforms beginners, comparable to experts; enables efficient, high-quality anatomical identification in EUS; potential for training and standardization |
He et al[52], 2020 | Not applicable (quality control) | Upper GI endoscopy (EGD) | CNN models (DenseNet-121, ResNet-50, VGG, etc.) | Automated classification of 11 anatomical sites for quality control and reporting | 3704 images from 211 routine EGD cases (Tianjin Medical University Hospital, Tianjin, China) | DenseNet-121: Accuracy approximately 91.11%, F1 scores up to 94.92% for specific sites | Supports automated quality assurance in EGD via accurate site classification; aids report generation and completeness verification |
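The performance columns above report sensitivity, specificity, accuracy, precision (PPV), NPV, and F1 score. As a minimal sketch, not drawn from any of the cited studies, the snippet below shows how these standard diagnostic metrics are derived from binary confusion-matrix counts; the counts used in the example are purely illustrative.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard diagnostic metrics from confusion-matrix counts
    (true/false positives and negatives)."""
    sensitivity = tp / (tp + fn)                # recall / true-positive rate
    specificity = tn / (tn + fp)                # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # overall correct fraction
    ppv = tp / (tp + fp)                        # precision / positive predictive value
    npv = tn / (tn + fn)                        # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# Illustrative counts only (not from any study in Table 5):
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
print({k: round(v, 3) for k, v in m.items()})
```

Note that accuracy alone can mislead when lesion prevalence is low, which is why the capsule-endoscopy and polyp studies above also report PPV and NPV.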
- Citation: Tasci B, Dogan S, Tuncer T. Artificial intelligence in gastrointestinal surgery: A systematic review. World J Gastrointest Surg 2025; 17(8): 109463
- URL: https://www.wjgnet.com/1948-9366/full/v17/i8/109463.htm
- DOI: https://dx.doi.org/10.4240/wjgs.v17.i8.109463