Review
Copyright ©The Author(s) 2025.
World J Gastroenterol. Jun 28, 2025; 31(24): 108021
Published online Jun 28, 2025. doi: 10.3748/wjg.v31.i24.108021
Table 1 Recent advances in machine learning applications for gastrointestinal diseases (2022-2025)[21-51]
AI algorithm | Parameters employed/study design | Sample size/control group/validation | Outcomes | Performance | Ref.
SVM | Multi-center data + TCGA validation | Total n = 255 (training 212 + internal validation 43); external: 4 centers + TCGA | OS/DFS risk stratification (low/moderate/high); high-risk stage II/III chemotherapy benefit | AUC training 0.773 (OS)/0.751 (DFS); validation 0.852 (OS)/0.837 (DFS) | Li et al[21]
SVM + APINet/TransFG | Tongue features (color/morphology/coating) + microbiome (16S rDNA); multicenter prospective study | Cohort 1: GC = 328 vs NGC = 304; cohort 2: GC = 937 vs NGC = 1911 (10 centers); external: GC = 294 vs NGC = 521 (7 centers) | Distinguish GC/early GC/precancerous lesions (e.g., AG); superior to 8 blood biomarkers | Tongue model AUC: 0.89 (initial); 0.88-0.92 (internal); 0.83-0.88 (external); microbiome AUC: 0.94 (genus)/0.95 (species) | Yuan et al[22]
SVM/LR/kNN + feature selection | Liver/PBMC RNA-seq data | Liver = 67; PBMC = 137; external public dataset; controls: Healthy + AH/AC/MASLD/HCV | Precise differentiation of AH/AC/MASLD/HCV; minimal gene sets (33-75 genes) | Liver accuracy: 90% (AH/AC vs healthy), 91% (4-class); external 82%; PBMC accuracy: 75% (4-class) | Listopad et al[23]
SVM/LR/RF | Multiphase CT radiomics (n = 851) | Total n = 215 (training 150 + external 65) | Multiphase CT prediction (plain scan alternative) | Nomogram C-index 0.913 (95%CI: 0.878-0.956) | Liu et al[24]
SVM | Radiomics features extracted from CT images; integrated rad-score + clinicopathological characteristics | 693 GC patients (2 centers); training (n = 390), internal validation (n = 151), external validation (n = 152) cohorts | Rad-scores significantly associated with diffuse-type GC and SRCC (P < 0.001) | Lauren nomogram: AUC = 0.895 (training), 0.841 (internal), 0.893 (external). SRCC nomogram: AUC = 0.905 (training), 0.845 (internal), 0.918 (external) | Chen et al[25]
Counterfactual random forest + optimal policy trees | Imatinib duration inferred via counterfactual model; OPTs interpreted counterfactual predictions | Internal: 117 (MSKCC); external: 363 (Polish) + 239 (Spanish) | OPTs recommended no imatinib for low-risk subgroups: Gastric GIST < 15.9 cm + mitotic count < 11.5/5 mm². Any site GIST < 5.4 cm + mitotic count < 11.5/5 mm² | Sensitivity: 92.7% (internal), 95.4% (Spanish), 92.4% (Polish). Specificity: 33.9% (internal) | Bertsimas et al[26]
Markov decision tree model | Input variables from systematic review/meta-analysis of RCTs comparing DS, EUS-GE, and GJ; prospective cohort study for EUS-GE | 15 studies in Markov model | 1-month survival: DS (81.2%), EUS-GE (80.4%) > GJ (75.5%). 6-month survival: GJ (25.2%), EUS-GE (23.8%) > DS (21.3%) | EUS-GE and GJ outperformed DS for long-term palliation (6 months) | Chue et al[27]
Decision trees, LASSO, kNN, random forests | Pathomics features extracted from HE-stained WSIs; multicenter retrospective study | 584 gastric cancer patients (training: 325, internal validation: 113, external validation 1: 73, external validation 2: 73) | Pathomics signature independently predicted progression-free survival (P < 0.001, HR = 0.34) | Training: AUC = 0.985; internal validation: AUC = 0.921 | Han et al[28]
Optimal classification trees | Input variables: Tumour size, mitotic count, tumour site | Internal: 395 patients (MSKCC + Spanish consortium); external: 556 patients (Polish registry) | OCT significantly improved calibration compared to MSK nomogram | Higher C-index for OCT (0.805 vs 0.788); slope = 1.041 (OCT) vs 0.681 (MSK); no significant calibration error for OCT | Bertsimas et al[29]
Gradient-boosting decision tree | Baseline characteristics, endoscopic atrophy | Total: 1099 chronic gastritis patients, training: 879, test: 220 | Key predictors: Age, OLGIM/OLGA stage, endoscopic atrophy, history of other malignancies | Harrell’s c-index: 0.84 (test set). Stratified risk into 3 categories (P < 0.001) | Arai et al[30]
GBM | Prospective cohort study (15-year follow-up); 70% training and 30% validation split | FINRISK 2002 cohort: 7115 individuals (103 incident liver disease, 41 alcoholic liver disease) | Gut microbiome and conventional factors showed comparable predictive power | Liver disease: AUROC = 0.834 (microbiome + conventional) vs 0.768 (conventional); alcoholic liver disease: AUROC = 0.956 (microbiome + conventional) vs 0.875 (conventional) | Liu et al[31]
XGBoost (pre/delta-radiomics) + SMOTE | Pre/post-treatment MRI radiomics (n = 105); multisequence MRI integration | LARC patients n = 84; validation: 5/10-fold CV + independent; no control | Delta-radiomics > pre-radiomics | Pre-model: AUC 0.93 ± 0.06 (train)/0.79 (test); delta-model: AUC 0.96 ± 0.03 (train)/0.83 (test) | Wang et al[32]
sPLS-DA | Multi-site microbiome (saliva/esophagus/stomach); 16S rRNA analysis | EoE: Saliva = 29, biopsy = 25; controls: Non-EoE = 20 (saliva)/5 (biopsy) | Saliva model distinguishes EoE/non-EoE; esophageal microbiota detects disease activity | Saliva: CE 24%, validation Acc 78.6% (sensitivity 80%/specificity 75%); esophagus: CE 8% (activity detection) | Facchin et al[33]
sPLS-DA + LR | Genome-wide 5hmC features (n = 64); protein biomarkers | Healthy = 165; LC = 62; HCC = 135; longitudinal cohort | HCC diagnosis/recurrence prediction; tumor burden monitoring | Wild-score AUC = 93.24% (HCC vs non-HCC); HCC score AUC = 92.75% (HCC vs LC) | Cai et al[34]
LR + mixed-effects model | Multicohort clinical/serologic/genetic data; JAK-STAT/IL6 pathway | IBD patients = 12083 (4 cohorts); within-case design | Female/CD colonic location/surgery linked to EIMs; MHC/CPEB4 associations; therapeutic targets (TNF/JAK-STAT) | MHC OR = 2.5 (P = 1.4 × 10⁻¹⁵); CPEB4 OR = 1.5 (P = 2.7 × 10⁻⁸); serologic panel OR = 1.7 (P = 3.6 × 10⁻¹⁹) | Khrom et al[35]
LR + RF + kNN + SVM + NN | Recursive feature elimination; single-center retrospective | Total n = 864 (IIIa + n = 457 vs low-risk n = 407); 3-fold imputation/CV | NN outperforms others (Acc 68.8%); best in medical complications (AUC = 0.695) | NN: Overall Acc 0.688/AUC = 0.672; medical AUC = 0.695; surgical AUC = 0.653; Cologne score Acc 0.510 | Jung et al[36]
RF vs cv-Enet/glmboost/ensemble | Multicenter preoperative features; elastic-net regularization | Development = 3182 (39 centers); validation = 260; no control | RF optimal prediction; surgical decision support | RF AUC = 0.844 (0.841-0.848) (development); similar in validation | Pera et al[37]
LR + Cox regression models | Endoscopic features (whitish/irregular) + histology (marked IM); retrospective multicenter | Total n = 182 (malignant = 48); progression cohort = 98; ROC/KM validation | Misdiagnosis predictors (single/large/IM); progression predictors (whitish/margin/multi-diagnosis) | AUC 0.871 (sensitivity 68.7%/specificity 92.5%) | Zou et al[38]
RF + Swin transformer tongue model | Questionnaire features (n = 10) + tongue images; multicenter | Total n = 2229 (9 centers); validation AUC > 0.8 | Key factors: Age/TCM constitution/tongue features/diet/anxiety; dynamic nomogram | RF Acc 85.65%; tongue model Acc 73.33% (validation) | Yu et al[39]
LR | Tumor location/ulceration/biopsy features; H-L test/DCA validation | Training = 516; validation = 220 (7:3 split); no control | 4 fibrosis predictors; severe fibrosis prediction model | Training AUC = 0.819; validation AUC = 0.812; DCA clinical benefit | Zeng et al[40]
Stepwise logistic regression | Demographics/history/lab markers (AFP/AST/albumin); prospective multicenter | Total n = 1723; HCC events = 109; median follow-up 2.2 years; no control | Key factors: Male/cirrhosis duration/family history/age/obesity/AFP/AST | Incidence 24/100 person-years; multivariate OR 1.08-2.73 (P < 0.05) | Reddy et al[41]
Multivariate logistic regression | Radiomic features (peritumoral enhancement/necrosis); transcriptomic sequencing | Development = 470; validation: Control = 145 + HAIC = 143; multicenter | Imaging subtypes guide HAIC benefit; immune pathway correlation | Training AUC = 0.83; control AUC = 0.84; HAIC AUC = 0.73 | Ma et al[42]
LR | Multiphase CT radiomics (peritumoral); RNA sequencing | Total n = 773 (training 334 + internal 142 + external 141 + survival 121 + RNA 35); 4 centers | MVI prediction + survival stratification (early recurrence/OS); glucose metabolism genes | Hybrid model AUC: 0.86 (internal)/0.84 (external); survival P < 0.01 | Xia et al[43]
Multivariate logistic regression | LI-RADS visualization score (A/B/C); obesity class II-III | Total n = 2053 (A = 1685, B = 262, C = 106); longitudinal = 1546; multicenter | Alcohol/MASLD cirrhosis + obesity linked to limited visualization; 19.6% worsened/53.1% improved | Baseline limited rate 18%; obesity OR = 2.1 (P < 0.001) | Schoenberger et al[44]
Regularized LR + GBM | RCT secondary analysis; mailed outreach; prior screening behavior | Total n = 1200 (training 960 + test 240); 3 screening rounds; no control | Surveillance adherence stratification; key variables: Prior screening/primary care contact | AUROC 0.66-0.77 (increasing); 41%-47% completion rate | Singal et al[45]
LASSO logistic regression | Pre/intraoperative variables; multicenter international | Total n = 2192 (train 70% + valid 30%); 12 centers | Dual prediction (PHLF/CCI > 40); online risk calculators | PHLF AUC = 0.80 (calib. slope = 0.95); CCI AUC = 0.76 | Wang et al[46]
LDpred2 PRS + QCancer-10 integration | Genetic/non-genetic factors; Cox proportional hazards | United Kingdom Biobank n = 434587; case-control/survival validation | C-index improvement (M + 7.3%/F + 6.5%); high-risk group 3.47 × (M)/2.77 × (F) | Integrated C-index: 0.730 (M)/0.687 (F); sensitivity/specificity: 47.8%/80.3% (M), 42.7%/80.1% (F) | Briggs et al[47]
Multivariable logistic + Cox regression | Multicenter FS screening; long-term follow-up (median 17 years) | Intervention = 40085 (13 centers) | High-ADR group: Distal CRC HR = 0.34 (incidence)/0.22 (mortality); all-site CRC HR = 0.58/0.52 | High vs low-ADR: Distal CRC HR 0.34 vs 0.55 (incidence), 0.22 vs 0.54 (mortality) | Cross et al[48]
RRR + elastic net models | Inflammatory markers (CRP/IL6/GDF15) + metabolic markers (BMI/waist/C-peptide); case-control | Total n = 1368 (cases 684 + controls 684); NHS = 818F + HPFS = 550M | Sex-specific: Median OR = 1.34 (inflammation)/1.25 (metabolic); NS in F; 11 key metabolites | Variance explained: 24% (inflammation)/27% (metabolic) | Bever et al[49]
RSF/GBM/DeepHit | Multivariable analysis + clinical feature selection; time-dependent C-index | CRC patients = 2157; stratified 5-fold CV (5 repeats) | DeepHit best discrimination; RSF best calibration; SHAP key factors (R0 resection/TNM) | DeepHit C-index 0.789 (0.779-0.799); RSF Brier score 0.096 (0.094-0.099) | Yang et al[50]
Multivariable logistic regression | CellSearch CTC detection; prospective CTCs + retrospective HGP; excluded neoadjuvant/extrahepatic | Total n = 177 (dHGP = 34, 19%); multivariable validation; no external cohort | CTC-negativity predicts dHGP (OR = 2.7); dHGP better survival | OR = 2.7 (1.1-6.8), P = 0.028 | Meyer et al[51]
Table 2 Emerging deep learning approaches in gastrointestinal disease management (2022-2025)[52-77]
AI algorithm | Parameters employed/study design | Sample size/control group/validation | Outcomes | Performance | Ref.
CNN | 14 EUS anatomical sites; multicenter validation | Training: 1812 patients/6230 images; internal: 47 patients/1569 images; external: 131 patients/85322 images | Outperformed novices in 11 sites; high expert agreement (kappa 0.84-0.98) | Internal Acc 92.1%-100%; external sensitivity 89.45%-99.92%/specificity 93.35%-99.79% | Tian et al[52]
NNLS deconvolution + GCNN | Methylation atlas (TSMA) + genome-wide density; multi-modal strategy | 5 tumor types + WBC training; validation = 239 low-depth cfDNA | Multi-modal improves TOO in low-depth cfDNA | Validation Acc 69% | Nguyen et al[53]
CNN + survival MLP | CT + clinical multimodal data; 5-fold CV | GC patients = 1061; vs 3 SOTA methods; no control | Multimodal > single-modality; optimal OS/PFS prediction | OS C-index 0.849; PFS 0.783 (surpass SOTA) | Hao et al[54]
CNN | HE features for HER2 status; trastuzumab response | Surgical = 300; biopsy = 101; treated = 41; no control | HER2 amplification prediction; treatment response (CR + PR vs SD + PD) | Surgical AUC 0.847 (amplification)/0.903 (2+); biopsy 0.723; treatment 0.833 | Wu et al[55]
DCNN | HE whole-slide imaging; fibrosis stage comparison | Non-HCC = 639; HCC = 46; paired training/unpaired validation | Detect HCC risk in mild fibrosis; saliency maps reveal nuclear atypia/immune infiltration | Training Acc 81.0% (AUC = 0.80); validation 82.3% (AUC = 0.84) | Nakatsuka et al[56]
Faster R-CNN model | Preoperative CT/MRI analysis; multicenter retrospective cohort (2012-2020) | Total n = 1141 (PCCCL = 62, CHCC = 1079); 4:1 split (train-val vs test); CHCC cases (n = 1079) as negative control | Differential diagnosis of rare PCCCL | Accuracy: 0.962 (95%CI: 0.931-0.992); AP: PCCCL 0.908, CHCC 0.907; Recall: 0.95 | Liu et al[57]
Transformer | End-to-end biomarker prediction; multicenter validation | Total n > 13k (16 CRC cohorts); resection training/biopsy validation | Solved biopsy MSI diagnosis; improved interpretability | MSI detection: Sensitivity 0.99/NPV > 0.99 | Wagner et al[58]
CNN + SMOTE/SVM | Pathomics/radiomics/immune score (CD3+/CD8+)/clinical; digital pathology | Lung metastasis = 103; internal validation | Path/radio features vs immunoscore (neg); triple independent prognosis | Integrated model: OS = 0.860/DFS = 0.875; Calib/DCA validated | Wang et al[59]
INSIGHT (CNN) + WiseMSI (self-attention) two-stage | Tumor tile classification + ResNet pre-trained + attention pooling; multicenter | Chinese multicenter cohort; vs 5 DL methods | Outperforms SOTA in MSI prediction; high pathologist consistency | WiseMSI AUC 0.954 (0.948-0.960) | Chang et al[60]
CNN + RNN | Multicenter blinded trial; real-time monitoring + second observer | Total n = 946 (adenomas = 989); multicenter | CADe > human in adenoma detection (sensitivity 94.6% vs 96.0%); changed 2.3% follow-up | ADR + 1.1%/case; non-neoplastic + 4.9%; time + 42.6% (6.6 minutes) | Sinonquel et al[61]
ANN | Pathological image analysis; retrospective multicenter | Training = 496 (GDPH); external validation = 150 (SYSMH) | Avoided 34.9% unnecessary surgeries; outperformed United States guidelines | Training AUC = 0.979; validation AUC = 0.978 | Su et al[62]
Multitask transformer | Preop MRI multiparametric features; 7-center retrospective | Total n = 725 (train 234 + internal 58); external = 212/111/110 | PA-TACE benefit in high-MVI/low-survival group (P < 0.001) | RFS C-index: Training 0.763/validation 0.628-0.728 | Wang et al[63]
Multistage DL models | Longitudinal MRI (pre/post-TA) + clinical variables; multicenter retrospective | Total n = 289 (train 254 + external 35); 3 hospitals | DL clinical improved ER prediction (AUC = 0.740); high/low-risk RFS P = 0.04 | DL clinical AUC: 0.740 vs 0.571/0.648/0.689 | Kong and Li[64]
CNN | Clinical data + MRI radiomics; 6 time-frame prediction | Early HCC = 120 (recurrence = 44); retrospective (2005-2018) | Imaging model > clinical (AUC 0.76 vs 0.68, P = 0.03) | Imaging model AUC 0.71-0.85; KM P < 0.05 (2-6 years) | Iseke et al[65]
RSF/ANN/decision tree | Inflammatory markers + ALBI + AFP + tumor size + INR; single-center retrospective | Total n = 808 (train 2:1 split) | ANN optimal (5 years AUC = 0.85); high-risk OS HR = 7.98 (5.85-10.93) | Training AUC 0.85 (0.82-0.88); validation 0.82 (0.74-0.85); P < 0.0001 | Zhang et al[66]
DL | DCE-MRI + clinical/radiologic features; retrospective multicenter | Total n = 355 (train 251 + internal 62 + external 42); 2 centers | Proliferative HCC prediction; fusion model improves recurrence stratification | DL + clinical + radiologic model AUC: Training 0.99/internal 0.87/external 0.80 | Qu et al[67]
DenseNet169 + MLP | Multiphase 2.5D CT + clinical features + RNA-seq; multicenter retrospective | Total n = 620 (TCIA + 3 centers); internal + 2 external test sets | Stratified RFS/OS (P < 0.001); high score links WNT/MYC/KRAS activation | DLER MLP 0.891 vs DLER 0.797 vs clinical model 0.752 | Guo et al[68]
scSE-CatBoost | Multi-site endoscopic images; CNN + scSE feature extraction | Total n = 302 (An Nan Hospital); RUT validation | Real-time Helicobacter pylori detection; NPV 100% | Acc 0.90; sensitivity 1.00/specificity 0.81; AUC = 0.88 | Lin et al[69]
Transformer + MIL | HE WSIs; dual-task (subtype + TMB prediction) | EC = 529/918; CRC = 594/1495; vs 7 SOTA methods | Strong subtype-TMB association (Fisher P < 0.001); guides immunotherapy | Outperformed SOTA in both tasks | Wang et al[70]
GAN + ViT distillation | HE/HPS staining; multi-task prognosis (OS/TTR/TRG) | Internal = 258 CLM; two public datasets | TRG dichotomization: Acc 86.9%-90.3%; 3-class Acc 78.5%-82.1% | OS C-index 0.804 (± 0.014); TTR C-index 0.735 (± 0.016) | Elforaici et al[71]
Transfer learning | HE WSIs analysis | Segmentation = 100 WSI; validation: 4 cohorts (3 internal + 1 external) + 6-month series = 217 | Fine-tuning improved F1 0.797-0.949 (P < 0.00001); 100% visual overlay accuracy | Detection model AUC 0.959-0.978 (P < 0.00001) | Khan et al[72]
DBMIA-Net | GIA + EIA modules; adaptive channel graph convolution | 5 public datasets (CVC-ClinicDB); vs SOTA methods | Enhanced generalization | 94.12% dice (vs PraNet + 4.22%); leading in 6 metrics | Zhang et al[73]
UC-former vision transformer | Multicenter retrospective study; Mayo endoscopic score prediction | Total n = 768 UC patients/15120 images; internal + 3 external validations | Surpassed senior endoscopists; strong multicenter stability | Internal Acc 90.8%; external Acc 82.4%-85.0% | Qi et al[74]
MIST | Self-supervised contrastive learning + dual-stream MIL | Total n = 480/666 WSI (Drum Tower); external = 273 WSI (Nanjing First) | Acc comparable to pathologists (0.784 vs 0.806) | External Acc 0.784 | Cai et al[75]
ResTransUNet | Global context (transformer) + local features (CNN); LiTS2017/3Dircadb/Chaos/Sliver07 | LiTS2017/3Dircadb/Chaos/Sliver07 | Solved small/discontinuous region segmentation; outperformed SOTA | LiTS2017 dice 0.9535/VOE 0.0804/RVD -0.0007 | Ou et al[76]
GCN | Pathological micronecrosis analysis + multicenter datasets; GCN feature fusion | Total n = 752/3622 slides; internal (FAH-ZJUMS) + external (TCGA-LIHC) | Improved prognostic stratification; precise necrosis localization | Internal + 8.18%; external + 9.02%; superior C-index vs baseline | Deng et al[77]
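The segmentation entries in Table 2 report overlap metrics (Dice, VOE, RVD) rather than AUC. The sketch below shows how these are commonly defined for a predicted mask against a reference mask; it is an illustration under stated assumptions, not the evaluation code of any cited study. The function name `overlap_metrics` and the toy masks are hypothetical, and the sign convention for RVD (prediction minus reference, divided by reference) is assumed here and can differ between papers.

```python
def overlap_metrics(pred, truth):
    """Return (dice, voe, rvd) for two flat sequences of 0/1 voxel labels.

    dice = 2|P ∩ T| / (|P| + |T|)
    voe  = 1 - |P ∩ T| / |P ∪ T|        (volumetric overlap error)
    rvd  = (|P| - |T|) / |T|            (relative volume difference)
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)
    union = n_pred + n_truth - inter
    dice = 2 * inter / (n_pred + n_truth)
    voe = 1 - inter / union
    rvd = (n_pred - n_truth) / n_truth
    return dice, voe, rvd

# Hypothetical 8-voxel masks (not from any cited dataset)
toy_pred = [1, 1, 1, 0, 0, 1, 1, 0]
toy_truth = [1, 1, 0, 0, 1, 1, 0, 0]

dice, voe, rvd = overlap_metrics(toy_pred, toy_truth)
print(round(dice, 3), round(voe, 3), round(rvd, 3))  # 0.667 0.5 0.25
```

Under these definitions a perfect segmentation gives Dice = 1, VOE = 0 and RVD = 0, so a Dice near 0.95 with RVD near zero indicates close agreement in both overlap and total volume.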