Review
Copyright ©The Author(s) 2025.
World J Gastroenterol. Jun 28, 2025; 31(24): 108021
Published online Jun 28, 2025. doi: 10.3748/wjg.v31.i24.108021
Table 1 Recent advances in machine learning applications for gastrointestinal diseases (2022-2025)[21-51]
| AI algorithm | Parameters employed/study design | Sample size/control group/validation | Outcomes | Performance | Ref. |
| --- | --- | --- | --- | --- | --- |
| SVM | Multi-center data + TCGA validation | Total n = 255 (training 212 + internal validation 43); external: 4 centers + TCGA | OS/DFS risk stratification (low/moderate/high); high-risk stage II/III chemotherapy benefit | AUC training 0.773 (OS)/0.751 (DFS); validation 0.852 (OS)/0.837 (DFS) | Li et al[21] |
| SVM + APINet/TransFG | Tongue features (color/morphology/coating) + microbiome (16S rDNA); multicenter prospective study | Cohort 1: GC = 328 vs NGC = 304; cohort 2: GC = 937 vs NGC = 1911 (10 centers); external: GC = 294 vs NGC = 521 (7 centers) | Distinguish GC/early GC/precancerous lesions (e.g., AG); superior to 8 blood biomarkers | Tongue model AUC: 0.89 (initial); 0.88-0.92 (internal); 0.83-0.88 (external); microbiome AUC: 0.94 (genus)/0.95 (species) | Yuan et al[22] |
| SVM/LR/kNN + feature selection | Liver/PBMC RNA-seq data | Liver = 67; PBMC = 137; external public dataset; controls: Healthy + AH/AC/MASLD/HCV | Precise differentiation of AH/AC/MASLD/HCV; minimal gene sets (33-75 genes) | Liver accuracy: 90% (AH/AC vs healthy), 91% (4-class); external 82%; PBMC accuracy: 75% (4-class) | Listopad et al[23] |
| SVM/LR/RF | Multiphase CT radiomics (n = 851) | Total n = 215 (training 150 + external 65) | Multiphase CT prediction (plain scan alternative) | Nomogram C-index 0.913 (95%CI: 0.878-0.956) | Liu et al[24] |
| SVM | Radiomics features extracted from CT images; integrated rad-score + clinicopathological characteristics | 693 GC patients (2 centers); training (n = 390), internal validation (n = 151), external validation (n = 152) cohorts | Rad-scores significantly associated with diffuse-type GC and SRCC (P < 0.001) | Lauren nomogram: AUC = 0.895 (training), 0.841 (internal), 0.893 (external). SRCC nomogram: AUC = 0.905 (training), 0.845 (internal), 0.918 (external) | Chen et al[25] |
| Counterfactual random forest + optimal policy trees | Imatinib duration inferred via counterfactual model; OPTs interpreted counterfactual predictions | Internal: 117 (MSKCC); external: 363 (Polish) + 239 (Spanish) | OPTs recommended no imatinib for low-risk subgroups: Gastric GIST < 15.9 cm + mitotic count < 11.5/5 mm². Any site GIST < 5.4 cm + mitotic count < 11.5/5 mm² | Sensitivity: 92.7% (internal), 95.4% (Spanish), 92.4% (Polish). Specificity: 33.9% (internal) | Bertsimas et al[26] |
| Markov decision tree model | Input variables from systematic review/meta-analysis of RCTs comparing DS, EUS-GE, and GJ; prospective cohort study for EUS-GE | 15 studies in Markov model | 1-month survival: DS (81.2%), EUS-GE (80.4%) > GJ (75.5%). 6-month survival: GJ (25.2%), EUS-GE (23.8%) > DS (21.3%) | EUS-GE and GJ outperformed DS for long-term palliation (6 months) | Chue et al[27] |
| Decision trees, LASSO, kNN, random forests | Pathomics features extracted from HE-stained WSIs; multicenter retrospective study | 584 gastric cancer patients (training: 325, internal validation: 113, external validation 1: 73, external validation 2: 73) | Pathomics signature independently predicted progression-free survival (P < 0.001, HR = 0.34) | Training: AUC = 0.985; internal validation: AUC = 0.921 | Han et al[28] |
| Optimal classification trees | Input variables: Tumour size, mitotic count, tumour site | Internal: 395 patients (MSKCC + Spanish consortium); external: 556 patients (Polish registry) | OCT significantly improved calibration compared to MSK nomogram | Higher C-index for OCT (0.805 vs 0.788); slope = 1.041 (OCT) vs 0.681 (MSK); no significant calibration error for OCT | Bertsimas et al[29] |
| Gradient-boosting decision tree | Baseline characteristics, endoscopic atrophy | Total: 1099 chronic gastritis patients, training: 879, test: 220 | Key predictors: Age, OLGIM/OLGA stage, endoscopic atrophy, history of other malignancies | Harrell’s C-index: 0.84 (test set). Stratified risk into 3 categories (P < 0.001) | Arai et al[30] |
| GBM | Prospective cohort study (15-year follow-up); 70% training and 30% validation split | FINRISK 2002 cohort: 7115 individuals (103 incident liver disease, 41 alcoholic liver disease) | Gut microbiome and conventional factors showed comparable predictive power | Liver disease: AUROC = 0.834 (microbiome + conventional) vs 0.768 (conventional); alcoholic liver disease: AUROC = 0.956 (microbiome + conventional) vs 0.875 (conventional) | Liu et al[31] |
| XGBoost (pre/delta-radiomics) + SMOTE | Pre/post-treatment MRI radiomics (n = 105); multisequence MRI integration | LARC patients n = 84; validation: 5/10-fold CV + independent; no control | Delta-radiomics > pre-radiomics | Pre-model: AUC 0.93 ± 0.06 (train)/0.79 (test); delta-model: AUC 0.96 ± 0.03 (train)/0.83 (test) | Wang et al[32] |
| sPLS-DA | Multi-site microbiome (saliva/esophagus/stomach); 16S rRNA analysis | EoE: Saliva = 29, biopsy = 25; controls: Non-EoE = 20 (saliva)/5 (biopsy) | Saliva model distinguishes EoE/non-EoE; esophageal microbiota detects disease activity | Saliva: CE 24%, validation Acc 78.6% (sensitivity 80%/specificity 75%); esophagus: CE 8% (activity detection) | Facchin et al[33] |
| sPLS-DA + LR | Genome-wide 5hmC features (n = 64); protein biomarkers | Healthy = 165; LC = 62; HCC = 135; longitudinal cohort | HCC diagnosis/recurrence prediction; tumor burden monitoring | Wild-score AUC = 93.24% (HCC vs non-HCC); HCC score AUC = 92.75% (HCC vs LC) | Cai et al[34] |
| LR + mixed-effects model | Multicohort clinical/serologic/genetic data; JAK-STAT/IL6 pathway | IBD patients = 12083 (4 cohorts); within-case design | Female/CD colonic location/surgery linked to EIMs; MHC/CPEB4 associations; therapeutic targets (TNF/JAK-STAT) | MHC OR = 2.5 (P = 1.4 × 10⁻¹⁵); CPEB4 OR = 1.5 (P = 2.7 × 10⁻⁸); serologic panel OR = 1.7 (P = 3.6 × 10⁻¹⁹) | Khrom et al[35] |
| LR + RF + kNN + SVM + NN | Recursive feature elimination; single-center retrospective | Total n = 864 (IIIa+ n = 457 vs low-risk n = 407); 3-fold imputation/CV | NN outperforms others (Acc 68.8%); best in medical complications (AUC = 0.695) | NN: Overall Acc 0.688/AUC = 0.672; medical AUC = 0.695; surgical AUC = 0.653; Cologne score Acc 0.510 | Jung et al[36] |
| RF vs cv-Enet/glmboost/ensemble | Multicenter preoperative features; elastic-net regularization | Development = 3182 (39 centers); validation = 260; no control | RF optimal prediction; surgical decision support | RF AUC = 0.844 (0.841-0.848) (development); similar in validation | Pera et al[37] |
| LR + Cox regression models | Endoscopic features (whitish/irregular) + histology (marked IM); retrospective multicenter | Total n = 182 (malignant = 48); progression cohort = 98; ROC/KM validation | Misdiagnosis predictors (single/large/IM); progression predictors (whitish/margin/multi-diagnosis) | AUC 0.871 (sensitivity 68.7%/specificity 92.5%) | Zou et al[38] |
| RF + Swin transformer tongue model | Questionnaire features (n = 10) + tongue images; multicenter | Total n = 2229 (9 centers); validation AUC > 0.8 | Key factors: Age/TCM constitution/tongue features/diet/anxiety; dynamic nomogram | RF Acc 85.65%; tongue model Acc 73.33% (validation) | Yu et al[39] |
| LR | Tumor location/ulceration/biopsy features; H-L test/DCA validation | Training = 516; validation = 220 (7:3 split); no control | 4 fibrosis predictors; severe fibrosis prediction model | Training AUC = 0.819; validation AUC = 0.812; DCA clinical benefit | Zeng et al[40] |
| Stepwise logistic regression | Demographics/history/lab markers (AFP/AST/albumin); prospective multicenter | Total n = 1723; HCC events = 109; median follow-up 2.2 years; no control | Key factors: Male/cirrhosis duration/family history/age/obesity/AFP/AST | Incidence 24/100 person-years; multivariate OR 1.08-2.73 (P < 0.05) | Reddy et al[41] |
| Multivariate logistic regression | Radiomic features (peritumoral enhancement/necrosis); transcriptomic sequencing | Development = 470; validation: Control = 145 + HAIC = 143; multicenter | Imaging subtypes guide HAIC benefit; immune pathway correlation | Training AUC = 0.83; control AUC = 0.84; HAIC AUC = 0.73 | Ma et al[42] |
| LR | Multiphase CT radiomics (peritumoral); RNA sequencing | Total n = 773 (training 334 + internal 142 + external 141 + survival 121 + RNA 35); 4 centers | MVI prediction + survival stratification (early recurrence/OS); glucose metabolism genes | Hybrid model AUC: 0.86 (internal)/0.84 (external); survival P < 0.01 | Xia et al[43] |
| Multivariate logistic regression | LI-RADS visualization score (A/B/C); obesity class II-III | Total n = 2053 (A = 1685, B = 262, C = 106); longitudinal = 1546; multicenter | Alcohol/MASLD cirrhosis + obesity linked to limited visualization; 19.6% worsened/53.1% improved | Baseline limited rate 18%; obesity OR = 2.1 (P < 0.001) | Schoenberger et al[44] |
| Regularized LR + GBM | RCT secondary analysis; mailed outreach; prior screening behavior | Total n = 1200 (training 960 + test 240); 3 screening rounds; no control | Surveillance adherence stratification; key variables: Prior screening/primary care contact | AUROC 0.66-0.77 (increasing); 41%-47% completion rate | Singal et al[45] |
| LASSO logistic regression | Pre/intraoperative variables; multicenter international | Total n = 2192 (train 70% + valid 30%); 12 centers | Dual prediction (PHLF/CCI > 40); online risk calculators | PHLF AUC = 0.80 (calib. slope = 0.95); CCI AUC = 0.76 | Wang et al[46] |
| LDpred2 PRS + QCancer-10 integration | Genetic/non-genetic factors; Cox proportional hazards | United Kingdom Biobank n = 434587; case-control/survival validation | C-index improvement (M +7.3%/F +6.5%); high-risk group 3.47× (M)/2.77× (F) | Integrated C-index: 0.730 (M)/0.687 (F); sensitivity/specificity: 47.8%/80.3% (M), 42.7%/80.1% (F) | Briggs et al[47] |
| Multivariable logistic + Cox regression | Multicenter FS screening; long-term follow-up (median 17 years) | Intervention = 40085 (13 centers) | High-ADR group: Distal CRC HR = 0.34 (incidence)/0.22 (mortality); all-site CRC HR = 0.58/0.52 | High vs low-ADR: Distal CRC HR 0.34 vs 0.55 (incidence), 0.22 vs 0.54 (mortality) | Cross et al[48] |
| RRR + elastic net models | Inflammatory markers (CRP/IL6/GDF15) + metabolic markers (BMI/waist/C-peptide); case-control | Total n = 1368 (cases 684 + controls 684); NHS = 818 F + HPFS = 550 M | Sex-specific: Median OR = 1.34 (inflammation)/1.25 (metabolic); NS in F; 11 key metabolites | Variance explained: 24% (inflammation)/27% (metabolic) | Bever et al[49] |
| RSF/GBM/DeepHit | Multivariable analysis + clinical feature selection; time-dependent C-index | CRC patients = 2157; stratified 5-fold CV (5 repeats) | DeepHit best discrimination; RSF best calibration; SHAP key factors (R0 resection/TNM) | DeepHit C-index 0.789 (0.779-0.799); RSF Brier 0.096 (0.094-0.099) | Yang et al[50] |
| Multivariable logistic regression | CellSearch CTCs detection; prospective CTCs + retrospective HGP; excluded neoadjuvant/extrahepatic | Total n = 177 (dHGP = 34, 19%); multivariable validation; no external cohort | CTC-negativity predicts dHGP (OR = 2.7); dHGP better survival | OR = 2.7 (1.1-6.8), P = 0.028 | Meyer et al[51] |
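
Most entries in the Performance column of Table 1 are reported as AUC/AUROC, sensitivity/specificity, or a C-index. As a reminder of what these figures measure, the following minimal sketch computes each of them on a small made-up dataset. The data, the function names, and the 0.5 decision threshold are illustrative assumptions and are not taken from any of the cited studies.

```python
# Illustrative only: toy data, hypothetical function names.

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def harrell_c_index(times, events, risks):
    """Harrell's C-index: among comparable pairs (the patient with the
    shorter follow-up time had an event), the fraction in which that
    patient also has the higher predicted risk (ties count as one half)."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Made-up binary classification example (1 = disease, 0 = no disease).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
roc_auc = auc(y_true, scores)

# Made-up survival example: follow-up time, event flag (0 = censored), risk.
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.6, 0.5, 0.1]
c_index = harrell_c_index(times, events, risks)

print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"AUC={roc_auc:.4f} C-index={c_index:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC and the C-index summarize ranking quality across all thresholds; this is one reason a model in the table can pair a high AUC with a modest sensitivity or specificity at a given cut-off.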