Comprehensive review of Bayesian network applications in gastrointestinal cancers

doi:10.5306/wjco.v16.i6.104299

Publication name: World Journal of Clinical Oncology
ISSN: 2218-4333
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Review

World J Clin Oncol. Jun 24, 2025; 16(6): 104299
Published online Jun 24, 2025. doi: 10.5306/wjco.v16.i6.104299

Table 1 Six Bayesian network parameter learning algorithms

Algorithms	For incomplete datasets	Basic principle	Advantages & disadvantages	Ref.
Maximum likelihood estimate	No	Estimates parameters by maximizing the likelihood function of the observed data	Fast to compute; uses no prior knowledge, so estimates can be unreliable with small samples	[18]
Bayesian method	No	Uses a prior distribution (often Dirichlet) and updates it with observed data to obtain a posterior distribution	Incorporates prior knowledge; computationally intensive	[19]
Expectation-maximization	Yes	Estimates parameters by iteratively applying expectation (E) and maximization (M) steps to handle missing data	Effective with missing data; can converge to local optima	[20]
Robust Bayesian estimate	Yes	Estimates parameters as probability intervals that bound the conditional probabilities, without assumptions about the missing-data mechanism	Requires no assumptions about why data are missing; interval width indicates the reliability of the estimate	[12]
Monte Carlo method	Yes	Uses random sampling (e.g., Gibbs sampling) to approximate expectations under the joint probability distribution	Flexible and can handle complex models; computationally expensive, and convergence can be slow	[21]
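
The difference between the first three rows of Table 1 can be illustrated on a toy two-node network A → B with binary variables. The sketch below is a minimal illustration under stated assumptions: the data are invented, the function names (count_table, estimate, em) are hypothetical, and a symmetric Beta prior (the two-category case of the Dirichlet) stands in for the Bayesian method. With alpha=0 the estimator reduces to maximum likelihood; the EM loop replaces missing values of A with expected (fractional) counts.

```python
# Toy network A -> B, both binary. Data and function names are hypothetical.


def count_table(rows):
    """Build a 2x2 table counts[a][b] from fully observed (a, b) rows."""
    counts = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in rows:
        counts[a][b] += 1
    return counts


def estimate(counts, alpha=0.0):
    """Return P(A=1) and [P(B=1 | A=0), P(B=1 | A=1)] from counts[a][b].

    alpha = 0 gives maximum likelihood estimates (relative frequencies);
    alpha > 0 gives posterior means under a symmetric Beta(alpha, alpha) prior.
    """
    n_a = [counts[a][0] + counts[a][1] for a in (0, 1)]
    p_a1 = (n_a[1] + alpha) / (n_a[0] + n_a[1] + 2 * alpha)
    p_b1 = [(counts[a][1] + alpha) / (n_a[a] + 2 * alpha) for a in (0, 1)]
    return p_a1, p_b1


def em(rows, n_iter=100, alpha=0.0):
    """Expectation-maximization when A may be missing (None) and B is observed."""
    p_a1, p_b1 = 0.5, [0.3, 0.7]  # arbitrary start; EM can stop at a local optimum
    for _ in range(n_iter):
        counts = [[0.0, 0.0], [0.0, 0.0]]
        for a, b in rows:
            if a is not None:
                counts[a][b] += 1
                continue
            # E-step: posterior P(A=1 | B=b) under the current parameters
            lik1 = p_a1 * (p_b1[1] if b == 1 else 1 - p_b1[1])
            lik0 = (1 - p_a1) * (p_b1[0] if b == 1 else 1 - p_b1[0])
            w1 = lik1 / (lik0 + lik1)
            counts[1][b] += w1      # fractional (expected) counts
            counts[0][b] += 1 - w1
        # M-step: re-estimate the parameters from the expected counts
        p_a1, p_b1 = estimate(counts, alpha)
    return p_a1, p_b1


complete = [(0, 0)] * 6 + [(0, 1)] * 2 + [(1, 1)] * 3 + [(1, 0)] * 1
print(estimate(count_table(complete)))           # maximum likelihood
print(estimate(count_table(complete), alpha=1))  # Bayesian, Beta(1, 1) prior
partial = complete + [(None, 1)] * 3 + [(None, 0)] * 3
print(em(partial))                               # EM with missing A values
```

On the complete toy data, maximum likelihood gives P(A=1) = 1/3 and P(B=1 | A) = 0.25 and 0.75; a Beta(1, 1) prior shrinks these toward 0.5 (5/14, 0.3 and 2/3). The prior also keeps the estimate defined when a parent configuration never occurs, whereas plain maximum likelihood would divide by zero in that case.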

Table 2 Some methodologies of Bayesian network inference

Algorithm	Network type	Complexity	Accuracy	Advantages	Ref.
Variable elimination	Singly and multiply connected networks	Exponential in the size of the largest intermediate factor, which depends on the elimination order	Exact	Simple, easy to use	[22]
Junction tree	Singly and multiply connected networks	Exponential in the size of the largest clique	Exact	Fastest method, suitable for sparse networks	[22]
Differential method	Singly and multiply connected networks	Proportional to the complexity of the differentiation operations	Exact	Can answer multiple queries simultaneously	[23]
Stochastic sampling	Singly and multiply connected networks	Inversely proportional to the probability of the evidence	Approximate	Simple, widely applicable, and generally effective	[24]
Loopy belief propagation	Singly and multiply connected networks	Linear in the number of edges per iteration (exponential only in the largest family size); convergence is not guaranteed	Approximate	Performs well when the algorithm converges	[25]
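
To make the exact/approximate distinction in Table 2 concrete, the sketch below answers the query P(A=1 | C=1) in a hypothetical three-node chain A → B → C with made-up probabilities. exact_posterior_a1 performs variable elimination by summing B out before normalizing over A. rejection_posterior_a1 is a basic form of stochastic sampling (rejection sampling) that discards samples disagreeing with the evidence, so only about a fraction P(C=1) of the samples is used; this is why its cost grows as the evidence becomes less probable. Both functions are illustrative assumptions, not the implementations cited in the table.

```python
import random

# Hypothetical conditional probability tables for the chain A -> B -> C.
P_A1 = 0.2                          # P(A=1)
P_B1_GIVEN_A = {0: 0.1, 1: 0.8}     # P(B=1 | A=a)
P_C1_GIVEN_B = {0: 0.05, 1: 0.9}    # P(C=1 | B=b)


def bern(p, x):
    """Probability that a Bernoulli(p) variable takes the value x (0 or 1)."""
    return p if x == 1 else 1 - p


def exact_posterior_a1(c=1):
    """Exact P(A=1 | C=c) by variable elimination (B is summed out first)."""
    # tau(a) = sum_b P(b | a) * P(C=c | b): the factor left after eliminating B
    tau = {a: sum(bern(P_B1_GIVEN_A[a], b) * bern(P_C1_GIVEN_B[b], c) for b in (0, 1))
           for a in (0, 1)}
    joint = {a: bern(P_A1, a) * tau[a] for a in (0, 1)}  # P(A=a, C=c)
    return joint[1] / (joint[0] + joint[1])


def rejection_posterior_a1(c=1, n=100_000, seed=0):
    """Approximate P(A=1 | C=c) by forward sampling, keeping only samples with C == c."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        a = int(rng.random() < P_A1)
        b = int(rng.random() < P_B1_GIVEN_A[a])
        c_sample = int(rng.random() < P_C1_GIVEN_B[b])
        if c_sample == c:
            kept += 1
            hits += a
    # Roughly n * P(C=c) samples survive, so rarer evidence needs more samples
    return hits / kept, kept


print(exact_posterior_a1())
print(rejection_posterior_a1())
```

With these numbers P(C=1) = 0.254 and the exact posterior is 0.146/0.254 ≈ 0.575. The sampling estimate approaches this value as n grows, but only about a quarter of the draws contribute to it.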

Table 3 Some popular Bayesian network software tools

Tools	Language	Description	Links
Bnlearn[26]	R	R package for learning the graphical structure of Bayesian networks, estimating their parameters, and performing inference	http://www.bnlearn.com/
BNT[27]	MATLAB	Bayes net toolbox for Matlab	https://github.com/bayesnet/bnt
GOBNILP	C	Learning Bayesian network structure with integer programming	https://www.cs.york.ac.uk/aig/sw/gobnilp/
Bnstruct	R	Bnstruct is an R package which learns Bayesian networks from data with missing values	https://cran.r-project.org/web/packages/bnstruct
Bmmalone	C++	This project implements a number of algorithms for learning Bayesian network structures using state space search techniques.	https://github.com/bmmalone/urlearning-cpp
Causal-Learner[28]	MATLAB	A toolbox for causal structure and Markov blanket learning	https://github.com/z-dragonl/Causal-Learner
CausalFS[29]	C/C++	An open-source package of causal feature selection and causal (Bayesian network) structure learning	https://github.com/kuiy/CausalFS
Weka[30]	Java	Weka contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions	https://git.cms.waikato.ac.nz/weka/weka
Bene	C	An exact Bayesian network structure learning software based on dynamic programming	https://github.com/tomisilander/bene
Causal-learn	Python	Causal discovery in Python. It also includes (conditional) independence tests and score functions	https://github.com/py-why/causal-learn
pyCausalFS	Python	An open-source package of causal feature selection and causal (Bayesian network) structure learning	https://github.com/kuiy/pyCausalFS
CausalExplorer[31]	MATLAB	A MATLAB library of computational causal discovery and variable selection algorithms	https://github.com/mensxmachina/CausalExplorer
Pgmpy	Python	Python library for learning (structure and parameter), inference (probabilistic and causal), and simulations in Bayesian networks	https://github.com/pgmpy/pgmpy
Tetrad	Java	Provides algorithms for discovering causal models and for searching for models of latent structure	https://github.com/cmu-phil/tetrad
Causal discovery toolbox	Python	The causal discovery toolbox is a package for causal inference in graphs	https://github.com/FenTechSolutions/CausalDiscoveryToolbox
DoWhy[32]	Python	DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions	https://github.com/py-why/dowhy

Table 4 Summary of Bayesian network applications in gastric cancer research

Data type	Bayesian network algorithm	Key findings	Ref.
Patient data from the public SEER database; patient data from a hospital cohort	Naïve Bayesian model	The Bayesian network model identified key risk factors, including age, T-stage, N-stage, tumor size, grade, and tumor location, contributing to the prediction of distant metastasis in stage T1 gastric cancer	[37]
LncRNA expression profiles from 375 STAD samples in the TCGA database	Bayesian Lasso-logistic regression	The Bayesian-based approach identified seven lncRNAs, effectively stratified STAD patients by risk, and demonstrated robust prognostic prediction accuracy with AUC values above 0.69 for 1-, 3-, and 5-year survival	[37]
Survival and censorship data from 760 gastric cancer patients	A two-slice temporal Bayesian network model	The Bayesian network improved prediction accuracy, reduced bias, and aligned with classical methods while handling high-dimensional data effectively	[38]
Data from seven randomized clinical trials involving 2655 metastatic gastric cancer patients	Bayesian fixed-effect network meta-analysis model	The Bayesian analysis identified nivolumab as the optimal choice for OS in mGC patients without peritoneal metastases, providing the best balance of efficacy and safety	[39]
Gene expression data from TCGA gastric cancer and metastatic gastric cancer immunotherapy clinical trial datasets	Bayesian semi-nonnegative matrix trifactorization method	The Bayesian method identified clinically relevant pathways associated with molecular subtypes and immunotherapy response, enabling patient stratification and prognosis prediction in independent validation datasets	[40]
LncRNA-miRNA-disease association data, including known associations related to gastric cancer	A naïve Bayesian classifier integrated into the CFNBC model	CFNBC demonstrated reliable prediction performance (AUC of 0.8576) and identified potential lncRNA-disease associations for gastric cancer in case studies	[42]
Clinical data from 339 gastric cancer patients	BNN	The BNN outperformed the ANN in predicting survival of gastric cancer patients, with higher sensitivity, specificity, prediction accuracy, and AUCs	[43]
Data from 245 gastric endoscopic submucosal dissections	Naïve Bayesian model	The Bayesian model demonstrated good discriminative power in predicting ESD outcomes, with naïve Bayesian models presenting AUCs of approximately 80% in the derivation cohort and at least 74% in cross-validation for both outcomes	[44]
Structural domain characteristics of the p42.3 protein molecule	Bayesian network model	The study identified the most likely pathway of action for p42.3 in gastric cancer as "S100A11 - RAGE - P38 - MAPK - microtubule-associated protein - spindle protein - centromere protein - cell proliferation" through Bayesian probability optimization, and this pathway was subsequently validated by biological experiments	[45]
Genome-wide gene expression profiles	Categorical Bayesian networks	The BN approach outperformed benchmark methods and successfully identified disease-specific changes in gene regulation that differentiate cancer types, improving prediction	[46]
Gene expression profile data from gastric cancer patients	A Bayesian network constructed from 18 genes selected by multiple logistic regression	The constructed Bayesian network closely resembled the network from GeneMANIA, supporting the effectiveness of the Bayesian approach in modeling the relationships among genes associated with gastric cancer subtypes	[47]

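SEER: Surveillance, Epidemiology and End Results; STAD: Stomach adenocarcinoma; TCGA: The Cancer Genome Atlas; lncRNA: Long non-coding RNA; AUC: Area under curve; OS: Overall survival; mGC: Metastatic gastric cancer; CFNBC: Collaborative filtering and naïve Bayesian classifier model; BNN: Bayesian neural network; ANN: Artificial neural network; ESD: Endoscopic submucosal dissection; BN: Bayesian network.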
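
Several rows of Table 4 rely on naïve Bayesian models, which assume that the predictors are conditionally independent given the outcome. A minimal sketch of such a classifier for categorical predictors, assuming add-one (Laplace) smoothing, is shown below. The function names and the toy T-stage/N-stage data are hypothetical and are not taken from the cited studies.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(X, y, alpha=1.0):
    """Fit a categorical naive Bayes model with add-alpha (Laplace) smoothing.

    X: list of tuples of categorical predictor values; y: list of class labels.
    """
    classes = Counter(y)
    n_features = len(X[0])
    # values[j] holds the categories seen for predictor j (used for smoothing)
    values = [set(row[j] for row in X) for j in range(n_features)]
    counts = defaultdict(Counter)  # counts[(cls, j)][value] = frequency
    for row, cls in zip(X, y):
        for j, v in enumerate(row):
            counts[(cls, j)][v] += 1
    return classes, values, counts, alpha


def predict_proba(model, row):
    """Return P(class | row), treating predictors as independent given the class."""
    classes, values, counts, alpha = model
    total = sum(classes.values())
    log_post = {}
    for cls, n_cls in classes.items():
        lp = math.log(n_cls / total)  # log prior P(class)
        for j, v in enumerate(row):
            # Smoothed log P(predictor j = v | class)
            lp += math.log((counts[(cls, j)][v] + alpha) / (n_cls + alpha * len(values[j])))
        log_post[cls] = lp
    # Normalize in a numerically stable way (subtract the maximum log score)
    m = max(log_post.values())
    unnorm = {c: math.exp(lp - m) for c, lp in log_post.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


# Hypothetical toy data: (T stage, N stage) -> outcome (1 = event)
X = [("T1a", "N0"), ("T1b", "N1"), ("T1b", "N0"), ("T1a", "N0"), ("T1b", "N1"), ("T1a", "N1")]
y = [0, 1, 0, 0, 1, 0]
model = fit_naive_bayes(X, y)
print(predict_proba(model, ("T1b", "N1")))
```

For the query (T1b, N1), the class-1 posterior works out to 81/113 ≈ 0.717: both predictor values are more frequent in class 1, and their smoothed likelihoods are multiplied under the independence assumption.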

Table 5 Summary of Bayesian network applications in colorectal cancer research

Data type	Bayesian network algorithm	Key findings	Ref.
Observational data on CRC, including risk factors such as alcohol consumption, smoking, diabetes, and hypertension	Structure learning algorithms combined with expert knowledge to construct BN model	The BN model effectively segmented populations into risk subgroups and identified modifiable risk factors with significant predictive influence on CRC risk	[53]
Simulation models of CRC progression and natural history, including parameters for risk factors and disease progression	Bayesian calibration using Hamiltonian Monte Carlo-based algorithms integrated with ANN emulators	The Bayesian framework successfully calibrated CRC simulation models, accurately predicting outcomes within confidence intervals, and reduced computational complexity, enabling efficient uncertainty quantification and improved policy analysis for CRC	[54]
Genetic and expression data from 275 normal colon and 276 CRC samples in the SYSCOL cohort	Bayesian network model	The BN analysis revealed tumor-specific transposable element (TE) eQTLs that influence the expression of cancer driver genes, demonstrating the role of TEs in activating oncogenic pathways and providing insights into tumor-specific regulatory mechanisms	[55]
Clinical data of 1253 CRC patients under 50 years of age from the Yonsei Cancer Center, encompassing 93 clinical features	Bayesian network-based synthesizing model	The BN-based model generated a synthetic population of 5005 individuals with no significant statistical differences from the original data. Training predictive models with synthetic data improved performance, especially for small datasets	[56]
Plasma concentrations of heavy metals (As, Cd, Cr, Hg, Pb) and tumor tissue NGS data from CRC patients	BKMR	BKMR analysis revealed that Pb, As, and Cd were significant contributors to increased mutation rates, particularly indels. Mutational signatures showed strong correlations with heavy metal exposure, and shifts in the mutational landscape were observed between high and low exposure groups	[57]
CRC-associated loci from genome-wide association studies (GWAS) and multi-omics datasets	iRIGS, a Bayesian approach	iRIGS identified 105 high-confidence risk genes, including CEBPB, which promotes CRC cell proliferation through oncogenic pathways such as MAPK, PI3K-Akt, and Ras signaling	[58]
Epidemiological data related to gut microbiome and CRC risk	Multivariate Mendelian randomization analysis based on a Bayesian model	Nine bacteria were identified with a robust causal relationship to CRC development, including Streptococcus thermophilus, Bacteroides ovatus, and others	[59]
Clinicopathologic, immune, microbial, and genomic variables from 815 stage II-III CRC patients	BART	The BART risk model identified seven stable survival predictors and successfully stratified patients into low, intermediate, and high-risk groups with statistically significant survival differences	[52]
Fecal microbiota from CRC patients with poorly differentiated and moderately differentiated tumors	RDP classifier (a naïve Bayesian algorithm)	The study identified distinct gut microbiota associated with poorly differentiated CRC, including a high abundance of Bifidobacterium and other bacteria	[60]
Stage III colon cancer samples (microsatellite-stable and microsatellite-unstable) analyzed through multi-omics data (gene expression, DNA methylation, copy number variation)	IntOMICS, an integrative framework based on Bayesian networks	IntOMICS successfully integrated multi-omics data and biological prior knowledge to uncover regulatory networks, revealing deeper insights into genetic information flow and identifying potential predictive biomarkers for stage III colon cancer	[61]
Rectal cancer clinical data from 705 patients who underwent radical resection	Tree-augmented naïve Bayes algorithm	The BN model, incorporating factors like age, CEA, CA19-9, CA125, differentiation status, T stage, N stage, KRAS mutation, and postoperative chemotherapy, showed higher accuracy (AUC = 80.11%) in predicting 3-year OS compared to a nomogram (AUC = 74.23%)	[62]
Time series transcriptomic data from normal and tumor cells of colorectal tissue	DBNs	The DBN-based classifier achieved high classification accuracy, revealing significant differences in gene regulatory networks between normal and tumor cells in CRC, particularly in the neighborhoods of oncogenes and cancer tissue markers	[63]
Gene expression profiles of COAD tumor samples from TCGA and normal colon tissues from GTEx	Bayesian network model	The BN analysis identified 14 upregulated DEGs significantly correlated with tumor stages, and Cox regression highlighted tumor stage, STMN4, and FAM135B dysregulation as independent prognostic factors for COAD survival outcomes	[64]
Clinical data of colon cancer patients, including 18 prognostic biomarkers and three clinical features	Bayesian binary classifiers, including a Bayesian bimodal neural network and a single modal BNN classifier	The Bayesian bimodal neural network achieved the best results in terms of AUC (0.8083), macro F1-score (0.7300), and concordance index (0.7238), demonstrating superior robustness compared to non-Bayesian models and the Bayesian single modal classifier	[65]
Normal mucosa samples from 100 colon cancer patients and 50 healthy donors, including genetic variants, DNA methylation markers, and gene expression data	Bayesian network model	The BN analysis revealed that most combinations showed the canonical pathway where methylation markers cause gene expression variation (60.1%), with 33.9% showing non-causal relationships, and 6% indicating gene expression causes variation in methylation markers	[66]
Genetic data from 55105 CRC cases and 65079 controls, along with an independent cohort of 101987 individuals including 1699 CRC cases	LDpred, a Bayesian approach	The LDpred-derived polygenic risk score showed the highest discriminatory accuracy for CRC risk prediction, identifying 30% of individuals without a family history at similar risk to those with a family history, suggesting the potential for earlier screening	[67]
Fecal microbiome samples from 45 rectal cancer patients before preoperative CCRT	Bayesian network model	The BN analysis identified Duodenibacillus massiliensis as linked with an improved complete response rate after preoperative CCRT, suggesting its potential as a predictive biomarker	[68]
Gene expression data from primary colon cancer and CLM samples	Fast and Furious Bayesian Network (FFBN)	FFBN successfully constructed gene regulatory networks for colon cancer and colon-to-liver metastasis, revealing unique molecular mechanisms of CLM and similarities shared with primary liver and colon cancers	[69]
Gut microbiota data related to CRC	Bayesian networks combined with IDA (intervention calculus when the DAG is absent)	Four taxa (Fusobacterium, Citrobacter, Microbacterium, and Slackia) were identified as having non-null lower bounds of causal effects on CRC, supporting the role of specific microbial communities in CRC progression	[70]
CRC metastasis-related transcription factors (RNA and protein levels)	Bayesian network model	The BN analysis identified LMO7 and ARL8A as potential clinical biomarkers for CRC metastasis	[71]
Gene expression data from 153 colon cancer samples and 19 normal control samples (from TCGA project)	BRPCA	The approach identified 7 molecular subtypes of colon cancer with 44 feature genes, offering a finer classification compared to previous studies	[72]
Protein-protein interaction network data for CRC	Dynamic Bayesian network	The study identified biomarkers with high accuracy and F1-scores, with Alpha-2-HS-glycoprotein identified as a dominant hub gene in CRC	[73]
Gene expression data from LS174T cell lines, normal and adenoma samples, and CRC-related samples	Naive Bayesian network	The BN model demonstrated accurate and reproducible prediction results for normal, adenoma, CRC, and related test samples, with high prediction accuracies	[74]
Gene expression data related to Wnt signaling pathway in human CRC	Static Bayesian network	The biologically inspired Bayesian models, which include epigenetic modifications, improved prediction accuracy for CRC, revealing a significant difference in the activation state of the β-catenin transcription complex between tumorous and normal samples	[75]
Registry data of patients with colon cancer from the Department of Defense Automated Central Tumor Registry	ml-BBNs	The ml-BBNs demonstrated high accuracy in predicting recurrence and mortality in colon cancer, with AUCs ranging from 0.85 to 0.90, and positive predictive values for recurrence and mortality between 78% and 84%; the model identified which high-risk patients benefit from adjuvant therapy, with the largest benefit for elderly patients with high T-stage tumors	[50]
Somatic mutation data from 906 stage II/III CRC from the VICTOR clinical trial	Bayesian network model	The BN analysis revealed significant associations between microsatellite instability, chromosomal instability, and specific mutations (TP53, KRAS, BRAF, PIK3CA, NRAS), and proposed a new molecular classification for CRC with improved prognostic capabilities, particularly for disease-free survival in certain groups	[76]
Population-based data from the SEER registry, including 146248 records of colon cancer patients	ml-BBN	The ml-BBN model accurately estimated OS with an AUC of 0.85, identifying significant prognostic factors such as age, race, tumor histology, and AJCC staging, and demonstrating improved survival predictions compared to existing models	[77]
Clinical data from 53 patients with colon carcinomatosis, including 31 clinical-pathological, treatment-related, and outcome variables	Step-wise ml-BBN	The BBN model identified three predictors of OS: Performance status, Peritoneal Cancer Index, and the ability to undergo CRS +/- HIPEC. The model achieved an AUC of 0.71, with positive and negative predictive values of 63.3% and 68.3%, respectively, and demonstrated strong classification for OS predictions	[51]
Clinical data from 278 CRC patients undergoing SLN mapping	A probabilistic Bayesian network model	The BN model predicted FN SLN mapping with an AUC of 0.84-0.86, achieving positive and negative predictive values of 83% and 97%, respectively. Identification of fewer than three SLNs and the presence of tumor-replaced nodes independently predicted FN SLN mapping	[78]
Gene expression data from cDNA arrays and clinical-pathological data of 494 CRC patients, focused on nodal metastasis prediction	A Bayesian neural network with automatic relevance determination	Tumor matrilysin was identified as a key predictor of nodal metastasis, with the Bayesian model achieving strong predictive performance, suggesting potential causality between matrilysin expression and nodal metastasis	[48]

CRC: Colorectal cancer; BN: Bayesian network; ANN: Artificial neural network; eQTLs: Expression quantitative trait loci; NGS: Next-generation sequencing; BKMR: Bayesian kernel machine regression; iRIGS: Integrative risk gene selector; BART: Bayesian additive regression trees; RDP: Ribosomal Database Project; AUC: Area under curve; OS: Overall survival; CEA: Carcinoembryonic antigen; CA19-9: Carbohydrate antigen 19-9; CA125: Cancer antigen 125; DBNs: Dynamic Bayesian networks; COAD: Colon adenocarcinoma; TCGA: The Cancer Genome Atlas; GTEx: Genotype-Tissue Expression; DEGs: Differentially expressed genes; CCRT: Concurrent chemoradiation; CLM: Colon to liver metastasis; FFBN: Fast and Furious Bayesian Network; DAG: Directed acyclic graph; BRPCA: Bayesian robust principal component analysis; ml-BBNs: Machine-learned Bayesian belief networks; AJCC: American Joint Committee on Cancer; CRS: Cytoreductive surgery; HIPEC: Hyperthermic intraperitoneal chemotherapy; FN: False negative; SLN: Sentinel lymph node.
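
Many entries in Tables 4 and 5 summarize model performance as an AUC and, in some cases, as positive and negative predictive values at a chosen cutoff. As a hedged illustration of how these summaries relate to a model's predicted probabilities, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count one half), and PPV/NPV at an assumed threshold of 0.5. The scores and labels are invented, and the function names are hypothetical.

```python
def auc(scores, labels):
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_npv(scores, labels, threshold=0.5):
    """Positive and negative predictive values at a fixed decision threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, l in pairs if s >= threshold and l == 1)
    fp = sum(1 for s, l in pairs if s >= threshold and l == 0)
    tn = sum(1 for s, l in pairs if s < threshold and l == 0)
    fn = sum(1 for s, l in pairs if s < threshold and l == 1)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical predicted probabilities and observed outcomes (1 = event)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc(scores, labels))
print(ppv_npv(scores, labels))
```

For the toy data, 13 of the 16 positive-negative pairs are ranked correctly (AUC = 0.8125), and the 0.5 cutoff yields PPV = 3/5 and NPV = 2/3. The AUC does not depend on a threshold, whereas PPV and NPV change with the cutoff and with outcome prevalence.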

Table 6 Summary of Bayesian network applications in liver cancer research

Data type	Bayesian network algorithm	Key findings	Ref.
Radiomics features	A logistic sparsity-based feature selection model optimized using Bayesian optimization	The Bayesian optimization-based feature selection model significantly improved classification performance for HCC and other focal liver lesions, especially under limited training data conditions	[79]
Simulated concentration time curves for DCE-MRI and in vivo patient data with hepatic tumor lesions	BNN	The BNN provided more accurate parameter estimates compared to NLLS fitting and effectively identified uncertainties, particularly under high noise levels and out-of-distribution data, improving robustness for clinical applications	[84]
Genetic variation data from 33 meta-analytic studies on 45 polymorphisms across 35 genes related to HCC	BFDP	Fourteen gene polymorphisms, including CCND1, CTLA4, EGF, IL6, IL12A, KIF1B, MDM2, MICA, miR-499, MTHFR, PNPLA3, STAT4, TM6SF2, and XPD genes, were identified as significant biomarkers for HCC susceptibility	[81]
Gene expression profiles of liver tissue samples from two microarray platforms analyzed for HCC	An empirical Bayesian method	Three genes were identified as specific biomarkers for HCC diagnosis, achieving an AUC of 0.931	[85]
Single-cell multiomics data, including RNA-seq, reduced representation bisulfite sequencing, and copy number variation estimates	Bayesian network models	Best-fitted BN models identified 295 genes and provided novel insights into the mechanistic relationships of human leukocyte antigen class I genes in HCC	[86]
miRNA and mRNA expression data from 39 HCC patients and 25 liver cirrhosis patients	A flexible Bayesian two-step integrative method	The study identified 66 significant miRNA-mRNA pairs, including molecules previously recognized as potential biomarkers in liver cancer	[82]
Multi-omics data, including genome (mutation and copy number), transcriptome, proteome, and phosphoproteome from HCC samples	A Bayesian network mixture model	The study identified three main HCC subtypes with distinct molecular characteristics, some associated with survival independent of clinical stage. Cluster-specific networks revealed connections between genotypes and molecular phenotypes	[87]
Electronic medical records from 10060 primary liver cancer patients, including TCM symptoms, signs, tongue diagnosis, and pulse diagnosis information	Bayesian network model	The Bayesian network model achieved a classification accuracy of 85.84% for syndrome diagnosis in primary liver cancer, demonstrating its effectiveness in mining nonlinear relationships in clinical data and providing reliable support for TCM-based syndrome differentiation and treatment in liver cancer	[88]
Clinical data of HCC patients, including recurrence outcomes (early, late, or no recurrence)	Bayesian network-based model	The Bayesian network model effectively distinguished between early, late, and no recurrence, significantly outperforming benchmark techniques in accuracy, precision, recall, and F-measures. It addressed the challenge of insufficient early-stage information by integrating latent variables, offering robust and reliable predictions validated across datasets, with potential implications for improving HCC recurrence management in clinical practice	[89]
Dataset of 299 HCC patients after hepatectomy, including factors like preoperative AFP level, liver function grade, tumor size, and postoperative treatment	Tree-augmented naïve Bayes algorithm	The Bayesian network model identified PVTT as the most significant predictor of survival time for HCC patients after hepatectomy. The model also highlighted the preoperative AFP level and postoperative TACE as independent survival factors	[80]
Functional CT perfusion data of hepatic regions, including measurements from malignant and benign liver tissues, acquired over 590 seconds using repeated scans	A Bayesian semiparametric model	The model facilitated the clustering of liver regions based on their CT profiles, which can be used to predict and classify regions as malignant or benign, aiding in the discrimination of cancerous tissue from healthy tissue in diagnostic settings	[83]

HCC: Hepatocellular carcinoma; DCE-MRI: Dynamic contrast-enhanced magnetic resonance imaging; BNN: Bayesian neural network; NLLS: Nonlinear least squares; BFDP: Bayesian False Discovery Probability; AUC: Area under curve; BN: Bayesian network; TCM: Traditional Chinese medicine; PVTT: Portal vein tumor thrombosis; TACE: Transcatheter arterial chemoembolization; AFP: Alpha-fetoprotein; CT: Computed tomography.
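
Table 6 cites BFDP for screening gene polymorphisms. One widely used formulation, Wakefield's, combines an approximate Bayes factor with the prior probability that an association is real. Given an estimated log odds ratio with variance V, z = estimate / sqrt(V), and an assumed prior variance W for the true log odds ratio, the approximate Bayes factor is ABF = sqrt((V + W) / V) · exp(-z² W / (2(V + W))). With PO denoting the prior odds of no association, BFDP = ABF · PO / (1 + ABF · PO). The sketch below follows that formulation. Its default W, which places 95% prior probability on odds ratios between 2/3 and 1.5, is an assumption for illustration, and the cited study may have used different settings.

```python
import math


def bfdp(log_or, se, prior_prob, w=None):
    """Bayesian false-discovery probability using Wakefield's approximate Bayes factor.

    log_or: estimated log odds ratio; se: its standard error;
    prior_prob: prior probability that the association is real;
    w: prior variance of the true log odds ratio under the alternative.
    """
    if w is None:
        # Assumption: 95% prior probability that the true OR lies between 2/3 and 1.5
        w = (math.log(1.5) / 1.96) ** 2
    v = se ** 2
    z = log_or / se
    # ABF = p(data | no association) / p(data | association); small values favour association
    abf = math.sqrt((v + w) / v) * math.exp(-(z ** 2) / 2 * w / (v + w))
    prior_odds_null = (1 - prior_prob) / prior_prob
    posterior_odds_null = abf * prior_odds_null
    return posterior_odds_null / (1 + posterior_odds_null)


# Hypothetical polymorphism: OR = 1.3 with standard error 0.08 on the log scale
print(bfdp(math.log(1.3), 0.08, prior_prob=0.05))
```

A smaller BFDP indicates stronger evidence of a true association. For example, when W equals V and z = 2, the Bayes factor is sqrt(2)·exp(-1) ≈ 0.52, so with a prior probability of 0.5 the BFDP is about 0.34.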

Citation: Zhang MN, Xue MJ, Zhou BZ, Xu J, Sun HK, Wang JH, Wang YY. Comprehensive review of Bayesian network applications in gastrointestinal cancers. World J Clin Oncol 2025; 16(6): 104299
URL: https://www.wjgnet.com/2218-4333/full/v16/i6/104299.htm
DOI: https://dx.doi.org/10.5306/wjco.v16.i6.104299