Copyright
©The Author(s) 2025.
World J Clin Oncol. Jun 24, 2025; 16(6): 104299
Published online Jun 24, 2025. doi: 10.5306/wjco.v16.i6.104299
Table 1 Six Bayesian network parameter learning algorithms
Algorithms | For incomplete datasets | Basic principle | Advantages & disadvantages | Ref. |
Maximum likelihood estimate | No | Estimates parameters by maximizing the likelihood function of the observed data | Fast convergence; no prior knowledge is used, so estimates can be unreliable when samples are small | [18] |
Bayesian method | No | Uses a prior distribution (often Dirichlet) and updates it with observed data to obtain a posterior distribution | Incorporates prior knowledge; computationally intensive | [19] |
Expectation-maximization | Yes | Estimates parameters by iteratively applying expectation (E) and maximization (M) steps to handle missing data | Effective with missing data; can converge to local optima | [20] |
Robust Bayesian estimate | Yes | Estimates parameters as probability intervals that bound the conditional probabilities, without assumptions about the missing-data mechanism | Requires no assumptions about missing data; interval width indicates the reliability of the estimate | [12] |
Monte Carlo method | Yes | Uses random sampling (e.g., Gibbs sampling) to estimate the expectation of the joint probability distribution | Flexible and can handle complex models; computationally expensive, and convergence can be slow | [21] |
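To make the contrast among the maximum likelihood, Bayesian, and expectation-maximization rows of Table 1 concrete, the following minimal Python sketch estimates the conditional probability table of a hypothetical two-node network X → Y with binary variables. The toy data, the function names, and the symmetric Dirichlet(α) prior with α = 1 (Laplace smoothing) are illustrative assumptions, not taken from the cited studies; the EM function only handles the case where some parent values X are missing.

```python
from collections import Counter

VALS = (0, 1)  # both variables are binary in this toy example


def mle_cpt(data):
    """Maximum likelihood estimate of P(Y=y | X=x) from complete (x, y) pairs."""
    counts = Counter(data)
    cpt = {}
    for x in VALS:
        n_x = sum(counts[(x, y)] for y in VALS)
        # The MLE is undefined (nan here) for a parent value that never occurs.
        cpt[x] = {y: counts[(x, y)] / n_x if n_x else float("nan") for y in VALS}
    return cpt


def bayes_cpt(data, alpha=1.0):
    """Posterior mean of P(Y=y | X=x) under a symmetric Dirichlet(alpha) prior."""
    counts = Counter(data)
    cpt = {}
    for x in VALS:
        n_x = sum(counts[(x, y)] for y in VALS)
        cpt[x] = {y: (counts[(x, y)] + alpha) / (n_x + alpha * len(VALS)) for y in VALS}
    return cpt


def em_cpt(data, n_iter=50):
    """EM estimate of P(X=1) and P(Y=1 | X=x) when some x values are missing (None)."""
    p_x1, p_y1 = 0.5, {0: 0.5, 1: 0.5}  # arbitrary starting point
    for _ in range(n_iter):
        n_x = {0: 0.0, 1: 0.0}     # expected count of X=x
        n_x_y1 = {0: 0.0, 1: 0.0}  # expected count of X=x and Y=1
        # E-step: a row with missing X is split across x by the posterior P(X=x | Y=y).
        for x, y in data:
            if x is not None:
                w = {x: 1.0, 1 - x: 0.0}
            else:
                joint = {v: (p_x1 if v else 1 - p_x1) * (p_y1[v] if y else 1 - p_y1[v])
                         for v in VALS}
                z = joint[0] + joint[1]
                w = {v: joint[v] / z for v in VALS}
            for v in VALS:
                n_x[v] += w[v]
                n_x_y1[v] += w[v] * y
        # M-step: re-estimate the parameters from the expected counts.
        p_x1 = n_x[1] / len(data)
        p_y1 = {v: n_x_y1[v] / n_x[v] for v in VALS}
    return p_x1, p_y1


# Hypothetical toy data: (x, y) pairs; None marks a missing parent value.
complete = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1)]
print(mle_cpt(complete)[1])    # {0: 0.0, 1: 1.0}
print(bayes_cpt(complete)[1])  # {0: 0.25, 1: 0.75}
print(em_cpt(complete + [(None, 1), (None, 0)]))
```

With only two observations for X = 1, the maximum likelihood estimate assigns probability 0 to Y = 0, while the Dirichlet prior pulls the estimate away from the boundary. On complete data, a single EM iteration reproduces the maximum likelihood estimate.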
Table 2 Some methodologies of Bayesian network inference
Algorithm | Network type | Complexity | Accuracy | Advantages | Ref. |
Variable elimination | Single, multi-connected networks | Exponential in the size of the largest intermediate factor, which depends on the elimination order | Exact | Simple, easy to use | [22] |
Junction tree | Single, multi-connected networks | Exponential in the size of the largest clique | Exact | Often the fastest exact method; suitable for sparse networks | [22] |
Differential method | Single, multi-connected networks | Proportional to the complexity of differentiation operations | Exact | Can solve multiple problems simultaneously | [23] |
Stochastic sampling | Single, multi-connected networks | Inversely proportional to the probability of evidence variables | Approximate | Simple, widely applicable, and generally effective | [24] |
Loopy belief propagation | Single, multi-connected networks | Exponential in the number of loops in the network | Approximate | Performs well when the algorithm converges | [25] |
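The difference between exact inference and stochastic sampling in Table 2 can be seen on a tiny example. The Python sketch below assumes a hypothetical chain A → B → C of binary variables with made-up conditional probabilities. It computes P(A=1 | C=1) exactly by variable elimination (summing out B into a factor over A) and approximately by rejection sampling, one simple form of stochastic sampling. The fraction of samples that survive rejection is roughly P(C=1), which is why the cost of this kind of sampling grows as the evidence becomes less probable.

```python
import random

# Hypothetical CPTs for a chain A -> B -> C (all binary); values are illustrative only.
P_A1 = 0.2                       # P(A=1)
P_B1_GIVEN_A = {0: 0.1, 1: 0.8}  # P(B=1 | A=a)
P_C1_GIVEN_B = {0: 0.2, 1: 0.9}  # P(C=1 | B=b)


def bern(p, v):
    """Probability that a Bernoulli(p) variable takes the value v (0 or 1)."""
    return p if v == 1 else 1.0 - p


def variable_elimination(c=1):
    """Exact P(A=1 | C=c): sum out B into a factor over A, then normalize."""
    # tau(a) = sum_b P(B=b | A=a) * P(C=c | B=b) is the factor left after eliminating B.
    tau = {a: sum(bern(P_B1_GIVEN_A[a], b) * bern(P_C1_GIVEN_B[b], c) for b in (0, 1))
           for a in (0, 1)}
    unnorm = {a: bern(P_A1, a) * tau[a] for a in (0, 1)}
    return unnorm[1] / (unnorm[0] + unnorm[1])


def rejection_sampling(c=1, n=100_000, seed=0):
    """Approximate P(A=1 | C=c) by forward sampling and discarding samples with C != c."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        a = int(rng.random() < P_A1)
        b = int(rng.random() < P_B1_GIVEN_A[a])
        c_sample = int(rng.random() < P_C1_GIVEN_B[b])
        if c_sample != c:
            continue  # rejected sample; the kept fraction estimates P(C=c)
        kept += 1
        hits += a
    return hits / kept, kept / n


exact = variable_elimination(c=1)
approx, acceptance = rejection_sampling(c=1)
print(f"exact P(A=1 | C=1) = {exact:.4f}")  # 0.4130
print(f"sampled estimate   = {approx:.4f} (acceptance rate {acceptance:.3f})")
```

Here P(C=1) = 0.368, so roughly 63% of the forward samples are wasted; with rarer evidence the waste grows, which is the motivation for the more refined sampling and message-passing schemes listed in the table.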
Table 3 Some popular Bayesian network software tools
Tools | Language | Description | Links |
Bnlearn[26] | R | R package for learning the graphical structure of Bayesian networks, estimating their parameters, and performing inference | http://www.bnlearn.com/ |
BNT[27] | MATLAB | Bayes net toolbox for Matlab | https://github.com/bayesnet/bnt |
GOBNILP | C | Learning Bayesian network structure with integer programming | https://www.cs.york.ac.uk/aig/sw/gobnilp/ |
Bnstruct | R | Bnstruct is an R package which learns Bayesian networks from data with missing values | https://cran.r-project.org/web/packages/bnstruct |
Bmmalone | C++ | This project implements a number of algorithms for learning Bayesian network structures using state space search techniques. | https://github.com/bmmalone/urlearning-cpp |
Causal-Learner[28] | MATLAB | A toolbox for causal structure and Markov blanket learning | https://github.com/z-dragonl/Causal-Learner |
CausalFS[29] | C/C++ | An open-source package of causal feature selection and causal (Bayesian network) structure learning | https://github.com/kuiy/CausalFS |
Weka[30] | Java | Weka contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions | https://git.cms.waikato.ac.nz/weka/weka |
Bene | C | An exact Bayesian network structure learning software based on dynamic programming | https://github.com/tomisilander/bene |
Causal-learn | Python | Causal discovery in Python. It also includes (conditional) independence tests and score functions | https://github.com/py-why/causal-learn |
pyCausalFS | Python | An open-source package of causal feature selection and causal (Bayesian network) structure learning | https://github.com/kuiy/pyCausalFS |
CausalExplorer[31] | MATLAB | A MATLAB library of computational causal discovery and variable selection algorithms | https://github.com/mensxmachina/CausalExplorer |
Pgmpy | Python | Python library for learning (structure and parameter), inference (probabilistic and causal), and simulations in Bayesian networks | https://github.com/pgmpy/pgmpy |
Tetrad | Java | Provides algorithms for discovering causal models and for searching for models with latent structure | https://github.com/cmu-phil/tetrad |
Causal discovery toolbox | Python | The causal discovery toolbox is a package for causal inference in graphs | https://github.com/FenTechSolutions/CausalDiscoveryToolbox |
DoWhy[32] | Python | DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions | https://github.com/py-why/dowhy |
Table 4 Summary of Bayesian network applications in gastric cancer research
Data type | Bayesian network algorithm | Key findings | Ref. |
Patient data from the public SEER database; patient data from a hospital cohort | Naïve Bayesian model | The Bayesian network model identified key risk factors, including age, T-stage, N-stage, tumor size, grade, and tumor location, contributing to the prediction of distant metastasis in stage T1 gastric cancer | [37] |
LncRNA expression profiles from 375 STAD samples in the TCGA database | Bayesian Lasso-logistic regression | The Bayesian-based approach identified seven lncRNAs, effectively stratified STAD patients by risk, and demonstrated robust prognostic prediction accuracy with AUC values above 0.69 for 1-, 3-, and 5-year survival | [37] |
Survival and censorship data from 760 gastric cancer patients | A two-slice temporal Bayesian network model | The Bayesian network improved prediction accuracy, reduced bias, and aligned with classical methods while handling high-dimensional data effectively | [38] |
Data from seven randomized clinical trials involving 2655 metastatic gastric cancer patients | Bayesian fixed-effect network meta-analysis model | The Bayesian analysis identified nivolumab as the optimal choice for OS in mGC patients without peritoneal metastases, providing the best balance of efficacy and safety | [39] |
Gene expression data from TCGA gastric cancer and metastatic gastric cancer immunotherapy clinical trial datasets | Bayesian semi-nonnegative matrix trifactorization method | The Bayesian method identified clinically relevant pathways associated with molecular subtypes and immunotherapy response, enabling patient stratification and prognosis prediction in independent validation datasets | [40] |
LncRNA-miRNA-disease association data, including known associations related to gastric cancer | Naïve Bayesian classifier integrated into the CFNBC model | CFNBC demonstrated reliable prediction performance (AUC of 0.8576) and identified potential lncRNA-disease associations for gastric cancer in case studies | [42] |
Clinical data from 339 gastric cancer patients | BNN | The BNN outperformed the ANN in predicting survival of gastric cancer patients, with higher sensitivity, specificity, prediction accuracy, and AUCs | [43] |
Data from 245 gastric endoscopic submucosal dissections | Naïve Bayesian model | The Bayesian model demonstrated good discriminative power in predicting ESD outcomes, with naïve Bayesian models presenting AUCs of approximately 80% in the derivation cohort and at least 74% in cross-validation for both outcomes | [44] |
Data on the structural domain characteristics of the p42.3 protein | Bayesian network model | Using Bayesian probability optimization, the study identified the most likely pathway of p42.3 action in gastric cancer as "S100A11 - RAGE - P38 - MAPK - microtubule-associated protein - spindle protein - centromere protein - cell proliferation", which was subsequently validated by biological experiments | [45] |
Genome-wide gene expression profiles | Categorical Bayesian networks | The BN approach outperformed benchmark methods and successfully identified disease-specific changes in gene regulation that differentiate cancer types, improving prediction | [46] |
Gene expression profile data from gastric cancer patients | Bayesian network constructed from 18 genes selected by multiple logistic regression | The constructed Bayesian network closely resembled the network from GeneMANIA, indicating the effectiveness of the Bayesian approach in modeling relationships among genes associated with gastric cancer subtypes | [47] |
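Several studies in Table 4 rely on naïve Bayesian classifiers. As a minimal sketch of how such a classifier scores a new case, the Python code below fits a categorical naïve Bayes model with add-α (Laplace) smoothing and returns normalized class posteriors. The feature values and labels are synthetic placeholders rather than data from the cited studies, and the helper names are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(X, y, alpha=1.0):
    """Fit a categorical naive Bayes model with add-alpha (Laplace) smoothing."""
    n_features = len(X[0])
    class_counts = Counter(y)
    # Distinct values observed for each feature, used in the smoothing denominator.
    values = [{row[j] for row in X} for j in range(n_features)]
    # counts[c][j][v]: number of training rows of class c whose feature j equals v.
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
    return {"class_counts": class_counts, "values": values,
            "counts": counts, "alpha": alpha, "n": len(y)}


def predict_proba(model, row):
    """Return P(class | features), assuming features are independent given the class."""
    alpha, n = model["alpha"], model["n"]
    log_post = {}
    for c, n_c in model["class_counts"].items():
        lp = math.log(n_c / n)  # log prior
        for j, v in enumerate(row):
            # Smoothed log-likelihood of feature j taking value v within class c.
            k = len(model["values"][j])
            lp += math.log((model["counts"][c][j][v] + alpha) / (n_c + alpha * k))
        log_post[c] = lp
    # Normalize in log space to avoid underflow when there are many features.
    m = max(log_post.values())
    z = sum(math.exp(lp - m) for lp in log_post.values())
    return {c: math.exp(lp - m) / z for c, lp in log_post.items()}


# Synthetic example: two categorical features and a binary outcome.
X = [("hi", "yes"), ("hi", "no"), ("lo", "no"), ("lo", "no"), ("hi", "yes"), ("lo", "yes")]
y = [1, 1, 0, 0, 1, 0]
model = train_naive_bayes(X, y)
print(predict_proba(model, ("hi", "yes")))  # class 1 gets 6/7, about 0.857
```

The conditional independence assumption is what keeps training to simple counting; tree-augmented naïve Bayes, used in some of the studies in Tables 5 and 6, relaxes it by allowing each feature one additional feature parent.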
Table 5 Summary of Bayesian network applications in colorectal cancer research
Data type | Bayesian network algorithm | Key findings | Ref. |
Observational data on CRC, including risk factors such as alcohol consumption, smoking, diabetes, and hypertension | Structure learning algorithms combined with expert knowledge to construct BN model | The BN model effectively segmented populations into risk subgroups and identified modifiable risk factors with significant predictive influence on CRC risk | [53] |
Simulation models of CRC progression and natural history, including parameters for risk factors and disease progression | Bayesian calibration using Hamiltonian Monte Carlo-based algorithms integrated with ANN emulators | The Bayesian framework successfully calibrated CRC simulation models, accurately predicting outcomes within confidence intervals, and reduced computational complexity, enabling efficient uncertainty quantification and improved policy analysis for CRC | [54] |
Genetic and expression data from 275 normal colon and 276 CRC samples in the SYSCOL cohort | Bayesian network model | BN analysis revealed tumor-specific transposable element (TE) eQTLs that influence the expression of cancer driver genes, demonstrating the role of TEs in activating oncogenic pathways and providing insights into tumor-specific regulatory mechanisms | [55] |
Clinical data of 1253 CRC patients under 50 years of age from the Yonsei Cancer Center, encompassing 93 clinical features | Bayesian network-based synthesizing model | The BN-based model generated a synthetic population of 5005 individuals with no significant statistical differences from the original data. Training predictive models with synthetic data improved performance, especially for small datasets | [56] |
Plasma concentrations of heavy metals (As, Cd, Cr, Hg, Pb) and tumor tissue NGS data from CRC patients | BKMR | BKMR analysis revealed that Pb, As, and Cd were significant contributors to increased mutation rates, particularly indels. Mutational signatures showed strong correlations with heavy metal exposure, and shifts in the mutational landscape were observed between high and low exposure groups | [57] |
CRC-associated loci from genome-wide association studies (GWAS) and multi-omics datasets | iRIGS, a Bayesian approach | iRIGS identified 105 high-confidence risk genes, including CEBPB, which promotes CRC cell proliferation through oncogenic pathways such as MAPK, PI3K-Akt, and Ras signaling | [58] |
Epidemiological data related to gut microbiome and CRC risk | Multivariate Mendelian randomization analysis based on Bayesian model | Nine bacteria were identified with a robust causal relationship to CRC development, including Streptococcus thermophilus, Bacteroides ovatus, and others | [59] |
Clinicopathologic, immune, microbial, and genomic variables from 815 stage II-III CRC patients | BART | The BART risk model identified seven stable survival predictors and successfully stratified patients into low, intermediate, and high-risk groups with statistically significant survival differences | [52] |
CRC patients with poorly and moderately differentiated tumors, analyzed through fecal microbiota | RDP classifier Bayesian algorithm | The study identified distinct gut microbiota (GM) associated with poorly differentiated CRC, including a high abundance of Bifidobacterium and other bacteria | [60] |
Colon cancer (microsatellite-stable/unstable, stage III) samples analyzed through multi-omics data (gene expression, DNA methylation, copy number variation) | IntOMICS, an integrative framework based on Bayesian networks | IntOMICS successfully integrated multi-omics data and biological prior knowledge to uncover regulatory networks, revealing deeper insights into genetic information flow and identifying potential predictive biomarkers for stage III colon cancer | [61] |
Rectal cancer clinical data from 705 patients who underwent radical resection | Tree-augmented naïve Bayes algorithm | The BN model, incorporating factors like age, CEA, CA19-9, CA125, differentiation status, T stage, N stage, KRAS mutation, and postoperative chemotherapy, showed higher accuracy (AUC = 80.11%) in predicting 3-year OS compared to a nomogram (AUC = 74.23%) | [62] |
Time series transcriptomic data from normal and tumor cells of colorectal tissue | DBNs | The DBN-based classifier achieved high classification accuracy, revealing significant differences in gene regulatory networks between normal and tumor cells in CRC, particularly in the neighborhoods of oncogenes and cancer tissue markers | [63] |
Gene expression profiles of COAD tumor samples from TCGA and normal colon tissues from GTEx | Bayesian network model | The BN analysis identified 14 upregulated DEGs significantly correlated with tumor stages, and Cox regression highlighted tumor stage, STMN4, and FAM135B dysregulation as independent prognostic factors for COAD survival outcomes | [64] |
Clinical data of colon cancer patients, including 18 prognostic biomarkers and three clinical features | Bayesian binary classifiers, including a Bayesian bimodal neural network and a single modal BNN classifier | The Bayesian bimodal neural network achieved the best results in terms of AUC (0.8083), macro F1-score (0.7300), and concordance index (0.7238), demonstrating superior robustness compared to non-Bayesian models and the Bayesian single modal classifier | [65] |
Normal mucosa samples from 100 colon cancer patients and 50 healthy donors, including genetic variants, DNA methylation markers, and gene expression data | Bayesian network model | The BN analysis revealed that most combinations showed the canonical pathway where methylation markers cause gene expression variation (60.1%), with 33.9% showing non-causal relationships, and 6% indicating gene expression causes variation in methylation markers | [66] |
Genetic data from 55105 CRC cases and 65079 controls, along with an independent cohort of 101987 individuals including 1699 CRC cases | LDpred, a Bayesian approach | The LDpred-derived polygenic risk score showed the highest discriminatory accuracy for CRC risk prediction, identifying 30% of individuals without a family history at similar risk to those with a family history, suggesting the potential for earlier screening | [67] |
Fecal microbiome samples from 45 rectal cancer patients before preoperative CCRT | Bayesian network model | The BN analysis identified Duodenibacillus massiliensis as linked with an improved complete response rate after preoperative CCRT, suggesting its potential as a predictive biomarker | [68] |
Gene expression data from primary colon cancer and CLM samples | Fast and FFBN | FFBN successfully constructed gene regulatory networks for colon cancer and colon to liver metastasis, revealing unique molecular mechanisms for CLM and shared similarities with primary liver and colon cancers | [69] |
Gut microbiota data related to CRC | Bayesian networks combined with IDA (intervention calculus when the DAG is absent) | Four taxa (Fusobacterium, Citrobacter, Microbacterium, and Slackia) were identified as having non-null lower bounds of causal effects on CRC, supporting the role of specific microbial communities in CRC progression | [70] |
CRC metastasis-related transcription factors (RNA and protein levels) | Bayesian network model | The BN analysis identified LMO7 and ARL8A as potential clinical biomarkers for CRC metastasis | [71] |
Gene expression data from 153 colon cancer samples and 19 normal control samples (from TCGA project) | BRPCA | The approach identified 7 molecular subtypes of colon cancer with 44 feature genes, offering a finer classification compared to previous studies | [72] |
Protein-protein interaction network data for CRC | Dynamic Bayesian network | The study identified biomarkers with high accuracy and F1-scores, with Alpha-2-HS-glycoprotein identified as a dominant hub gene in CRC | [73] |
Gene expression data from LS174T cell lines, normal and adenoma samples, and CRC-related samples | Naive Bayesian network | The BN model demonstrated accurate and reproducible prediction results for normal, adenoma, CRC, and related test samples, with high prediction accuracies | [74] |
Gene expression data related to Wnt signaling pathway in human CRC | Static Bayesian network | The biologically inspired Bayesian models, which include epigenetic modifications, improved prediction accuracy for CRC, revealing a significant difference in the activation state of the β-catenin transcription complex between tumorous and normal samples | [75] |
Registry data of patients with colon cancer from the Department of Defense Automated Central Tumor Registry | ml-BBNs | The ml-BBNs demonstrated high accuracy in predicting recurrence and mortality in colon cancer, with AUCs ranging from 0.85 to 0.90, and positive predictive values for recurrence and mortality between 78% and 84%; the model identified which high-risk patients benefit from adjuvant therapy, with the largest benefit for elderly patients with high T-stage tumors | [50] |
Somatic mutation data from 906 stage II/III CRC cases from the VICTOR clinical trial | Bayesian network model | The BN analysis revealed significant associations between microsatellite instability, chromosomal instability, and specific mutations (TP53, KRAS, BRAF, PIK3CA, NRAS), and proposed a new molecular classification for CRC with improved prognostic capabilities, particularly for disease-free survival in certain groups | [76] |
Population-based data from the SEER registry, including 146248 records of colon cancer patients | ml-BBN | The ml-BBN model accurately estimated OS with an AUC of 0.85, identifying significant prognostic factors such as age, race, tumor histology, and AJCC staging, and demonstrating improved survival predictions compared to existing models | [77] |
Clinical data from 53 patients with colon carcinomatosis, including 31 clinical-pathological, treatment-related, and outcome variables | Step-wise ml-BBN | The BBN model identified three predictors of OS: performance status, Peritoneal Cancer Index, and the ability to undergo CRS +/- HIPEC. The model achieved an AUC of 0.71, with positive and negative predictive values of 63.3% and 68.3%, respectively, and demonstrated strong classification for OS predictions | [51] |
Clinical data from 278 CRC patients undergoing SLN mapping | A probabilistic Bayesian network model | The BN model predicted FN SLN mapping with an AUC of 0.84-0.86, achieving positive and negative predictive values of 83% and 97%, respectively. A low number of SLNs (< 3) and the presence of tumor-replaced nodes independently predicted FN SLN mapping | [78] |
Gene expression data from cDNA arrays and clinical-pathological data of 494 CRC patients, focused on nodal metastasis prediction | A Bayesian neural network with automatic relevance determination | Tumor matrilysin was identified as a key predictor of nodal metastasis, with the Bayesian model achieving strong predictive performance, suggesting potential causality between matrilysin expression and nodal metastasis | [48] |
Table 6 Summary of Bayesian network applications in liver cancer research
Data type | Bayesian network algorithm | Key findings | Ref. |
Radiomics features | A logistic sparsity-based feature selection model optimized using Bayesian optimization | The Bayesian optimization-based feature selection model significantly improved classification performance for HCC and other focal liver lesions, especially under limited training data conditions | [79] |
Simulated concentration time curves for DCE-MRI and in vivo patient data with hepatic tumor lesions | BNN | The BNN provided more accurate parameter estimates compared to NLLS fitting and effectively identified uncertainties, particularly under high noise levels and out-of-distribution data, improving robustness for clinical applications | [84] |
Genetic variation data from 33 meta-analytic studies on 45 polymorphisms across 35 genes related to HCC | BFDP | Fourteen gene polymorphisms, including CCND1, CTLA4, EGF, IL6, IL12A, KIF1B, MDM2, MICA, miR-499, MTHFR, PNPLA3, STAT4, TM6SF2, and XPD genes, were identified as significant biomarkers for HCC susceptibility | [81] |
Gene expression profiles of liver tissue samples from two microarray platforms analyzed for HCC | An empirical Bayesian method | Three genes were identified as specific biomarkers for HCC diagnosis, achieving an AUC of 0.931 | [85] |
Single-cell multiomics data, including RNA-seq, Reduced Representation Bisulfite Sequencing, and copy number variation estimates | Bayesian network models | Best-fitted BN models identified 295 genes and provided novel insights into the mechanistic relationships of human leukocyte antigen (HLA) class I genes in HCC | [86] |
miRNA and mRNA expression data from 39 HCC patients and 25 liver cirrhosis patients | A flexible Bayesian two-step integrative method | The study identified 66 significant miRNA-mRNA pairs, including molecules previously recognized as potential biomarkers in liver cancer | [82] |
Multi-omics data, including genome (mutation and copy number), transcriptome, proteome, and phosphoproteome from HCC samples | A Bayesian network mixture model | The study identified three main HCC subtypes with distinct molecular characteristics, some associated with survival independent of clinical stage. Cluster-specific networks revealed connections between genotypes and molecular phenotypes | [87] |
Electronic medical records from 10060 primary liver cancer patients, including TCM symptoms, signs, tongue diagnosis, and pulse diagnosis information | Bayesian network model | The Bayesian network model achieved a classification accuracy of 85.84% for syndrome diagnosis in primary liver cancer, demonstrating its effectiveness in mining nonlinear relationships in clinical data and providing reliable support for TCM-based syndrome differentiation and treatment in liver cancer | [88] |
Clinical data of HCC patients, including recurrence outcomes (early, late, or no recurrence) | Bayesian network-based model | The Bayesian network model effectively distinguished between early, late, and no recurrence, significantly outperforming benchmark techniques in accuracy, precision, recall, and F-measures. It addressed the challenge of insufficient early-stage information by integrating latent variables, offering robust and reliable predictions validated across datasets, with potential implications for improving HCC recurrence management in clinical practice | [89] |
Dataset of 299 HCC patients after hepatectomy, including factors like preoperative AFP level, liver function grade, tumor size, and postoperative treatment | Tree-augmented naïve Bayes algorithm | The Bayesian network model identified PVTT as the most significant predictor of survival time for HCC patients after hepatectomy. The model also highlighted the preoperative AFP level and postoperative TACE as independent survival factors | [80] |
Functional CT perfusion data of hepatic regions, including measurements from malignant and benign liver tissues, acquired over 590 seconds using repeated scans | A Bayesian semiparametric model | The model facilitated the clustering of liver regions based on their CT profiles, which can be used to predict and classify regions as malignant or benign, aiding in the discrimination of cancerous tissue from healthy tissue in diagnostic settings | [83] |
- Citation: Zhang MN, Xue MJ, Zhou BZ, Xu J, Sun HK, Wang JH, Wang YY. Comprehensive review of Bayesian network applications in gastrointestinal cancers. World J Clin Oncol 2025; 16(6): 104299
- URL: https://www.wjgnet.com/2218-4333/full/v16/i6/104299.htm
- DOI: https://dx.doi.org/10.5306/wjco.v16.i6.104299