Review
Copyright ©The Author(s) 2025.
World J Clin Oncol. Jun 24, 2025; 16(6): 104299
Published online Jun 24, 2025. doi: 10.5306/wjco.v16.i6.104299
Table 1 Six Bayesian network parameter learning algorithms

| Algorithms | For incomplete datasets | Basic principle | Advantages & disadvantages | Ref. |
| --- | --- | --- | --- | --- |
| Maximum likelihood estimate | No | Estimates parameters by maximizing the likelihood function based on observed data | Fast to compute; uses no prior knowledge, so convergence can be slow | [18] |
| Bayesian method | No | Uses a prior distribution (often Dirichlet) and updates it with observed data to obtain a posterior distribution | Incorporates prior knowledge; computationally intensive | [19] |
| Expectation-maximization | Yes | Estimates parameters by iteratively applying expectation (E) and maximization (M) steps to handle missing data | Effective with missing data; can converge to local optima | [20] |
| Robust Bayesian estimate | Yes | Estimates parameters using probability intervals to represent the ranges of conditional probabilities, without assumptions about the missing data | Does not require assumptions about missing data; interval width indicates reliability of estimation | [12] |
| Monte-Carlo method | Yes | Uses random sampling (e.g., Gibbs sampling) to estimate the expectation of the joint probability distribution | Flexible and can handle complex models; computationally expensive and convergence can be slow | [21] |

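
The difference between the maximum likelihood estimate and the Bayesian method in Table 1 can be illustrated with a minimal sketch for a single discrete variable. The function names, the symmetric Dirichlet prior with `alpha = 1`, and the toy tumor-grade data are all hypothetical choices made for illustration; they are not taken from the cited references.

```python
from collections import Counter


def mle_estimate(samples, states):
    """Maximum likelihood: relative frequency of each observed state."""
    counts = Counter(samples)
    n = len(samples)
    return {s: counts[s] / n for s in states}


def dirichlet_posterior_mean(samples, states, alpha=1.0):
    """Bayesian estimate: posterior mean under a symmetric Dirichlet(alpha) prior.

    The prior acts like `alpha` pseudo-counts added to every state.
    """
    counts = Counter(samples)
    total = len(samples) + alpha * len(states)
    return {s: (counts[s] + alpha) / total for s in states}


# Hypothetical toy data: tumor grade recorded for four patients.
states = ["low", "mid", "high"]
samples = ["low", "low", "mid", "low"]

mle = mle_estimate(samples, states)
bayes = dirichlet_posterior_mean(samples, states, alpha=1.0)

print("MLE:     ", mle)
print("Bayesian:", bayes)
```

With only four observations, the maximum likelihood estimate assigns probability 0 to the unobserved state `high`, whereas the Dirichlet prior keeps it at 1/7. This is one way prior knowledge can stabilize estimates from small samples.
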
Table 2 Some methodologies of Bayesian network inference

| Algorithm | Network type | Complexity | Accuracy | Advantages | Ref. |
| --- | --- | --- | --- | --- | --- |
| Variable elimination | Singly and multiply connected networks | Exponential in the number of variables in factorization | Exact | Simple, easy to use | [22] |
| Junction tree | Singly and multiply connected networks | Exponential in the size of the largest clique | Exact | Fastest method, suitable for sparse networks | [22] |
| Differential method | Singly and multiply connected networks | Proportional to the complexity of differentiation operations | Exact | Can solve multiple problems simultaneously | [23] |
| Stochastic sampling | Singly and multiply connected networks | Inversely proportional to the probability of evidence variables | Approximate | Simple, widely applicable, and generally effective | [24] |
| Loopy belief propagation | Singly and multiply connected networks | Exponential in the number of loops in the network | Approximate | Performs well when the algorithm converges | [25] |

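
As a rough illustration of the variable elimination entry in Table 2, the sketch below computes a marginal on a three-node chain A -> B -> C by summing out one variable at a time, and checks the result against brute-force enumeration of the joint distribution. The network, the conditional probability values, and the function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from itertools import product

# Hypothetical CPTs for a binary chain A -> B -> C (values are illustrative).
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}    # key: (b, a)
p_c_given_b = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.25, (1, 1): 0.75}  # key: (c, b)


def marginal_c_by_elimination():
    """Sum out A first, then B, keeping only small intermediate factors."""
    # Eliminate A: tau_b(b) = sum_a P(a) * P(b | a)
    tau_b = {b: sum(p_a[a] * p_b_given_a[(b, a)] for a in (0, 1)) for b in (0, 1)}
    # Eliminate B: P(c) = sum_b tau_b(b) * P(c | b)
    return {c: sum(tau_b[b] * p_c_given_b[(c, b)] for b in (0, 1)) for c in (0, 1)}


def marginal_c_by_enumeration():
    """Reference answer: sum the full joint over every assignment."""
    p_c = {0: 0.0, 1: 0.0}
    for a, b, c in product((0, 1), repeat=3):
        p_c[c] += p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]
    return p_c


p_c_ve = marginal_c_by_elimination()
p_c_enum = marginal_c_by_enumeration()
print("Variable elimination:", p_c_ve)
print("Enumeration:         ", p_c_enum)
```

Both routes give P(C = 1) = 0.3375. On a chain the intermediate factors stay small, which is why elimination is cheap here; in densely connected networks the factors created during elimination can grow exponentially, as the complexity column indicates.
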
Table 3 Some popular Bayesian network software tools

| Tools | Language | Description | Links |
| --- | --- | --- | --- |
| Bnlearn[26] | R | R package for causal discovery by learning the graphical structure of Bayesian networks | http://www.bnlearn.com/ |
| BNT[27] | MATLAB | Bayes Net Toolbox for MATLAB | https://github.com/bayesnet/bnt |
| GOBNILP | C | Learning Bayesian network structure with integer programming | https://www.cs.york.ac.uk/aig/sw/gobnilp/ |
| Bnstruct | R | R package that learns Bayesian networks from data with missing values | https://cran.r-project.org/web/packages/bnstruct |
| Bmmalone | C++ | Implements a number of algorithms for learning Bayesian network structures using state space search techniques | https://github.com/bmmalone/urlearning-cpp |
| Causal-Learner[28] | MATLAB | A toolbox for causal structure and Markov blanket learning | https://github.com/z-dragonl/Causal-Learner |
| CausalFS[29] | C/C++ | An open-source package of causal feature selection and causal (Bayesian network) structure learning | https://github.com/kuiy/CausalFS |
| Weka[30] | Java | A collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions | https://git.cms.waikato.ac.nz/weka/weka |
| Bene | C | An exact Bayesian network structure learning software based on dynamic programming | https://github.com/tomisilander/bene |
| Causal-learn | Python | Causal discovery in Python; also includes (conditional) independence tests and score functions | https://github.com/py-why/causal-learn |
| pyCausalFS | Python | An open-source package of causal feature selection and causal (Bayesian network) structure learning | https://github.com/kuiy/pyCausalFS |
| CausalExplorer[31] | MATLAB | A MATLAB library of computational causal discovery and variable selection algorithms | https://github.com/mensxmachina/CausalExplorer |
| Pgmpy | Python | Python library for learning (structure and parameter), inference (probabilistic and causal), and simulations in Bayesian networks | https://github.com/pgmpy/pgmpy |
| Tetrad | Java | Provides algorithms to discover causal models and to search for models of latent structure | https://github.com/cmu-phil/tetrad |
| Causal discovery toolbox | Python | A package for causal inference in graphs | https://github.com/FenTechSolutions/CausalDiscoveryToolbox |
| DoWhy[32] | Python | A Python library for causal inference that supports explicit modeling and testing of causal assumptions | https://github.com/py-why/dowhy |

Table 4 Summary of Bayesian network applications in gastric cancer research

| Data type | Bayesian network algorithm | Key findings | Ref. |
| --- | --- | --- | --- |
| Patient data from the public SEER database; patient data from a hospital cohort | Naïve Bayesian model | The Bayesian network model identified key risk factors, including age, T-stage, N-stage, tumor size, grade, and tumor location, contributing to the prediction of distant metastasis in stage T1 gastric cancer | [37] |
| LncRNA expression profiles from 375 STAD samples in the TCGA database | Bayesian Lasso-logistic regression | The Bayesian-based approach identified seven lncRNAs, effectively stratified STAD patients by risk, and demonstrated robust prognostic prediction accuracy with AUC values above 0.69 for 1-, 3-, and 5-year survival | [37] |
| Survival and censoring data from 760 gastric cancer patients | A two-slice temporal Bayesian network model | The Bayesian network improved prediction accuracy, reduced bias, and aligned with classical methods while handling high-dimensional data effectively | [38] |
| Data from seven randomized clinical trials involving 2655 metastatic gastric cancer patients | Bayesian fixed-effect network meta-analysis model | The Bayesian analysis identified nivolumab as the optimal choice for OS in mGC patients without peritoneal metastases, providing the best balance of efficacy and safety | [39] |
| Gene expression data from TCGA gastric cancer and metastatic gastric cancer immunotherapy clinical trial datasets | Bayesian semi-nonnegative matrix trifactorization method | The Bayesian method identified clinically relevant pathways associated with molecular subtypes and immunotherapy response, enabling patient stratification and prognosis prediction in independent validation datasets | [40] |
| LncRNA-miRNA-disease association data, including known associations related to gastric cancer | Naïve Bayesian classifier integrated into CFNBC | CFNBC demonstrated reliable prediction performance (AUC of 0.8576) and successfully identified potential lncRNA-disease associations for gastric cancer in case studies | [42] |
| Clinical data from 339 gastric cancer patients | BNN | The BNN outperformed the ANN in predicting survival of gastric cancer patients, with higher sensitivity, specificity, prediction accuracy, and AUCs | [43] |
| Data from 245 gastric endoscopic submucosal dissections | Naïve Bayesian model | The naïve Bayesian models showed good discriminative power in predicting ESD outcomes, with AUCs of approximately 80% in the derivation cohort and at least 74% in cross-validation for both outcomes | [44] |
| Data from the structural domain characteristics of the p42.3 protein molecule | Bayesian network model | The study identified the most likely acting pathway for p42.3 in gastric cancer as "S100A11 - RAGE - P38 - MAPK - Microtubule-associated protein - spindle protein - centromere protein - cell proliferation" through Bayesian probability optimization, which was subsequently validated by biological experiments | [45] |
| Genome-wide gene expression profiles | Categorical Bayesian networks | The BN approach outperformed benchmark methods and successfully identified disease-specific changes in gene regulation that differentiate cancer types, improving prediction | [46] |
| Gene expression profile data from gastric cancer patients | Bayesian network constructed from 18 genes selected by multiple logistic regression | The constructed Bayesian network was very similar to the network from GeneMANIA, indicating the effectiveness of the Bayesian approach in modeling the relationships among genes associated with gastric cancer subtypes | [47] |

Table 5 Summary of Bayesian network applications in colorectal cancer research

| Data type | Bayesian network algorithm | Key findings | Ref. |
| --- | --- | --- | --- |
| Observational data on CRC, including risk factors such as alcohol consumption, smoking, diabetes, and hypertension | Structure learning algorithms combined with expert knowledge to construct BN model | The BN model effectively segmented populations into risk subgroups and identified modifiable risk factors with significant predictive influence on CRC risk | [53] |
| Simulation models of CRC progression and natural history, including parameters for risk factors and disease progression | Bayesian calibration using Hamiltonian Monte Carlo-based algorithms integrated with ANN emulators | The Bayesian framework successfully calibrated CRC simulation models, accurately predicting outcomes within confidence intervals, and reduced computational complexity, enabling efficient uncertainty quantification and improved policy analysis for CRC | [54] |
| Genetic and expression data from 275 normal colon and 276 CRC samples in the SYSCOL cohort | Bayesian network model | BN revealed tumor-specific transposable element eQTLs (TE-eQTLs) that influence the expression of cancer driver genes, demonstrating TEs' role in activating oncogenic pathways and providing insights into tumor-specific regulatory mechanisms | [55] |
| Clinical data of 1253 CRC patients under 50 years of age from the Yonsei Cancer Center, encompassing 93 clinical features | Bayesian network-based synthesizing model | The BN-based model generated a synthetic population of 5005 individuals with no significant statistical differences from the original data. Training predictive models with synthetic data improved performance, especially for small datasets | [56] |
| Plasma concentrations of heavy metals (As, Cd, Cr, Hg, Pb) and tumor tissue NGS data from CRC patients | BKMR | BKMR analysis revealed that Pb, As, and Cd were significant contributors to increased mutation rates, particularly indels. Mutational signatures showed strong correlations with heavy metal exposure, and shifts in the mutational landscape were observed between high and low exposure groups | [57] |
| CRC-associated loci from genome-wide association studies (GWAS) and multi-omics datasets | iRIGS, a Bayesian approach | iRIGS identified 105 high-confidence risk genes, including CEBPB, which promotes CRC cell proliferation through oncogenic pathways such as MAPK, PI3K-Akt, and Ras signaling | [58] |
| Epidemiological data related to gut microbiome and CRC risk | Multivariate Mendelian randomization analysis based on Bayesian model | Nine bacteria were identified with a robust causal relationship to CRC development, including Streptococcus thermophilus, Bacteroides ovatus, and others | [59] |
| Clinicopathologic, immune, microbial, and genomic variables from 815 stage II-III CRC patients | BART | The BART risk model identified seven stable survival predictors and successfully stratified patients into low, intermediate, and high-risk groups with statistically significant survival differences | [52] |
| CRC patients with poorly differentiated and moderately differentiated tumors, analyzed through fecal microbiota | RDP classifier Bayesian algorithm | The study identified distinct GM associated with poorly differentiated CRC, including high abundance of Bifidobacterium and other bacteria | [60] |
| Colon cancer (microsatellite stable/instable stage III) samples analyzed through multi-omics data (gene expression, DNA methylation, copy number variation) | IntOMICS, an integrative framework based on Bayesian networks | IntOMICS successfully integrated multi-omics data and biological prior knowledge to uncover regulatory networks, revealing deeper insights into genetic information flow and identifying potential predictive biomarkers for stage III colon cancer | [61] |
| Rectal cancer clinical data from 705 patients who underwent radical resection | Tree-augmented naïve Bayes algorithm | The BN model, incorporating factors like age, CEA, CA19-9, CA125, differentiation status, T stage, N stage, KRAS mutation, and postoperative chemotherapy, showed higher accuracy (AUC = 80.11%) in predicting 3-year OS compared to a nomogram (AUC = 74.23%) | [62] |
| Time series transcriptomic data from normal and tumor cells of colorectal tissue | DBNs | The DBN-based classifier achieved high classification accuracy, revealing significant differences in gene regulatory networks between normal and tumor cells in CRC, particularly in the neighborhoods of oncogenes and cancer tissue markers | [63] |
| Gene expression profiles of COAD tumor samples from TCGA and normal colon tissues from GTEx | Bayesian network model | The BN analysis identified 14 upregulated DEGs significantly correlated with tumor stages, and Cox regression highlighted tumor stage, STMN4, and FAM135B dysregulation as independent prognostic factors for COAD survival outcomes | [64] |
| Clinical data of colon cancer patients, including 18 prognostic biomarkers and three clinical features | Bayesian binary classifiers, including a Bayesian bimodal neural network and a single modal BNN classifier | The Bayesian bimodal neural network achieved the best results in terms of AUC (0.8083), macro F1-score (0.7300), and concordance index (0.7238), demonstrating superior robustness compared to non-Bayesian models and the Bayesian single modal classifier | [65] |
| Normal mucosa samples from 100 colon cancer patients and 50 healthy donors, including genetic variants, DNA methylation markers, and gene expression data | Bayesian network model | The BN analysis revealed that most combinations showed the canonical pathway where methylation markers cause gene expression variation (60.1%), with 33.9% showing non-causal relationships, and 6% indicating gene expression causes variation in methylation markers | [66] |
| Genetic data from 55105 CRC cases and 65079 controls, along with an independent cohort of 101987 individuals including 1699 CRC cases | LDpred, a Bayesian approach | The LDpred-derived polygenic risk score showed the highest discriminatory accuracy for CRC risk prediction, identifying 30% of individuals without a family history at similar risk to those with a family history, suggesting the potential for earlier screening | [67] |
| Fecal microbiome samples from 45 rectal cancer patients before preoperative CCRT | Bayesian network model | The BN analysis identified Duodenibacillus massiliensis as linked with an improved complete response rate after preoperative CCRT, suggesting its potential as a predictive biomarker | [68] |
| Gene expression data from primary colon cancer and CLM samples | Fast and FFBN | FFBN successfully constructed gene regulatory networks for colon cancer and colon to liver metastasis, revealing unique molecular mechanisms for CLM and shared similarities with primary liver and colon cancers | [69] |
| Gut microbiota data related to CRC | Bayesian networks combined with IDA (Intervention calculus when the DAG is absent) | Four species (Fusobacterium, Citrobacter, Microbacterium, and Slackia) were identified as having non-null lower bounds of causal effects on CRC, supporting the role of specific microbial communities in CRC progression | [70] |
| CRC metastasis-related transcription factors (RNA and protein levels) | Bayesian network model | The BN analysis identified LMO7 and ARL8A as potential clinical biomarkers for CRC metastasis | [71] |
| Gene expression data from 153 colon cancer samples and 19 normal control samples (from TCGA project) | BRPCA | The approach identified 7 molecular subtypes of colon cancer with 44 feature genes, offering a finer classification compared to previous studies | [72] |
| Protein-protein interaction network data for CRC | Dynamic Bayesian network | The study identified biomarkers with high accuracy and F1-scores, with Alpha-2-HS-glycoprotein identified as a dominant hub gene in CRC | [73] |
| Gene expression data from LS174T cell lines, normal and adenoma samples, and CRC-related samples | Naive Bayesian network | The BN model demonstrated accurate and reproducible prediction results for normal, adenoma, CRC, and related test samples, with high prediction accuracies | [74] |
| Gene expression data related to Wnt signaling pathway in human CRC | Static Bayesian network | The biologically inspired Bayesian models, which include epigenetic modifications, improved prediction accuracy for CRC, revealing a significant difference in the activation state of the β-catenin transcription complex between tumorous and normal samples | [75] |
| Registry data of patients with colon cancer from the Department of Defense Automated Central Tumor Registry | ml-BBNs | The ml-BBNs demonstrated high accuracy in predicting recurrence and mortality in colon cancer, with AUCs ranging from 0.85 to 0.90, and positive predictive values for recurrence and mortality between 78% and 84%; the model identified which high-risk patients benefit from adjuvant therapy, with the largest benefit for elderly patients with high T-stage tumors | [50] |
| Somatic mutation data from 906 stage II/III CRC from the VICTOR clinical trial | Bayesian network model | The BN analysis revealed significant associations between microsatellite instability, chromosomal instability, and specific mutations (TP53, KRAS, BRAF, PIK3CA, NRAS), and proposed a new molecular classification for CRC with improved prognostic capabilities, particularly for disease-free survival in certain groups | [76] |
| Population-based data from the SEER registry, including 146248 records of colon cancer patients | ml-BBN | The ml-BBN model accurately estimated OS with an AUC of 0.85, identifying significant prognostic factors such as age, race, tumor histology, and AJCC staging, and demonstrating improved survival predictions compared to existing models | [77] |
| Clinical data from 53 patients with colon carcinomatosis, including 31 clinical-pathological, treatment-related, and outcome variables | Step-wise ml-BBN | The BBN model identified three predictors of OS: performance status, Peritoneal Cancer Index, and the ability to undergo CRS +/- HIPEC. The model achieved an AUC of 0.71, with positive and negative predictive values of 63.3% and 68.3%, respectively, and demonstrated strong classification for OS predictions | [51] |
| Clinical data from 278 CRC patients undergoing SLN mapping | A probabilistic Bayesian network model | The BN model predicted FN SLN mapping with an AUC of 0.84-0.86, achieving positive and negative predictive values of 83% and 97%, respectively. The number of SLNs (< 3) and tumor-replaced nodes independently predicted FN SLN | [78] |
| Gene expression data from cDNA arrays and clinical-pathological data of 494 CRC patients, focused on nodal metastasis prediction | A Bayesian neural network with automatic relevance determination | Tumor matrilysin was identified as a key predictor of nodal metastasis, with the Bayesian model achieving strong predictive performance, suggesting potential causality between matrilysin expression and nodal metastasis | [48] |

Table 6 Summary of Bayesian network applications in liver cancer research

| Data type | Bayesian network algorithm | Key findings | Ref. |
| --- | --- | --- | --- |
| Radiomics features | A logistic sparsity-based feature selection model optimized using Bayesian optimization | The Bayesian optimization-based feature selection model significantly improved classification performance for HCC and other focal liver lesions, especially under limited training data conditions | [79] |
| Simulated concentration time curves for DCE-MRI and in vivo patient data with hepatic tumor lesions | BNN | The BNN provided more accurate parameter estimates compared to NLLS fitting and effectively identified uncertainties, particularly under high noise levels and out-of-distribution data, improving robustness for clinical applications | [84] |
| Genetic variation data from 33 meta-analytic studies on 45 polymorphisms across 35 genes related to HCC | BFDP | Fourteen gene polymorphisms, including CCND1, CTLA4, EGF, IL6, IL12A, KIF1B, MDM2, MICA, miR-499, MTHFR, PNPLA3, STAT4, TM6SF2, and XPD genes, were identified as significant biomarkers for HCC susceptibility | [81] |
| Gene expression profiles of liver tissue samples from two microarray platforms analyzed for HCC | An empirical Bayesian method | Three genes were identified as specific biomarkers for HCC diagnosis, achieving an AUC of 0.931 | [85] |
| Single-cell multiomics data, including RNA-seq, Reduced Representation Bisulfite Sequencing, and copy number variation estimates | Bayesian network models | Best-fitted BN models identified 295 genes and provided novel insights into the mechanistic relationships of human leukocyte antigen class I genes in HCC | [86] |
| miRNA and mRNA expression data from 39 HCC patients and 25 liver cirrhosis patients | A flexible Bayesian two-step integrative method | The study identified 66 significant miRNA-mRNA pairs, including molecules previously recognized as potential biomarkers in liver cancer | [82] |
| Multi-omics data, including genome (mutation and copy number), transcriptome, proteome, and phosphoproteome from HCC samples | A Bayesian network mixture model | The study identified three main HCC subtypes with distinct molecular characteristics, some associated with survival independent of clinical stage. Cluster-specific networks revealed connections between genotypes and molecular phenotypes | [87] |
| Electronic medical records from 10060 primary liver cancer patients, including TCM symptoms, signs, tongue diagnosis, and pulse diagnosis information | Bayesian network model | The Bayesian network model achieved a classification accuracy of 85.84% for syndrome diagnosis in primary liver cancer, demonstrating its effectiveness in mining nonlinear relationships in clinical data and providing reliable support for TCM-based syndrome differentiation and treatment in liver cancer | [88] |
| Clinical data of HCC patients, including recurrence outcomes (early, late, or no recurrence) | Bayesian network-based model | The Bayesian network model effectively distinguished between early, late, and no recurrence, significantly outperforming benchmark techniques in accuracy, precision, recall, and F-measures. It addressed the challenge of insufficient early-stage information by integrating latent variables, offering robust and reliable predictions validated across datasets, with potential implications for improving HCC recurrence management in clinical practice | [89] |
| Dataset of 299 HCC patients after hepatectomy, including factors like preoperative AFP level, liver function grade, tumor size, and postoperative treatment | Tree-augmented naïve Bayes algorithm | The Bayesian network model identified PVTT as the most significant predictor of survival time for HCC patients after hepatectomy. The model also highlighted the preoperative AFP level and postoperative performance of TACE as independent survival factors | [80] |
| Functional CT perfusion data of hepatic regions, including measurements from malignant and benign liver tissues, acquired over 590 seconds using repeated scans | A Bayesian semiparametric model | The model facilitated the clustering of liver regions based on their CT profiles, which can be used to predict and classify regions as malignant or benign, aiding in the discrimination of cancerous tissue from healthy tissue in diagnostic settings | [83] |