1
Narayanan V, Bordoh LK, Kiss IZ, Li JS. Inferring networks of chemical reactions by curvature analysis of kinetic trajectories. Phys Chem Chem Phys 2025; 27:9962-9969. [PMID: 40084483 DOI: 10.1039/d4cp04338c] [Indexed: 03/16/2025]
Abstract
Quantifying interaction networks of chemical reactions allows description, prediction, and control of a range of phenomena in chemistry and biology. The challenge lies in unambiguously assigning contributions to changes in rates from different interactions. We propose that the curvature change of kinetic trajectories due to a systematic perturbation of a node in a network can identify the coupling strength and topology. Specifically, the coupling strength can be calculated as the ratio of the curvature change measured from the coupled node and the rate change of a perturbed node. We verified the methodology in numerical simulations with a network with complex ordinary differential equations and experiments with electrochemical networks. The experiments show excellent network inference (without false positive or negative links) of various systems with large heterogeneity in local dynamics and network structure without any a priori knowledge of the kinetics. The theory and the experiments also clarify the influence of local perturbations on response amplitude and timing through network-wide up-regulation. A major advantage of our technique is its independence from hidden/unobserved nodes. This makes our method highly suitable for applications with high temporal and low spatial resolution data from interacting chemical and biochemical systems including neuronal activity monitoring with multi-electrode arrays.
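The ratio described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-node linear system, the names `simulate` and `estimate_coupling`, and the reading of "curvature" as the second time derivative of the coupled node's trajectory are all assumptions made for illustration.

```python
# Toy illustration (assumed model, not from the paper):
#   dx1/dt = -x1 + k12 * x2     node 1 is coupled to node 2 with strength k12
#   dx2/dt = -x2 + u            u is a constant perturbation applied to node 2

def simulate(k12, u, x0=(1.0, 0.5), h=1e-3, steps=2):
    """Return forward-Euler samples [(x1, x2), ...] of the toy system."""
    x1, x2 = x0
    traj = [(x1, x2)]
    for _ in range(steps):
        dx1 = -x1 + k12 * x2
        dx2 = -x2 + u
        x1, x2 = x1 + h * dx1, x2 + h * dx2
        traj.append((x1, x2))
    return traj

def second_derivative(series, h):
    """Central second difference over three equally spaced samples."""
    return (series[2] - 2.0 * series[1] + series[0]) / h**2

def first_derivative(series, h):
    """Central first difference over three equally spaced samples."""
    return (series[2] - series[0]) / (2.0 * h)

def estimate_coupling(k12_true=0.7, u=0.2, h=1e-3):
    """Estimate k12 as (curvature change of node 1) / (rate change of node 2)."""
    base = simulate(k12_true, 0.0, h=h)   # unperturbed trajectory
    pert = simulate(k12_true, u, h=h)     # node 2 perturbed, same initial state
    x1_base = [p[0] for p in base]
    x1_pert = [p[0] for p in pert]
    x2_base = [p[1] for p in base]
    x2_pert = [p[1] for p in pert]
    d_curvature = second_derivative(x1_pert, h) - second_derivative(x1_base, h)
    d_rate = first_derivative(x2_pert, h) - first_derivative(x2_base, h)
    return d_curvature / d_rate
```

In this toy model the estimate approaches the true coupling as the step size `h` shrinks. The local dynamics of node 1 cancel in the difference between perturbed and unperturbed runs, which is consistent with the abstract's claim that no prior knowledge of the kinetics is needed.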
Affiliation(s)
- Vignesh Narayanan
  - AI Institute, University of South Carolina, 1112 Greene St, Columbia, SC, 29208, USA
- Lawrence K Bordoh
  - Department of Chemistry, Saint Louis University, 3501 Laclede Ave, St. Louis, MO, 63103, USA
- István Z Kiss
  - Department of Chemistry, Saint Louis University, 3501 Laclede Ave, St. Louis, MO, 63103, USA
- Jr-Shin Li
  - Department of Electrical and Systems Engineering, Washington University, 1 Brookings Dr, St. Louis, MO, 63130, USA

2
Secchettin E, Paiella S, Azzolina D, Casciani F, Salvia R, Malleo G, Gregori D. Expert Judgment Supporting a Bayesian Network to Model the Survival of Pancreatic Cancer Patients. Cancers (Basel) 2025; 17:301. [PMID: 39858083 PMCID: PMC11764457 DOI: 10.3390/cancers17020301] [Received: 12/06/2024] [Revised: 01/11/2025] [Accepted: 01/14/2025] [Indexed: 01/27/2025]
Abstract
Purpose: Pancreatic cancer is known for its poor prognosis. The most effective treatment combines surgery with peri-operative chemotherapy. Current prognostic tools are designed to predict patient outcomes and inform treatment decisions based on collected data. Bayesian networks (BNs) can integrate objective data with subjective clinical insights, such as expert opinions, or they can be independently based on either element. This pilot study is one of the first efforts to incorporate expert opinions into a prognostic model using a Bayesian framework. Methods: A clinical hybrid BN was selected to model the long-term overall survival of pancreatic cancer patients. The SHELF expert judgment method was employed to enhance the BN's effectiveness. This approach involved a two-phase protocol: an initial single-center pilot phase followed by a definitive international phase. Results: Experts generally agreed on the distribution shape among the 12 clinically relevant predictive variables identified for the BN. However, discrepancies were noted in the tumor size, age, and ASA score nodes. With regard to expert concordance for each node, tumor size and ASA score exhibited absolute concordance, indicating a strong consensus among experts. Ca19.9 values and resectability status showed high concordance, reflecting a solid agreement among the experts. The remaining nodes showed acceptable concordance. Conclusions: This project introduces a novel clinical hybrid BN that incorporates expert elicitation and clinical variables present at diagnosis to model the survival of pancreatic cancer patients. This model aims to provide research-based evidence for more reliable prognosis predictions and improved decision-making, addressing the limitations of existing survival prediction models. A validation process will be essential to evaluate the model's performance and clinical applicability.
Affiliation(s)
- Erica Secchettin
  - University of Verona, 37134 Verona, Italy
  - Department of Surgery, Dentistry, Paediatrics and Gynecology, University of Verona, 37134 Verona, Italy
- Salvatore Paiella
  - University of Verona, 37134 Verona, Italy
  - Pancreatic Surgery Unit, Department of Surgery, Dentistry, Paediatrics and Gynecology, University of Verona, 37134 Verona, Italy
- Danila Azzolina
  - Department of Environmental and Preventive Science, University of Ferrara, 44121 Ferrara, Italy
- Fabio Casciani
  - Pancreatic Surgery Unit, Department of Surgery, Dentistry, Paediatrics and Gynecology, University of Verona, 37134 Verona, Italy
- Roberto Salvia
  - University of Verona, 37134 Verona, Italy
  - Pancreatic Surgery Unit, Department of Engineering for Innovation Medicine (DIMI), University of Verona, 37134 Verona, Italy
- Giuseppe Malleo
  - University of Verona, 37134 Verona, Italy
  - Pancreatic Surgery Unit, Department of Surgery, Dentistry, Paediatrics and Gynecology, University of Verona, 37134 Verona, Italy
- Dario Gregori
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic and Vascular Sciences, University of Padova, 35122 Padova, Italy

3
Khullar S, Huang X, Ramesh R, Svaren J, Wang D. NetREm: Network Regression Embeddings reveal cell-type transcription factor coordination for gene regulation. BIOINFORMATICS ADVANCES 2024; 5:vbae206. [PMID: 40260118 PMCID: PMC12011367 DOI: 10.1093/bioadv/vbae206] [Received: 06/05/2024] [Revised: 10/22/2024] [Accepted: 12/18/2024] [Indexed: 04/23/2025]
Abstract
Motivation: Transcription factor (TF) coordination plays a key role in gene regulation via direct and/or indirect protein-protein interactions (PPIs) and co-binding to regulatory elements on DNA. Single-cell technologies facilitate gene expression measurement for individual cells and cell-type identification, yet the connection between TF-TF coordination and target gene (TG) regulation of various cell types remains unclear. Results: To address this, we introduce our innovative computational approach, Network Regression Embeddings (NetREm), to reveal cell-type TF-TF coordination activities for TG regulation. NetREm leverages network-constrained regularization, using prior knowledge of PPIs among TFs, to analyze single-cell gene expression data, uncovering cell-type coordinating TFs and identifying revolutionary TF-TG candidate regulatory network links. NetREm's performance is validated using simulation studies and benchmarked across several datasets in humans, mice, and yeast. Further, we showcase NetREm's ability to prioritize valid novel human TF-TF coordination links in 9 peripheral blood mononuclear and 42 immune cell sub-types. We apply NetREm to examine cell-type networks in central and peripheral nervous systems (e.g. neuronal, glial, Schwann cells) and in Alzheimer's disease versus Controls. Top predictions are validated with experimental data from rat, mouse, and human models. Additional functional genomics data help link genetic variants to our TF-TG regulatory and TF-TF coordination networks. Availability and implementation: https://github.com/SaniyaKhullar/NetREm.
Affiliation(s)
- Saniya Khullar
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53076, United States
- Xiang Huang
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
- Raghu Ramesh
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
  - Comparative Biomedical Sciences Training Program, University of Wisconsin-Madison, Madison, WI 53706, United States
- John Svaren
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
  - Department of Comparative Biosciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI 53706, United States
- Daifeng Wang
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, United States
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53076, United States
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, United States

4
Dolgov S, Savostyanov D. Tensor product algorithms for inference of contact network from epidemiological data. BMC Bioinformatics 2024; 25:285. [PMID: 39223484 PMCID: PMC11370089 DOI: 10.1186/s12859-024-05910-7] [Received: 01/29/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024]
Abstract
We consider the problem of inferring a contact network from nodal states observed during an epidemiological process. In a black-box Bayesian optimisation framework this problem reduces to a discrete likelihood optimisation over the set of possible networks. The cardinality of this set grows combinatorially with the number of network nodes, which makes this optimisation computationally challenging. For each network, its likelihood is the probability for the observed data to appear during the evolution of the epidemiological process on this network. This probability can be very small, particularly if the network is significantly different from the ground truth network, from which the observed data actually appear. A commonly used stochastic simulation algorithm struggles to recover rare events and hence to estimate small probabilities and likelihoods. In this paper we replace the stochastic simulation with solving the chemical master equation for the probabilities of all network states. Since this equation also suffers from the curse of dimensionality, we apply tensor train approximations to overcome it and enable fast and accurate computations. Numerical simulations demonstrate efficient black-box Bayesian inference of the network.
Affiliation(s)
- Sergey Dolgov
  - University of Bath, Claverton Down, Bath, BA2 7AY, UK

5
Xin J, Wang M, Qu L, Chen Q, Wang W, Wang Z. BIC-LP: A Hybrid Higher-Order Dynamic Bayesian Network Score Function for Gene Regulatory Network Reconstruction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:188-199. [PMID: 38127613 DOI: 10.1109/tcbb.2023.3345317] [Indexed: 12/23/2023]
Abstract
Reconstructing gene regulatory networks (GRNs) is an increasingly hot topic in bioinformatics. The dynamic Bayesian network (DBN) is a stochastic graph model commonly used for GRN reconstruction. However, the probabilistic characteristics of biological networks and the existence of data noise pose great challenges to GRN reconstruction and often lead to many false positive/negative edges. ScoreLasso is a hybrid DBN score function that combines DBN and linear regression and performs well. Its performance is, however, limited by its first-order assumption and its disregard of the initial network of the DBN. In this article, an integrated model based on a higher-order DBN model, a higher-order Lasso linear regression model and a Pearson correlation model is proposed. Based on this, a hybrid higher-order DBN score function for GRN reconstruction is proposed, namely BIC-LP. The BIC-LP score function is constructed by adding terms based on Lasso linear regression coefficients and Pearson correlation coefficients to the classical BIC score function. Therefore, it can capture more information from the dataset and curb information loss, compared with both many existing Bayesian family score functions and many state-of-the-art methods for GRN reconstruction. Experimental results show that BIC-LP can reasonably eliminate some false positive edges while retaining most true positive edges, so as to achieve better GRN reconstruction performance.

6
Wang Y, Lee H, Fear JM, Berger I, Oliver B, Przytycka TM. NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Commun Biol 2022; 5:1282. [PMID: 36418514 PMCID: PMC9684490 DOI: 10.1038/s42003-022-04226-7] [Received: 08/23/2021] [Accepted: 11/04/2022] [Indexed: 11/25/2022]
Abstract
The inference of Gene Regulatory Networks (GRNs) is one of the key challenges in systems biology. Leading algorithms utilize, in addition to gene expression, prior knowledge such as Transcription Factor (TF) DNA binding motifs or results of TF binding experiments. However, such prior knowledge is typically incomplete; therefore, integrating it with gene expression to infer GRNs remains difficult. To address this challenge, we introduce NetREX-CF (Regulatory Network Reconstruction using EXpression and Collaborative Filtering), a GRN reconstruction approach that brings together Collaborative Filtering to address the incompleteness of the prior knowledge and a biologically justified model of gene expression (sparse Network Component Analysis based model). We validated NetREX-CF using Yeast data and then used it to construct the GRN for Drosophila Schneider 2 (S2) cells. To corroborate the GRN, we performed a large-scale RNA-Seq analysis followed by a high-throughput RNAi treatment against all 465 expressed TFs in the cell line. Our knockdown result has not only extensively validated the GRN we built, but also provides a benchmark that our community can use for evaluating GRNs. Finally, we demonstrate, using previously published human data, that NetREX-CF can infer GRNs from single-cell RNA-Seq and outperforms other methods.
Affiliation(s)
- Yijie Wang
  - Computer Science Department, Indiana University, Bloomington, IN, 47408, USA
- Hangnoh Lee
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Justin M Fear
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Isabelle Berger
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Brian Oliver
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA

7
Zhou J, Hoen AG, Mcritchie S, Pathmasiri W, Viles WD, Nguyen QP, Madan JC, Dade E, Karagas MR, Gui J. Information enhanced model selection for Gaussian graphical model with application to metabolomic data. Biostatistics 2022; 23:926-948. [PMID: 33720330 PMCID: PMC9608647 DOI: 10.1093/biostatistics/kxab006] [Received: 03/13/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 11/12/2022]
Abstract
In light of the low signal-to-noise nature of many large biological data sets, we propose a novel method to learn the structure of association networks using Gaussian graphical models combined with prior knowledge. Our strategy includes two parts. In the first part, we propose a model selection criterion called the structural Bayesian information criterion, in which the prior structure is modeled and incorporated into the Bayesian information criterion. It is shown that the popular extended Bayesian information criterion is a special case of the structural Bayesian information criterion. In the second part, we propose a two-step algorithm to construct the candidate model pool. The algorithm is data-driven and the prior structure is embedded into the candidate model automatically. Theoretical investigation shows that under some mild conditions the structural Bayesian information criterion is a consistent model selection criterion for high-dimensional Gaussian graphical models. Simulation studies validate the superiority of the proposed algorithm over the existing ones and show its robustness to model misspecification. Application to relative concentration data from infant feces collected from subjects enrolled in a large molecular epidemiological cohort study validates that metabolic pathway involvement is a statistically significant factor for the conditional dependence between metabolites. Furthermore, new relationships among metabolites are discovered which cannot be identified by the conventional methods of pathway analysis. Some of them have been widely recognized in the biological literature.
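For orientation, the extended Bayesian information criterion mentioned in this abstract is commonly written as follows for a Gaussian graphical model with edge set E. This is the standard form from the general literature, not a formula taken from this paper. The notation is introduced here for illustration: n samples, p variables, tuning parameter γ in [0, 1], and ℓ_n the maximized log-likelihood. The authors' structural variant is not reproduced.

```latex
\mathrm{EBIC}_{\gamma}(E) \;=\; -2\,\ell_n\!\big(\hat{\Theta}(E)\big) \;+\; |E|\,\log n \;+\; 4\,\gamma\,|E|\,\log p
```

Setting γ = 0 recovers the ordinary BIC, and larger γ penalizes dense graphs more heavily when p is large.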
Affiliation(s)
- Jie Zhou
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Anne G Hoen
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, 3 Rope Ferry Road, Hanover, NH 03755, USA
- Susan Mcritchie
  - Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Wimal Pathmasiri
  - Nutrition Research Institute, Department of Nutrition, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, 500 Laureate Way, Kannapolis, NC 28081, USA
- Weston D Viles
  - Department of Mathematics and Statistics, University of Southern Maine, 96 Falmouth St, Portland, ME 04103, USA
- Quang P Nguyen
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA and Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Juliette C Madan
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Erika Dade
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Margaret R Karagas
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Jiang Gui
  - Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA

8

9
Voutsa V, Battaglia D, Bracken LJ, Brovelli A, Costescu J, Díaz Muñoz M, Fath BD, Funk A, Guirro M, Hein T, Kerschner C, Kimmich C, Lima V, Messé A, Parsons AJ, Perez J, Pöppl R, Prell C, Recinos S, Shi Y, Tiwari S, Turnbull L, Wainwright J, Waxenecker H, Hütt MT. Two classes of functional connectivity in dynamical processes in networks. J R Soc Interface 2021; 18:20210486. [PMID: 34665977 PMCID: PMC8526174 DOI: 10.1098/rsif.2021.0486] [Received: 06/12/2021] [Accepted: 09/13/2021] [Indexed: 12/12/2022]
Abstract
The relationship between network structure and dynamics is one of the most extensively investigated problems in the theory of complex systems of recent years. Understanding this relationship is of relevance to a range of disciplines, from neuroscience to geomorphology. A major strategy of investigating this relationship is the quantitative comparison of a representation of network architecture (structural connectivity, SC) with a (network) representation of the dynamics (functional connectivity, FC). Here, we show that one can distinguish two classes of functional connectivity: one based on simultaneous activity (co-activity) of nodes, the other based on sequential activity of nodes. We delineate these two classes in different categories of dynamical processes (excitations, regular and chaotic oscillators) and provide examples for SC/FC correlations of both classes in each of these models. We expand the theoretical view of the SC/FC relationships, with conceptual instances of the SC and the two classes of FC for various application scenarios in geomorphology, ecology, systems biology, neuroscience and socio-ecological systems. Seeing the organisation of dynamical processes in a network either as governed by co-activity or by sequential activity allows us to bring some order in the myriad of observations relating structure and function of complex networks.
Affiliation(s)
- Venetia Voutsa
  - Department of Life Sciences and Chemistry, Jacobs University Bremen, 28759 Bremen, Germany
- Demian Battaglia
  - Aix-Marseille Université, Inserm, Institut de Neurosciences des Systèmes (UMR 1106), Marseille, France
  - University of Strasbourg Institute for Advanced Studies (USIAS), Strasbourg 67083, France
- Andrea Brovelli
  - Aix-Marseille Université, CNRS, Institut de Neurosciences de la Timone (UMR 7289), Marseille, France
- Julia Costescu
  - Department of Geography, Durham University, Durham DH1 3LE, UK
- Mario Díaz Muñoz
  - Department of Sustainability, Governance and Methods, Modul University Vienna, 1190 Vienna, Austria
- Brian D. Fath
  - Department of Biological Sciences, Towson University, Towson, Maryland 21252, USA
  - Advancing Systems Analysis Program, International Institute for Applied Systems Analysis, Laxenburg 2361, Austria
  - Department of Environmental Studies, Masaryk University, 60200 Brno, Czech Republic
- Andrea Funk
  - Institute of Hydrobiology and Aquatic Ecosystem Management (IHG), University of Natural Resources and Life Sciences Vienna (BOKU), 1180 Vienna, Austria
  - WasserCluster Lunz - Biologische Station GmbH, Dr. Carl Kupelwieser Promenade 5, 3293 Lunz am See, Austria
- Mel Guirro
  - Department of Geography, Durham University, Durham DH1 3LE, UK
- Thomas Hein
  - Institute of Hydrobiology and Aquatic Ecosystem Management (IHG), University of Natural Resources and Life Sciences Vienna (BOKU), 1180 Vienna, Austria
  - WasserCluster Lunz - Biologische Station GmbH, Dr. Carl Kupelwieser Promenade 5, 3293 Lunz am See, Austria
- Christian Kerschner
  - Department of Sustainability, Governance and Methods, Modul University Vienna, 1190 Vienna, Austria
  - Department of Environmental Studies, Masaryk University, 60200 Brno, Czech Republic
- Christian Kimmich
  - Department of Environmental Studies, Masaryk University, 60200 Brno, Czech Republic
  - Regional Science and Environmental Research, Institute for Advanced Studies, 1080 Vienna, Austria
- Vinicius Lima
  - Aix-Marseille Université, Inserm, Institut de Neurosciences des Systèmes (UMR 1106), Marseille, France
  - Aix-Marseille Université, CNRS, Institut de Neurosciences de la Timone (UMR 7289), Marseille, France
- Arnaud Messé
  - Department of Computational Neuroscience, University Medical Center Eppendorf, Hamburg University, Germany
- John Perez
  - Department of Geography, Durham University, Durham DH1 3LE, UK
- Ronald Pöppl
  - Department of Geography and Regional Research, University of Vienna, Universitätsstr. 7, 1010 Vienna, Austria
- Christina Prell
  - Department of Cultural Geography, University of Groningen, 9747 AD, Groningen, The Netherlands
- Sonia Recinos
  - Institute of Hydrobiology and Aquatic Ecosystem Management (IHG), University of Natural Resources and Life Sciences Vienna (BOKU), 1180 Vienna, Austria
- Yanhua Shi
  - Department of Environmental Studies, Masaryk University, 60200 Brno, Czech Republic
- Shubham Tiwari
  - Department of Geography, Durham University, Durham DH1 3LE, UK
- Laura Turnbull
  - Department of Geography, Durham University, Durham DH1 3LE, UK
- John Wainwright
  - Department of Geography, Durham University, Durham DH1 3LE, UK
- Harald Waxenecker
  - Department of Environmental Studies, Masaryk University, 60200 Brno, Czech Republic
- Marc-Thorsten Hütt
  - Department of Life Sciences and Chemistry, Jacobs University Bremen, 28759 Bremen, Germany

10
Liu E, Li J, Kinnebrew GH, Zhang P, Zhang Y, Cheng L, Li L. A Fast and Furious Bayesian Network and Its Application of Identifying Colon Cancer to Liver Metastasis Gene Regulatory Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1325-1335. [PMID: 31581091 DOI: 10.1109/tcbb.2019.2944826] [Indexed: 06/10/2023]
Abstract
Bayesian networks are a powerful method for identifying causal relationships among variables. However, as the network size increases, the time complexity of searching for the optimal structure grows exponentially. We proposed a novel search algorithm, Fast and Furious Bayesian Network (FFBN). Compared to the existing greedy search algorithm, FFBN uses significantly fewer model configuration rules to determine the causal direction of edges when constructing the Bayesian network, which leads to greatly improved computational speed. We benchmarked the performance of FFBN by reconstructing gene regulatory networks (GRNs) from two DREAM5 challenge datasets: a synthetic dataset and a larger yeast transcriptome dataset. In both datasets, FFBN is much faster than the existing greedy search algorithm, while maintaining equally good or better performance in recall and precision. We then constructed three whole transcriptome GRNs for primary liver cancer (PL), primary colon cancer (PC) and colon to liver metastasis (CLM) expression data, for which the existing greedy search algorithms failed. The three GRNs contain 12,099 common genes. Our newly developed FFBN algorithm is thus able to build GRNs at an unprecedented scale of more than 10,000 genes. Using FFBN, we discovered that CLM has its unique cancer molecular mechanisms and shares a certain degree of similarity with both PL and PC.

11
Ni Y, Baladandayuthapani V, Vannucci M, Stingo FC. Bayesian graphical models for modern biological applications. STAT METHOD APPL-GER 2021; 31:197-225. [PMID: 35673326 PMCID: PMC9165295 DOI: 10.1007/s10260-021-00572-8] [Accepted: 05/16/2021] [Indexed: 12/14/2022]
Abstract
Graphical models are powerful tools that are regularly used to investigate complex dependence structures in high-throughput biomedical datasets. They allow for a holistic, systems-level view of the various biological processes, for intuitive and rigorous understanding and interpretations. In the context of large networks, Bayesian approaches are particularly suitable because they encourage sparsity of the graphs, incorporate prior information, and most importantly account for uncertainty in the graph structure. These features are particularly important in applications with limited sample size, including genomics and imaging studies. In this paper, we review several recently developed techniques for the analysis of large networks under non-standard settings, including but not limited to, multiple graphs for data observed from multiple related subgroups, graphical regression approaches used for the analysis of networks that change with covariates, and other complex sampling and structural settings. We also illustrate the practical utility of some of these methods using examples in cancer genomics and neuroimaging.
Affiliation(s)
- Yang Ni
  - Department of Statistics, Texas A&M University, College Station, USA
- Francesco C. Stingo
  - Department of Statistics, Computer Science, Applications “G. Parenti”, The University of Florence, Florence, Italy

12
Hütt MT, Lesne A. Gene Regulatory Networks: Dissecting Structure and Dynamics. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11467-9] [Indexed: 10/25/2022]

13
Affiliation(s)
- Simón Lunagómez
  - Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
- Sofia C. Olhede
  - Institute of Mathematics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Department of Statistical Science, UCL, London, UK
- Patrick J. Wolfe
  - Department of Statistics, Purdue University, West Lafayette, IN
  - Department of Computer Science, Purdue University, West Lafayette, IN
  - Department of Electrical & Computer Engineering, Purdue University, West Lafayette, IN

14
Holding AN, Cook HV, Markowetz F. Data generation and network reconstruction strategies for single cell transcriptomic profiles of CRISPR-mediated gene perturbations. BIOCHIMICA ET BIOPHYSICA ACTA. GENE REGULATORY MECHANISMS 2020; 1863:194441. [PMID: 31756390 DOI: 10.1016/j.bbagrm.2019.194441] [Received: 06/07/2019] [Revised: 10/01/2019] [Accepted: 10/01/2019] [Indexed: 02/05/2023]
Abstract
Recent advances in single-cell RNA-sequencing (scRNA-seq) in combination with CRISPR/Cas9 technologies have enabled the development of methods for large-scale perturbation studies with transcriptional readouts. These methods are highly scalable and have the potential to provide a wealth of information on the biological networks that underlie cellular response. Here we discuss how to overcome several key challenges to generate and analyse data for the confident reconstruction of models of the underlying cellular network. Some challenges are generic, and apply to analysing any single-cell transcriptomic data, while others are specific to combined single-cell CRISPR/Cas9 data, in particular barcode swapping, knockdown efficiency, multiplicity of infection and potential confounding factors. We also provide a curated collection of published data sets to aid the development of analysis strategies. Finally, we discuss several network reconstruction approaches, including co-expression networks and Bayesian networks, as well as their limitations, and highlight the potential of Nested Effects Models for network reconstruction from scRNA-seq data. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Andrew N Holding
  - Department of Biology, University of York, York, UK
  - York Biomedical Research Institute, University of York, York, UK
  - CRUK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, UK
  - The Alan Turing Institute, 96 Euston Road, Kings Cross, London, UK
- Helen V Cook
  - Department of Biology, University of York, York, UK
15
Affiliation(s)
- Marco Scutari
  - Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Manno, Switzerland
16
Wang L, Audenaert P, Michoel T. High-Dimensional Bayesian Network Inference From Systems Genetics Data Using Genetic Node Ordering. Front Genet 2019; 10:1196. [PMID: 31921278 PMCID: PMC6933017 DOI: 10.3389/fgene.2019.01196] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/03/2019] [Accepted: 10/29/2019] [Indexed: 11/23/2022] Open
Abstract
Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes, which then allows a Bayesian network to be inferred via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with ground-truth networks and being orders of magnitude faster. Moreover, our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way for tuning the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data.
Affiliation(s)
- Lingfei Wang
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, United Kingdom
  - Broad Institute of Harvard and MIT, Cambridge, MA, United States
  - Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, United States
- Pieter Audenaert
  - IDLab, Ghent University—imec, Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
- Tom Michoel
  - Division of Genetics and Genomics, The Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, United Kingdom
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
17
Scutari M, Graafland CE, Gutiérrez JM. Who learns better Bayesian network structures: Accuracy and speed of structure learning algorithms. Int J Approx Reason 2019. [DOI: 10.1016/j.ijar.2019.10.003] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Indexed: 11/17/2022]
18
Wani N, Raza K. Integrative approaches to reconstruct regulatory networks from multi-omics data: A review of state-of-the-art methods. Comput Biol Chem 2019; 83:107120. [PMID: 31499298 DOI: 10.1016/j.compbiolchem.2019.107120] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 08/03/2018] [Revised: 02/22/2019] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Data generation using high throughput technologies has led to the accumulation of diverse types of molecular data. These data have different types (discrete, real, string, etc.) and occur in various formats and sizes. Datasets including gene expression, miRNA expression, protein-DNA binding data (ChIP-Seq/ChIP-chip), mutation data (copy number variation, single nucleotide polymorphisms), annotations, interactions, and association data are some of the commonly used biological datasets to study various cellular mechanisms of living organisms. Each of them provides a unique, complementary and partly independent view of the genome and hence embeds essential information about the regulatory mechanisms of genes and their products. Therefore, integrating these data and inferring regulatory interactions from them offer a system level of biological insight in predicting gene functions and their phenotypic outcomes. To study genome functionality through regulatory networks, different methods have been proposed for collective mining of information from an integrated dataset. We survey here integration methods that reconstruct regulatory networks using state-of-the-art techniques to handle multi-omics (i.e., genomic, transcriptomic, proteomic) and other biological datasets.
Affiliation(s)
- Nisar Wani
  - Govt. Degree College Baramulla, J & K, India
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Khalid Raza
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
19
Cremaschi A, Argiento R, Shoemaker K, Peterson C, Vannucci M. Hierarchical Normalized Completely Random Measures for Robust Graphical Modeling. BAYESIAN ANALYSIS 2019; 14:1271-1301. [PMID: 32431780 PMCID: PMC7237071 DOI: 10.1214/19-ba1153] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 05/09/2023]
Abstract
Gaussian graphical models are useful tools for exploring network structures in multivariate normal data. In this paper we are interested in situations where data show departures from Gaussianity, therefore requiring alternative modeling distributions. The multivariate t-distribution, obtained by dividing each component of the data vector by a gamma random variable, is a straightforward generalization to accommodate deviations from normality such as heavy tails. Since different groups of variables may be contaminated to a different extent, Finegold and Drton (2014) introduced the Dirichlet t-distribution, where the divisors are clustered using a Dirichlet process. In this work, we consider a more general class of nonparametric distributions as the prior on the divisor terms, namely the class of normalized completely random measures (NormCRMs). To improve the effectiveness of the clustering, we propose modeling the dependence among the divisors through a nonparametric hierarchical structure, which allows for the sharing of parameters across the samples in the data set. This desirable feature enables us to cluster together different components of multivariate data in a parsimonious way. We demonstrate through simulations that this approach provides accurate graphical model inference, and apply it to a case study examining the dependence structure in radiomics data derived from The Cancer Imaging Atlas.
Affiliation(s)
- Andrea Cremaschi
  - Department of Cancer Immunology, Institute of Cancer Research, Oslo University Hospital, Oslo, Norway
  - Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway
- Raffaele Argiento
  - ESOMAS Department, University of Torino, Torino, Italy
  - Collegio Carlo Alberto, Torino, Italy
- Katherine Shoemaker
  - Department of Statistics, Rice University, Houston, TX, USA
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Christine Peterson
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
20
Kaufmann T, Castela Forte J, Hiemstra B, Wiering MA, Grzegorczyk M, Epema AH, van der Horst ICC. A Bayesian Network Analysis of the Diagnostic Process and Its Accuracy to Determine How Clinicians Estimate Cardiac Function in Critically Ill Patients: Prospective Observational Cohort Study. JMIR Med Inform 2019; 7:e15358. [PMID: 31670697 PMCID: PMC6913745 DOI: 10.2196/15358] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/04/2019] [Revised: 09/17/2019] [Accepted: 09/23/2019] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Hemodynamic assessment of critically ill patients is a challenging endeavor, and advanced monitoring techniques are often required to guide treatment choices. Given the technical complexity and occasional unavailability of these techniques, estimation of cardiac function based on clinical examination is valuable for critical care physicians to diagnose circulatory shock. Yet, the lack of knowledge on how to best conduct and teach the clinical examination to estimate cardiac function has reduced its accuracy to almost that of "flipping a coin."
OBJECTIVE The aim of this study was to investigate the decision-making process underlying estimates of cardiac function of patients acutely admitted to the intensive care unit (ICU) based on current standardized clinical examination using Bayesian methods.
METHODS Patient data were collected as part of the Simple Intensive Care Studies-I (SICS-I) prospective cohort study. All adult patients consecutively admitted to the ICU with an expected stay longer than 24 hours were included, for whom clinical examination was conducted and cardiac function was estimated. Using these data, first, the probabilistic dependencies between the examiners' estimates and the set of clinically measured variables upon which these rely were analyzed using a Bayesian network. Second, the accuracy of cardiac function estimates was assessed by comparison to the cardiac index values measured by critical care ultrasonography.
RESULTS A total of 1075 patients were included, of which 783 patients had validated cardiac index measurements. A Bayesian network analysis identified two clinical variables upon which cardiac function estimate is conditionally dependent, namely, noradrenaline administration and presence of delayed capillary refill time or mottling. When the patient received noradrenaline, the probability of cardiac function being estimated as reasonable or good P(ER,G) was lower, irrespective of whether the patient was mechanically ventilated (P[ER,G|ventilation, noradrenaline]=0.63, P[ER,G|ventilation, no noradrenaline]=0.91, P[ER,G|no ventilation, noradrenaline]=0.67, P[ER,G|no ventilation, no noradrenaline]=0.93). The same trend was found for capillary refill time or mottling. Sensitivity of estimating a low cardiac index was 26% and 39% and specificity was 83% and 74% for students and physicians, respectively. Positive and negative likelihood ratios were 1.53 (95% CI 1.19-1.97) and 0.87 (95% CI 0.80-0.95), respectively, overall.
CONCLUSIONS The conditional dependencies between clinical variables and the cardiac function estimates resulted in a network consistent with known physiological relations. Conditional probability queries allow for multiple clinical scenarios to be recreated, which provide insight into the possible thought process underlying the examiners' cardiac function estimates. This information can help develop interactive digital training tools for students and physicians and contribute toward the goal of further improving the diagnostic accuracy of clinical examination in ICU patients.
TRIAL REGISTRATION ClinicalTrials.gov NCT02912624; https://clinicaltrials.gov/ct2/show/NCT02912624.
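The sensitivity, specificity, and likelihood-ratio figures in the abstract above are linked by the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below is only an illustration of that arithmetic and is not code from the study. The function name `likelihood_ratios` is hypothetical. The inputs are the rounded per-group percentages quoted in the abstract, so the results approximate per-group ratios and are not expected to reproduce the pooled overall values (1.53 and 0.87) exactly.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary test from its sensitivity and specificity.

    Hypothetical helper for illustration; both inputs are proportions in (0, 1).
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded per-group values quoted in the abstract (assumed exact here).
students = likelihood_ratios(sensitivity=0.26, specificity=0.83)
physicians = likelihood_ratios(sensitivity=0.39, specificity=0.74)

print(f"students:   LR+ = {students[0]:.2f}, LR- = {students[1]:.2f}")
print(f"physicians: LR+ = {physicians[0]:.2f}, LR- = {physicians[1]:.2f}")
```

A positive likelihood ratio close to 1.5 and a negative one close to 0.85 both sit near 1, which is consistent with the abstract's point that clinical estimation alone shifts the probability of a low cardiac index only modestly.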
Affiliation(s)
- Thomas Kaufmann
  - Department of Anesthesiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- José Castela Forte
  - Department of Anesthesiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
  - Department of Critical Care, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
  - Department of Clinical Pharmacology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
  - Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, Netherlands
- Bart Hiemstra
  - Department of Anesthesiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
  - Department of Critical Care, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Marco A Wiering
  - Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, Netherlands
- Marco Grzegorczyk
  - Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Groningen, Netherlands
- Anne H Epema
  - Department of Anesthesiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Iwan C C van der Horst
  - Department of Critical Care, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
  - Department of Intensive Care, Maastricht University Medical Center+, Maastricht University, Maastricht, Netherlands
21
22
de Campos LM, Cano A, Castellano JG, Moral S. Combining gene expression data and prior knowledge for inferring gene regulatory networks via Bayesian networks using structural restrictions. Stat Appl Genet Mol Biol 2019; 18:sagmb-2018-0042. [PMID: 31042646 DOI: 10.1515/sagmb-2018-0042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
Gene Regulatory Networks (GRNs) are known as the most adequate instrument to provide a clear insight and understanding of the cellular systems. One of the most successful techniques to reconstruct GRNs using gene expression data is Bayesian networks (BN), which have proven to be an ideal approach for heterogeneous data integration in the learning process. Nevertheless, the incorporation of prior knowledge has been achieved by using prior beliefs or by using networks as a starting point in the search process. In this work, the utilization of different kinds of structural restrictions within algorithms for learning BNs from gene expression data is considered. These restrictions will codify prior knowledge, in such a way that a BN should satisfy them. Therefore, one aim of this work is to make a detailed review of the use of prior knowledge and gene expression data for inferring GRNs from BNs, but the major purpose of this paper is to research whether the structural learning algorithms for BNs from expression data can achieve better outcomes by exploiting this prior knowledge with the use of structural restrictions. In the experimental study, it is shown that this new way to incorporate prior knowledge leads us to achieve better reverse-engineered networks.
Affiliation(s)
- Luis M de Campos
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Andrés Cano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Javier G Castellano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Serafín Moral
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
23
Boluki S, Esfahani MS, Qian X, Dougherty ER. Constructing Pathway-Based Priors within a Gaussian Mixture Model for Bayesian Regression and Classification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:524-537. [PMID: 29990066 DOI: 10.1109/tcbb.2017.2778715] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Gene-expression-based classification and regression are major concerns in translational genomics. If the feature-label distribution is known, then an optimal classifier can be derived. If the predictor-target distribution is known, then an optimal regression function can be derived. In practice, neither is known, data must be employed, and, for small samples, prior knowledge concerning the feature-label or predictor-target distribution can be used in the learning process. Optimal Bayesian classification and optimal Bayesian regression provide optimality under uncertainty. With optimal Bayesian classification (or regression), uncertainty is treated directly on the feature-label (or predictor-target) distribution. The fundamental engineering problem is prior construction. The Regularized Expected Mean Log-Likelihood Prior (REMLP) utilizes pathway information and provides viable priors for the feature-label distribution, assuming that the training data contain labels. In practice, the labels may not be observed. This paper extends the REMLP methodology to a Gaussian mixture model (GMM) when the labels are unknown. Prior construction bundled with prior update via Bayesian sampling results in Monte Carlo approximations to the optimal Bayesian regression function and optimal Bayesian classifier. Simulations demonstrate that the GMM REMLP prior yields better performance than the EM algorithm for small data sets. We apply it to phenotype classification when the prior knowledge consists of colon cancer pathways.
24
Tan Q, Liu Y, Liu J. Motif-aware diffusion network inference. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2018. [DOI: 10.1007/s41060-018-0156-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
25
Reconstructing phosphorylation signalling networks from quantitative phosphoproteomic data. Essays Biochem 2018; 62:525-534. [PMID: 30072490 PMCID: PMC6204553 DOI: 10.1042/ebc20180019] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/02/2018] [Revised: 06/25/2018] [Accepted: 06/26/2018] [Indexed: 12/25/2022]
Abstract
Cascades of phosphorylation between protein kinases comprise a core mechanism in the integration and propagation of intracellular signals. Although we have accumulated a wealth of knowledge around some such pathways, this is subject to study biases and much remains to be uncovered. Phosphoproteomics, the identification and quantification of phosphorylated proteins on a proteomic scale, provides a high-throughput means of interrogating the state of intracellular phosphorylation, both at the pathway level and at the whole-cell level. In this review, we discuss methods for using human quantitative phosphoproteomic data to reconstruct the underlying signalling networks that generated it. We address several challenges imposed by the data on such analyses and we consider promising advances towards reconstructing unbiased, kinome-scale signalling networks.
26
Wang Y, Cho DY, Lee H, Fear J, Oliver B, Przytycka TM. Reprogramming of regulatory network using expression uncovers sex-specific gene regulation in Drosophila. Nat Commun 2018; 9:4061. [PMID: 30283019 PMCID: PMC6170494 DOI: 10.1038/s41467-018-06382-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 01/24/2018] [Accepted: 08/13/2018] [Indexed: 02/07/2023] Open
Abstract
Gene regulatory networks (GRNs) describe regulatory relationships between transcription factors (TFs) and their target genes. Computational methods to infer GRNs typically combine evidence across different conditions to infer context-agnostic networks. We develop a method, Network Reprogramming using EXpression (NetREX), that constructs a context-specific GRN given context-specific expression data and a context-agnostic prior network. NetREX remodels the prior network to obtain the topology that provides the best explanation for expression data. Because NetREX utilizes prior network topology, we also develop PriorBoost, a method that evaluates a prior network in terms of its consistency with the expression data. We validate NetREX and PriorBoost using the "gold standard" E. coli GRN from the DREAM5 network inference challenge and apply them to construct sex-specific Drosophila GRNs. NetREX constructed sex-specific Drosophila GRNs that, on all applied measures, outperform networks obtained from other methods indicating that NetREX is an important milestone toward building more accurate GRNs.
Affiliation(s)
- Yijie Wang
  - National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA
- Dong-Yeon Cho
  - National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA
- Hangnoh Lee
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Justin Fear
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Brian Oliver
  - Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, 50 South Drive, Bethesda, MD, 20892, USA
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA
27
Siahpirani AF, Roy S. A prior-based integrative framework for functional transcriptional regulatory network inference. Nucleic Acids Res 2018; 45:e21. [PMID: 27794550 PMCID: PMC5389674 DOI: 10.1093/nar/gkw963] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/29/2015] [Accepted: 10/12/2016] [Indexed: 12/16/2022] Open
Abstract
Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference, and identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization.
Affiliation(s)
- Alireza F Siahpirani
  - Department of Computer Sciences, University of Wisconsin-Madison, 1210 W. Dayton St. Madison, WI 53706-1613, USA
- Sushmita Roy
  - Wisconsin Institute for Discovery, University of Wisconsin-Madison, Discovery Building 330 North Orchard St. Madison, WI 53715, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, K6/446 Clinical Sciences Center 600 Highland Avenue Madison, WI 53792-4675, USA
28
Balasubramanian JB, Gopalakrishnan V. Tunable structure priors for Bayesian rule learning for knowledge integrated biomarker discovery. World J Clin Oncol 2018; 9:98-109. [PMID: 30254965 PMCID: PMC6153126 DOI: 10.5306/wjco.v9.i5.98] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/26/2018] [Revised: 07/24/2018] [Accepted: 08/05/2018] [Indexed: 02/06/2023] Open
Abstract
AIM To develop a framework to incorporate background domain knowledge into classification rule learning for knowledge discovery in biomedicine.
METHODS Bayesian rule learning (BRL) is a rule-based classifier that uses a greedy best-first search over a space of Bayesian belief-networks (BN) to find the optimal BN to explain the input dataset, and then infers classification rules from this BN. BRL uses a Bayesian score to evaluate the quality of BNs. In this paper, we extended the Bayesian score to include informative structure priors, which encode our prior domain knowledge about the dataset. We call this extension of BRL BRLp. The structure prior has a λ hyperparameter that allows the user to tune the degree of incorporation of the prior knowledge in the model learning process. We studied the effect of λ on model learning using a simulated dataset and a real-world lung cancer prognostic biomarker dataset, by measuring the degree of incorporation of our specified prior knowledge. We also monitored its effect on the model predictive performance. Finally, we compared BRLp to other state-of-the-art classifiers commonly used in biomedicine.
RESULTS We evaluated the degree of incorporation of prior knowledge into BRLp, with simulated data by measuring the Graph Edit Distance between the true data-generating model and the model learned by BRLp. We specified the true model using informative structure priors. We observed that by increasing the value of λ we were able to increase the influence of the specified structure priors on model learning. A large value of λ of BRLp caused it to return the true model. This also led to a gain in predictive performance measured by area under the receiver operator characteristic curve (AUC). We then obtained a publicly available real-world lung cancer prognostic biomarker dataset and specified a known biomarker from literature [the epidermal growth factor receptor (EGFR) gene]. We again observed that larger values of λ led to an increased incorporation of EGFR into the final BRLp model. This relevant background knowledge also led to a gain in AUC.
CONCLUSION BRLp enables tunable structure priors to be incorporated during Bayesian classification rule learning that integrates data and knowledge as demonstrated using lung cancer biomarker data.
Affiliation(s)
- Jeya Balaji Balasubramanian
  - Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Vanathi Gopalakrishnan
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15206, United States
29
Franks AM, Markowetz F, Airoldi EM. REFINING CELLULAR PATHWAY MODELS USING AN ENSEMBLE OF HETEROGENEOUS DATA SOURCES. Ann Appl Stat 2018; 12:1361-1384. [PMID: 36506698 PMCID: PMC9733905 DOI: 10.1214/16-aoas915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
Improving current models and hypotheses of cellular pathways is one of the major challenges of systems biology and functional genomics. There is a need for methods to build on established expert knowledge and reconcile it with results of new high-throughput studies. Moreover, the available sources of data are heterogeneous, and the data need to be integrated in different ways depending on which part of the pathway they are most informative for. In this paper, we introduce a compartment specific strategy to integrate edge, node and path data for refining a given network hypothesis. To carry out inference, we use a local-move Gibbs sampler for updating the pathway hypothesis from a compendium of heterogeneous data sources, and a new network regression idea for integrating protein attributes. We demonstrate the utility of this approach in a case study of the pheromone response MAPK pathway in the yeast S. cerevisiae.
Affiliation(s)
- Alexander M Franks
  - Department of Statistics and Applied Probability, University of California, Santa Barbara, South Hall, Santa Barbara, California 93106, USA
- Florian Markowetz
  - Cancer Research UK, Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, United Kingdom
- Edoardo M Airoldi
  - Fox School of Business, Department of Statistical Science, Temple University, Center for Data Science, 1810 Liacouras Walk, Philadelphia, Pennsylvania 19122, USA
30
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Citation(s) in RCA: 253] [Impact Index Per Article: 36.1] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches to analyze biological samples such as genomics, transcriptomics, proteomics, and metabolomics, each analysis can generate tera- to peta-byte sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make the integration of these multi-dimensional omics data into biologically meaningful context challenging. Variously named as integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics', the challenges include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is towards the holistic realization of a 'systems biology' understanding of the biological question in hand. Commonly used approaches in these efforts are currently limited by the 3 i's - integration, interpretation, and insights. Post integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and increasing types of omics datasets generated such as glycomics, lipidomics, microbiomics, and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for development of standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
- B Misra, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
| | - Carl D Langefeld
- C Langefeld, Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
| | - Michael Olivier
- M Olivier, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
| | - Laura A Cox
- L Cox, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
| |
|
31
|
Santra T, Rukhlenko O, Zhernovkov V, Kholodenko BN. Reconstructing static and dynamic models of signaling pathways using Modular Response Analysis. Curr Opin Syst Biol 2018. [DOI: 10.1016/j.coisb.2018.02.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022]
|
32
|
Skinkyte-Juskiene R, Kogelman LJ, Kadarmideen HN. Transcription Factor Co-expression Networks of Adipose RNA-Seq Data Reveal Regulatory Mechanisms of Obesity. Curr Genomics 2018; 19:289-299. [PMID: 29755291 PMCID: PMC5930450 DOI: 10.2174/1389202918666171005095059] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/27/2017] [Revised: 05/28/2017] [Accepted: 09/07/2017] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND: Transcription Factors (TFs) control actuation of genes in the genome and are key mediators of complex processes such as obesity. Master Regulators (MRs) are the genes at the top of a regulation hierarchy which regulate other genes.
OBJECTIVE: To elucidate clusters of highly co-expressed TFs (modules), involved pathways, highly inter-connected TFs (hub-TFs) and MRs leading to obesity and leanness, using porcine model for human obesity.
METHODS: We identified 817 expressed TFs in RNA-Sequencing dataset representing extreme degrees of obesity (DO; lean, obese). We built a single Weighted Transcription Factor Co-expression Network (WTFCN) and TF sub-networks (based on the DO). Hub-TFs and MRs (using iRegulon) were identified in biologically relevant WTFCNs modules.
RESULTS: Single WTFCN detected the Red module significantly associated with DO (P < 0.03). This module was enriched for regulation processes in the immune system, e.g. Immune system process (Padj = 2.50E-06) and metabolic lifestyle disorders, e.g. Circadian rhythm - mammal pathway (Padj = 2.33E-11). Detected MR, hub-TF SPI1 was involved in obesity, immunity and osteoporosis. Within the obese sub-network, the Red module suggested possible associations with immunity, e.g. TGF-beta signaling pathway (Padj = 1.73E-02) and osteoporosis, e.g. Osteoclast differentiation (Padj = 1.94E-02). Within the lean sub-network, the Magenta module displayed associations with type 2 diabetes, obesity and osteoporosis e.g. Notch signaling pathway (Padj = 2.40E-03), osteoporosis e.g. hub-TF VDR (a prime candidate gene for osteoporosis).
CONCLUSION: Our results provide insights into the regulatory network of TFs and biologically relevant hub TFs in obesity.
Affiliation(s)
- Ruta Skinkyte-Juskiene
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 7, 1870 Frederiksberg C, Denmark
| | - Lisette J.A. Kogelman
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 7, 1870 Frederiksberg C, Denmark
- Danish Headache Center, Department of Neurology, Glostrup Research Institute, Rigshospitalet Glostrup, Nordre Ringvej 69, 2600 Glostrup, Denmark
| | - Haja N. Kadarmideen
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 7, 1870 Frederiksberg C, Denmark
- Section of Systems Genomics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, Building 208, 2800 Kgs. Lyngby, Denmark
| |
|
33
|
Kawalia SB, Raschka T, Naz M, de Matos Simoes R, Senger P, Hofmann-Apitius M. Analytical Strategy to Prioritize Alzheimer's Disease Candidate Genes in Gene Regulatory Networks Using Public Expression Data. J Alzheimers Dis 2018; 59:1237-1254. [PMID: 28800327 PMCID: PMC5611835 DOI: 10.3233/jad-170011] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/21/2022]
Abstract
Alzheimer’s disease (AD) progressively destroys cognitive abilities in the aging population with tremendous effects on memory. Despite recent progress in understanding the underlying mechanisms, high drug attrition rates have put a question mark behind our knowledge about its etiology. Re-evaluation of past studies could help us to elucidate molecular-level details of this disease. Several methods to infer such networks exist, but most of them do not elaborate on context specificity and completeness of the generated networks, missing out on lesser-known candidates. In this study, we present a novel strategy that corroborates common mechanistic patterns across large scale AD gene expression studies and further prioritizes potential biomarker candidates. To infer gene regulatory networks (GRNs), we applied an optimized version of the BC3Net algorithm, named BC3Net10, capable of deriving robust and coherent patterns. In principle, this approach initially leverages the power of literature knowledge to extract AD specific genes for generating viable networks. Our findings suggest that AD GRNs show significant enrichment for key signaling mechanisms involved in neurotransmission. Among the prioritized genes, well-known AD genes were prominent in synaptic transmission, implicated in cognitive deficits. Moreover, less intensive studied AD candidates (STX2, HLA-F, HLA-C, RAB11FIP4, ARAP3, AP2A2, ATP2B4, ITPR2, and ATP2A3) are also involved in neurotransmission, providing new insights into the underlying mechanism. To our knowledge, this is the first study to generate knowledge-instructed GRNs that demonstrates an effective way of combining literature-based knowledge and data-driven analysis to identify lesser known candidates embedded in stable and robust functional patterns across disparate datasets.
Affiliation(s)
- Shweta Bagewadi Kawalia
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for Information Technology, Bonn, Germany
| | - Tamara Raschka
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; University of Applied Sciences Koblenz, RheinAhrCampus, Remagen, Germany
| | - Mufassra Naz
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for Information Technology, Bonn, Germany
| | | | - Philipp Senger
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany
| | - Martin Hofmann-Apitius
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, Germany; Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn-Aachen International Center for Information Technology, Bonn, Germany
| |
|
34
|
Shaddox E, Stingo FC, Peterson CB, Jacobson S, Cruickshank-Quinn C, Kechris K, Bowler R, Vannucci M. A Bayesian Approach for Learning Gene Networks Underlying Disease Severity in COPD. STATISTICS IN BIOSCIENCES 2018; 10:59-85. [PMID: 33912251 DOI: 10.1007/s12561-016-9176-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/15/2023]
Abstract
In this paper, we propose a Bayesian hierarchical approach to infer network structures across multiple sample groups where both shared and differential edges may exist across the groups. In our approach, we link graphs through a Markov random field prior. This prior on network similarity provides a measure of pairwise relatedness that borrows strength only between related groups. We incorporate the computational efficiency of continuous shrinkage priors, improving scalability for network estimation in cases of larger dimensionality. Our model is applied to patient groups with increasing levels of chronic obstructive pulmonary disease severity, with the goal of better understanding the break down of gene pathways as the disease progresses. Our approach is able to identify critical hub genes for four targeted pathways. Furthermore, it identifies gene connections that are disrupted with increased disease severity and that characterize the disease evolution. We also demonstrate the superior performance of our approach with respect to competing methods, using simulated data.
Affiliation(s)
- Elin Shaddox
- Department of Statistics, Rice University, Houston, USA
| | - Francesco C Stingo
- Dipartimento di Statistica, Informatica, Applicazioni "G.Parenti", University of Florence, Florence, Italy
| | | | - Sean Jacobson
- Department of Medicine, National Jewish Health, Denver, CO, USA
| | - Charmion Cruickshank-Quinn
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Colorado Denver, Denver, CO, USA
| | - Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver, Denver, CO, USA
| | - Russell Bowler
- Department of Medicine, National Jewish Health, Denver, CO, USA
| | | |
|
35
|
Scutari M, Auconi P, Caldarelli G, Franchi L. Bayesian Networks Analysis of Malocclusion Data. Sci Rep 2017; 7:15236. [PMID: 29127377 PMCID: PMC5681542 DOI: 10.1038/s41598-017-15293-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/10/2017] [Accepted: 05/22/2017] [Indexed: 12/03/2022] Open
Abstract
In this paper we use Bayesian networks to determine and visualise the interactions among various Class III malocclusion maxillofacial features during growth and treatment. We start from a sample of 143 patients characterised through a series of a maximum of 21 different craniofacial features. We estimate a network model from these data and we test its consistency by verifying some commonly accepted hypotheses on the evolution of these disharmonies by means of Bayesian statistics. We show that untreated subjects develop different Class III craniofacial growth patterns as compared to patients submitted to orthodontic treatment with rapid maxillary expansion and facemask therapy. Among treated patients the CoA segment (the maxillary length) and the ANB angle (the antero-posterior relation of the maxilla to the mandible) seem to be the skeletal subspaces that receive the main effect of the treatment.
Affiliation(s)
- Marco Scutari
- Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, UK
| | | | - Guido Caldarelli
- IMT School for Advanced Studies, Piazza San Francesco 19, 55100, Lucca, Italy.
- Istituto dei Sistemi Complessi CNR, Unità Sapienza, Dip. Fisica, P.le A. Moro 2, 00185, Rome, Italy.
- London Institute for Mathematical Sciences, 35a South St, Mayfair, London, W1K 2XF, UK.
| | - Lorenzo Franchi
- Dipartimento di Chirurgia e Medicina Traslazionale, Università degli Studi di Firenze, Firenze, Italy
- Thomas M. Graber Visiting Scholar, Department of Orthodontics and Pediatric Dentistry, School of Dentistry, The University of Michigan, Ann Arbor, MI, USA
| |
|
36
|
Formulation, construction and analysis of kinetic models of metabolism: A review of modelling frameworks. Biotechnol Adv 2017; 35:981-1003. [PMID: 28916392 DOI: 10.1016/j.biotechadv.2017.09.005] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 03/21/2017] [Revised: 08/30/2017] [Accepted: 09/10/2017] [Indexed: 12/13/2022]
Abstract
Kinetic models are critical to predict the dynamic behaviour of metabolic networks. Mechanistic kinetic models for large networks remain uncommon due to the difficulty of fitting their parameters. Recent modelling frameworks promise new ways to overcome this obstacle while retaining predictive capabilities. In this review, we present an overview of the relevant mathematical frameworks for kinetic formulation, construction and analysis. Starting with kinetic formalisms, we next review statistical methods for parameter inference, as well as recent computational frameworks applied to the construction and analysis of kinetic models. Finally, we discuss opportunities and limitations hindering the development of larger kinetic reconstructions.
|
37
|
|
38
|
Graph_sampler: a simple tool for fully Bayesian analyses of DAG-models. Comput Stat 2017. [DOI: 10.1007/s00180-017-0719-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
39
|
Lunagómez S, Mukherjee S, Wolpert RL, Airoldi EM. Geometric Representations of Random Hypergraphs. J Am Stat Assoc 2017. [DOI: 10.1080/01621459.2016.1141686] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
| | - Sayan Mukherjee
- Department of Statistical Science, Duke University, Durham, NC
| | | | | |
|
40
|
Affiliation(s)
- Jack Kuipers
- Department of Biosystems Science and Engineering (D-BSSE) ETH Zurich, Basel, Switzerland
| | - Giusi Moffa
- Division of Psychiatry, University College London, London, United Kingdom
| |
|
41
|
Kpogbezan GB, van der Vaart AW, van Wieringen WN, Leday GGR, van de Wiel MA. An empirical Bayes approach to network recovery using external knowledge. Biom J 2017; 59:932-947. [PMID: 28393396 DOI: 10.1002/bimj.201600090] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/16/2016] [Revised: 11/22/2016] [Accepted: 12/04/2016] [Indexed: 11/12/2022]
Abstract
Reconstruction of a high-dimensional network may benefit substantially from the inclusion of prior knowledge on the network topology. In the case of gene interaction networks such knowledge may come for instance from pathway repositories like KEGG, or be inferred from data of a pilot study. The Bayesian framework provides a natural means of including such prior knowledge. Based on a Bayesian Simultaneous Equation Model, we develop an appealing Empirical Bayes (EB) procedure that automatically assesses the agreement of the used prior knowledge with the data at hand. We use variational Bayes method for posterior densities approximation and compare its accuracy with that of Gibbs sampling strategy. Our method is computationally fast, and can outperform known competitors. In a simulation study, we show that accurate prior data can greatly improve the reconstruction of the network, but need not harm the reconstruction if wrong. We demonstrate the benefits of the method in an analysis of gene expression data from GEO. In particular, the edges of the recovered network have superior reproducibility (compared to that of competitors) over resampled versions of the data.
Affiliation(s)
- Gino B Kpogbezan
- Department of Mathematics, University of Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
| | - Aad W van der Vaart
- Department of Mathematics, University of Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
| | - Wessel N van Wieringen
- Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands; Department of Epidemiology and Biostatistics, VU University Medical Center, 1007 MB Amsterdam, The Netherlands
| | - Gwenaël G R Leday
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Forvie Site, Cambridge, CB2 0SR, United Kingdom
| | - Mark A van de Wiel
- Department of Mathematics, Vrije Universiteit Amsterdam, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands; Department of Epidemiology and Biostatistics, VU University Medical Center, 1007 MB Amsterdam, The Netherlands
| |
|
42
|
Fujii C, Kuwahara H, Yu G, Guo L, Gao X. Learning gene regulatory networks from gene expression data using weighted consensus. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.02.087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
43
|
Kannan V, Tegner J. Adaptive input data transformation for improved network reconstruction with information theoretic algorithms. Stat Appl Genet Mol Biol 2016; 15:507-520. [PMID: 27875324 DOI: 10.1515/sagmb-2016-0013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
We propose a novel systematic procedure of non-linear data transformation for an adaptive algorithm in the context of network reverse-engineering using information theoretic methods. Our methodology is rooted in elucidating and correcting for the specific biases in the estimation techniques for mutual information (MI) given a finite sample of data. These are, in turn, tied to lack of well-defined bounds for numerical estimation of MI for continuous probability distributions from finite data. The nature and properties of the inevitable bias is described, complemented by several examples illustrating their form and variation. We propose an adaptive partitioning scheme for MI estimation that effectively transforms the sample data using parameters determined from its local and global distribution guaranteeing a more robust and reliable reconstruction algorithm. Together with a normalized measure (Shared Information Metric) we report considerably enhanced performance both for in silico and real-world biological networks. We also find that the recovery of true interactions is in particular better for intermediate range of false positive rates, suggesting that our algorithm is less vulnerable to spurious signals of association.
|
44
|
Halasz M, Kholodenko BN, Kolch W, Santra T. Integrating network reconstruction with mechanistic modeling to predict cancer therapies. Sci Signal 2016; 9:ra114. [PMID: 27879396 DOI: 10.1126/scisignal.aae0535] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 12/12/2022]
Abstract
Signal transduction networks are often rewired in cancer cells. Identifying these alterations will enable more effective cancer treatment. We developed a computational framework that can identify, reconstruct, and mechanistically model these rewired networks from noisy and incomplete perturbation response data and then predict potential targets for intervention. As a proof of principle, we analyzed a perturbation data set targeting epidermal growth factor receptor (EGFR) and insulin-like growth factor 1 receptor (IGF1R) pathways in a panel of colorectal cancer cells. Our computational approach predicted cell line-specific network rewiring. In particular, feedback inhibition of insulin receptor substrate 1 (IRS1) by the kinase p70S6K was predicted to confer resistance to EGFR inhibition, suggesting that disrupting this feedback may restore sensitivity to EGFR inhibitors in colorectal cancer cells. We experimentally validated this prediction with colorectal cancer cell lines in culture and in a zebrafish (Danio rerio) xenograft model.
Affiliation(s)
- Melinda Halasz
- Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland; School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
| | - Boris N Kholodenko
- Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland; School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland; Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
| | - Walter Kolch
- Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland; School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland; Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
| | - Tapesh Santra
- Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland.
| |
|
45
|
Liu H, Zhang F, Mishra SK, Zhou S, Zheng J. Knowledge-guided fuzzy logic modeling to infer cellular signaling networks from proteomic data. Sci Rep 2016; 6:35652. [PMID: 27774993 PMCID: PMC5075921 DOI: 10.1038/srep35652] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/27/2016] [Accepted: 09/29/2016] [Indexed: 12/14/2022] Open
Abstract
Modeling of signaling pathways is crucial for understanding and predicting cellular responses to drug treatments. However, canonical signaling pathways curated from literature are seldom context-specific and thus can hardly predict cell type-specific response to external perturbations; purely data-driven methods also have drawbacks such as limited biological interpretability. Therefore, hybrid methods that can integrate prior knowledge and real data for network inference are highly desirable. In this paper, we propose a knowledge-guided fuzzy logic network model to infer signaling pathways by exploiting both prior knowledge and time-series data. In particular, the dynamic time warping algorithm is employed to measure the goodness of fit between experimental and predicted data, so that our method can model temporally-ordered experimental observations. We evaluated the proposed method on a synthetic dataset and two real phosphoproteomic datasets. The experimental results demonstrate that our model can uncover drug-induced alterations in signaling pathways in cancer cells. Compared with existing hybrid models, our method can model feedback loops so that the dynamical mechanisms of signaling networks can be uncovered from time-series data. By calibrating generic models of signaling pathways against real data, our method supports precise predictions of context-specific anticancer drug effects, which is an important step towards precision medicine.
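For readers unfamiliar with the dynamic time warping (DTW) measure named in the abstract above, the following is a minimal sketch of the textbook DTW distance between two one-dimensional time series. It is not the authors' implementation: the function name `dtw_distance`, the absolute-difference local cost, and the sample series are assumptions chosen purely for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences.

    Hypothetical helper for illustration; uses |x - y| as the local cost.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Illustrative (made-up) series: the prediction lags the observation by one step.
observed = [0.0, 0.2, 0.9, 1.0, 0.4]
predicted = [0.0, 0.0, 0.2, 0.9, 1.0, 0.4]
print(dtw_distance(observed, predicted))
```

Because DTW allows one series to be locally stretched against the other, a prediction that has the right shape but is shifted in time is penalised far less than it would be under a pointwise error, which is presumably why such a measure suits temporally ordered observations.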
Affiliation(s)
- Hui Liu
- Biomedical Informatics Lab, School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
- Lab of Information Management, Changzhou University, Jiangsu, 213164 China
| | - Fan Zhang
- Biomedical Informatics Lab, School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
| | - Shital Kumar Mishra
- Biomedical Informatics Lab, School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
| | - Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China
| | - Jie Zheng
- Biomedical Informatics Lab, School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
- Genome Institute of Singapore (GIS), A*STAR, Biopolis, Singapore 138672, Singapore
| |
|
46
|
Chasman D, Fotuhi Siahpirani A, Roy S. Network-based approaches for analysis of complex biological systems. Curr Opin Biotechnol 2016; 39:157-166. [PMID: 27115495 DOI: 10.1016/j.copbio.2016.04.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 10/24/2015] [Revised: 04/04/2016] [Accepted: 04/05/2016] [Indexed: 12/22/2022]
Abstract
Cells function and respond to changes in their environment by the coordinated activity of their molecular components, including mRNAs, proteins and metabolites. At the heart of proper cellular function are molecular networks connecting these components to process extra-cellular environmental signals and drive dynamic, context-specific cellular responses. Network-based computational approaches aim to systematically integrate measurements from high-throughput experiments to gain a global understanding of cellular function under changing environmental conditions. We provide an overview of recent methodological developments toward solving two major computational problems within this field in the past two years (2013-2015): network reconstruction and network-based interpretation. Looking forward, we envision development of methods that can predict phenotypes with high accuracy as well as provide biologically plausible mechanistic hypotheses.
Affiliation(s)
- Deborah Chasman
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, United States
| | - Alireza Fotuhi Siahpirani
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, United States; Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, United States; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, United States
| | - Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI 53715, United States; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53792, United States; Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, United States.
| |
|
47
|
Dalton LA, Yousefi MR. Data Requirements for Model-Based Cancer Prognosis Prediction. Cancer Inform 2016; 14:123-38. [PMID: 27127404 PMCID: PMC4844301 DOI: 10.4137/cin.s30801] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/09/2015] [Revised: 02/02/2016] [Accepted: 02/07/2016] [Indexed: 11/20/2022] Open
Abstract
Cancer prognosis prediction is typically carried out without integrating scientific knowledge available on genomic pathways, the effect of drugs on cell dynamics, or modeling mutations in the population. Recent work addresses some of these problems by formulating an uncertainty class of Boolean regulatory models for abnormal gene regulation, assigning prognosis scores to each network based on intervention outcomes, and partitioning networks in the uncertainty class into prognosis classes based on these scores. For a new patient, the probability distribution of the prognosis class was evaluated using optimal Bayesian classification, given patient data. It was assumed that (1) disease is the result of several mutations of a known healthy network and that these mutations and their probability distribution in the population are known and (2) only a single snapshot of the patient's gene activity profile is observed. It was shown that, even in ideal settings where cancer in the population and the effect of a drug are fully modeled, a single static measurement is typically not sufficient. Here, we study what measurements are sufficient to predict prognosis. In particular, we relax assumption (1) by addressing how population data may be used to estimate network probabilities, and extend assumption (2) to include static and time-series measurements of both population and patient data. Furthermore, we extend the prediction of prognosis classes to optimal Bayesian regression of prognosis metrics. Even when time-series data is preferable to infer a stochastic dynamical network, we show that static data can be superior for prognosis prediction when constrained to small samples. Furthermore, although population data is helpful, performance is not sensitive to inaccuracies in the estimated network probabilities.
Affiliation(s)
- Lori A. Dalton
- Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
| | | |
|
48
|
Khoo BL, Chaudhuri PK, Ramalingam N, Tan DSW, Lim CT, Warkiani ME. Single-cell profiling approaches to probing tumor heterogeneity. Int J Cancer 2016; 139:243-55. [PMID: 26789729 DOI: 10.1002/ijc.30006] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 09/21/2015] [Revised: 12/10/2015] [Accepted: 01/08/2016] [Indexed: 01/08/2023]
Abstract
Tumor heterogeneity is a major hindrance in cancer classification, diagnosis and treatment. Recent technological advances have begun to reveal the true extent of its heterogeneity. Single-cell analysis (SCA) is emerging as an important approach to detect variations in morphology, genetic or proteomic expression. In this review, we revisit the issue of inter- and intra-tumor heterogeneity, and list various modes of SCA techniques (cell-based, nucleic acid-based, protein-based, metabolite-based and lipid-based) presently used for cancer characterization. We further discuss the advantages of SCA over pooled cell analysis, as well as the limitations of conventional techniques. Emerging trends, such as high-throughput sequencing, are also mentioned as improved means for cancer profiling. Collectively, these applications have the potential for breakthroughs in cancer treatment.
Affiliation(s)
- Bee Luan Khoo
- Mechanobiology Institute, National University of Singapore; BioSystems and Micromechanics (BioSyM) IRG, Singapore-MIT Alliance for Research and Technology (SMART) Centre, Singapore
| | | | | | - Daniel Shao Weng Tan
- Division of Medical Oncology, National Cancer Centre Singapore; Cancer Stem Cell Biology, Genome Institute of Singapore
| | - Chwee Teck Lim
- Mechanobiology Institute, National University of Singapore; BioSystems and Micromechanics (BioSyM) IRG, Singapore-MIT Alliance for Research and Technology (SMART) Centre, Singapore; Department of Biomedical Engineering, National University of Singapore
| | - Majid Ebrahimi Warkiani
- BioSystems and Micromechanics (BioSyM) IRG, Singapore-MIT Alliance for Research and Technology (SMART) Centre, Singapore; School of Mechanical and Manufacturing Engineering, Australian Centre for NanoMedicine, University of New South Wales, Sydney, NSW, 2052, Australia
| |
|
49
|
von der Heyde S, Sonntag J, Kramer F, Bender C, Korf U, Beißbarth T. Reconstruction of Protein Networks Using Reverse-Phase Protein Array Data. Methods Mol Biol 2016; 1362:227-246. [PMID: 26519181 DOI: 10.1007/978-1-4939-3106-4_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
In this chapter, we describe an approach to reconstruct cellular signaling networks based on measurements of protein activation after different stimulation experiments. Reverse-phase protein arrays (RPPA) serve as the experimental platform. RPPA allow the measurement of proteins and phosphoproteins across many samples in parallel with minimal sample consumption, using a panel of antibodies that are highly specific for their target proteins. Functional interactions of proteins are modeled using a Boolean network. We describe the Boolean network reconstruction approach ddepn (dynamic deterministic effects propagation networks), which uses time course data to derive protein interactions based on perturbation experiments. We explain how the method works, give a practical application example, and describe how the results can be interpreted. Furthermore, prior knowledge on signaling pathways is essential for network reconstruction. Here we describe the use of our software rBiopaxParser to integrate prior knowledge on protein signaling available in public databases. All applied methods are freely available as open-source R software packages. We describe the preparation of RPPA data as well as all relevant programming steps to format the RPPA data, to infer the prior knowledge, and to reconstruct and analyze the protein signaling networks.
Affiliation(s)
- Silvia von der Heyde
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany.
- IndivuTest GmbH, Falkenried 88, 20251, Hamburg, Germany.
| | - Johanna Sonntag
- Division of Molecular Genome Analysis, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Frank Kramer
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
| | - Christian Bender
- TRON-Translational Oncology at the University Medical Center Mainz, Mainz, Germany
| | - Ulrike Korf
- Division of Molecular Genome Analysis, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Tim Beißbarth
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
| |
|
50
|
Abstract
Motivation: Markov networks are undirected graphical models that are widely used to infer relations between genes from experimental data. Their state-of-the-art inference procedures assume the data arise from a Gaussian distribution. High-throughput omics data, such as those from next-generation sequencing, often violate this assumption. Furthermore, when collected data arise from multiple related but otherwise nonidentical distributions, their underlying networks are likely to have common features. New principled statistical approaches are needed that can deal with different data distributions and jointly consider collections of datasets. Results: We present FuseNet, a Markov network formulation that infers networks from a collection of nonidentically distributed datasets. Our approach is computationally efficient and general: given any number of distributions from an exponential family, FuseNet represents model parameters through shared latent factors that define neighborhoods of network nodes. In a simulation study, we demonstrate good predictive performance of FuseNet in comparison to several popular graphical models. We show its effectiveness in an application to breast cancer RNA-sequencing and somatic mutation data, a novel application of graphical models. Fusion of datasets offers substantial gains relative to inference of separate networks for each dataset. Our results demonstrate that network inference methods for non-Gaussian data can help in accurate modeling of the data generated by emergent high-throughput technologies. Availability and implementation: Source code is at https://github.com/marinkaz/fusenet. Contact: blaz.zupan@fri.uni-lj.si Supplementary information: Supplementary information is available at Bioinformatics online.
Affiliation(s)
- Marinka Žitnik
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Blaž Zupan
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| |
|