1
|
Sanz-Martín G, Migliore DP, Gómez del Campo P, del Castillo-Izquierdo J, Domínguez JM. GFPrint™: A machine learning tool for transforming genetic data into clinical insights. PLoS One 2024; 19:e0311370. [PMID: 39602407 PMCID: PMC11602062 DOI: 10.1371/journal.pone.0311370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2024] [Accepted: 09/15/2024] [Indexed: 11/29/2024] Open
Abstract
The increasing availability of massive genetic sequencing data in the clinical setting has triggered the need for appropriate tools to help fully exploit the wealth of information these data possess. GFPrint™ is a proprietary streaming algorithm designed to meet that need. By extracting the most relevant functional features, GFPrint™ transforms high-dimensional, noisy genetic sequencing data into an embedded representation, allowing unsupervised models to create data clusters that can be re-mapped to the original clinical information. Ultimately, this allows the identification of genes and pathways relevant to disease onset and progression. GFPrint™ has been tested and validated using two cancer genomic datasets publicly available. Analysis of the TCGA dataset has identified panels of genes whose mutations appear to negatively influence survival in non-metastatic colorectal cancer (15 genes), epidermoid non-small cell lung cancer (167 genes) and pheochromocytoma (313 genes) patients. Likewise, analysis of the Broad Institute dataset has identified 75 genes involved in pathways related to extracellular matrix reorganization whose mutations appear to dictate a worse prognosis for breast cancer patients. GFPrint™ is accessible through a secure web portal and can be used in any therapeutic area where the genetic profile of patients influences disease evolution.
Collapse
|
2
|
Wu Q, Morrow EM, Gamsiz Uzun ED. A deep learning model for prediction of autism status using whole-exome sequencing data. PLoS Comput Biol 2024; 20:e1012468. [PMID: 39514604 DOI: 10.1371/journal.pcbi.1012468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2023] [Revised: 11/20/2024] [Accepted: 09/06/2024] [Indexed: 11/16/2024] Open
Abstract
Autism is a developmental disability. Research demonstrated that children with autism benefit from early diagnosis and early intervention. Genetic factors are considered major contributors to the development of autism. Machine learning (ML), including deep learning (DL), has been evaluated in phenotype prediction, but this method has been limited in its application to autism. We developed a DL model, the Separate Translated Autism Research Neural Network (STAR-NN) model to predict autism status. The model was trained and tested using whole exome sequencing data from 43,203 individuals (16,809 individuals with autism and 26,394 non-autistic controls). Polygenic scores from common variants and the aggregated count of rare variants on genes were used as input. In STAR-NN, protein truncating variants, possibly damaging missense variants and mild effect missense variants on the same gene were separated at the input level and merged to one gene node. In this way, rare variants with different level of pathogenic effects were treated separately. We further validated the performance of STAR-NN using an independent dataset, including 13,827 individuals with autism and 14,052 non-autistic controls. STAR-NN achieved a modest ROC-AUC of 0.7319 on the testing dataset and 0.7302 on the independent dataset. STAR-NN outperformed other traditional ML models. Gene Ontology analysis on the selected gene features showed an enrichment for potentially informative pathways including calcium ion transport.
Collapse
Affiliation(s)
- Qing Wu
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, Rhode Island, United States of America
- Center for Translational Neuroscience, Robert J. and Nancy D. Carney Institute for Brain Science and Brown Institute for Translational Science, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
| | - Eric M Morrow
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, Rhode Island, United States of America
- Center for Translational Neuroscience, Robert J. and Nancy D. Carney Institute for Brain Science and Brown Institute for Translational Science, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Developmental Disorders Genetics Research Program, Department of Psychiatry and Human Behavior, Emma Pendleton Bradley Hospital, East Providence, Rhode Island, United States of America
| | - Ece D Gamsiz Uzun
- Center for Translational Neuroscience, Robert J. and Nancy D. Carney Institute for Brain Science and Brown Institute for Translational Science, Brown University, Providence, Rhode Island, United States of America
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Pathology and Laboratory Medicine, Warren Alpert Medical School of Brown University, Providence, Rhode Island, United States of America
- Department of Pathology and Laboratory Medicine, Rhode Island Hospital, Providence, Rhode Island, United States of America
| |
Collapse
|
3
|
Syed AH, Abujabal HAS, Ahmad S, Malebary SJ, Alromema N. Advances in Inflammatory Bowel Disease Diagnostics: Machine Learning and Genomic Profiling Reveal Key Biomarkers for Early Detection. Diagnostics (Basel) 2024; 14:1182. [PMID: 38893707 PMCID: PMC11172026 DOI: 10.3390/diagnostics14111182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2024] [Revised: 05/25/2024] [Accepted: 06/01/2024] [Indexed: 06/21/2024] Open
Abstract
This study, utilizing high-throughput technologies and Machine Learning (ML), has identified gene biomarkers and molecular signatures in Inflammatory Bowel Disease (IBD). We could identify significant upregulated or downregulated genes in IBD patients by comparing gene expression levels in colonic specimens from 172 IBD patients and 22 healthy individuals using the GSE75214 microarray dataset. Our ML techniques and feature selection methods revealed six Differentially Expressed Gene (DEG) biomarkers (VWF, IL1RL1, DENND2B, MMP14, NAAA, and PANK1) with strong diagnostic potential for IBD. The Random Forest (RF) model demonstrated exceptional performance, with accuracy, F1-score, and AUC values exceeding 0.98. Our findings were rigorously validated with independent datasets (GSE36807 and GSE10616), further bolstering their credibility and showing favorable performance metrics (accuracy: 0.841, F1-score: 0.734, AUC: 0.887). Our functional annotation and pathway enrichment analysis provided insights into crucial pathways associated with these dysregulated genes. DENND2B and PANK1 were identified as novel IBD biomarkers, advancing our understanding of the disease. The validation in independent cohorts enhances the reliability of these findings and underscores their potential for early detection and personalized treatment of IBD. Further exploration of these genes is necessary to fully comprehend their roles in IBD pathogenesis and develop improved diagnostic tools and therapies. This study significantly contributes to IBD research with valuable insights, potentially greatly enhancing patient care.
Collapse
Affiliation(s)
- Asif Hassan Syed
- Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah 22254, Saudi Arabia;
| | - Hamza Ali S. Abujabal
- Department of Mathematics, Faculty of Science, King Abdulaziz University, P.O. Box 80203, Jeddah 21589, Saudi Arabia;
| | - Shakeel Ahmad
- Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Jeddah 22254, Saudi Arabia;
| | - Sharaf J. Malebary
- Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia;
| | - Nashwan Alromema
- Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia;
| |
Collapse
|
4
|
The Critical Assessment of Genome Interpretation Consortium, Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, et alThe Critical Assessment of Genome Interpretation Consortium, Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Show More Authors] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024] Open
Abstract
BACKGROUND The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
Collapse
|
5
|
Barnett EJ, Onete DG, Salekin A, Faraone SV. Genomic Machine Learning Meta-regression: Insights on Associations of Study Features With Reported Model Performance. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:169-177. [PMID: 38109236 DOI: 10.1109/tcbb.2023.3343808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2023]
Abstract
Many studies have been conducted with the goal of correctly predicting diagnostic status of a disorder using the combination of genomic data and machine learning. It is often hard to judge which components of a study led to better results and whether better reported results represent a true improvement or an uncorrected bias inflating performance. We extracted information about the methods used and other differentiating features in genomic machine learning models. We used these features in linear regressions predicting model performance. We tested for univariate and multivariate associations as well as interactions between features. Of the models reviewed, 46% used feature selection methods that can lead to data leakage. Across our models, the number of hyperparameter optimizations reported, data leakage due to feature selection, model type, and modeling an autoimmune disorder were significantly associated with an increase in reported model performance. We found a significant, negative interaction between data leakage and training size. Our results suggest that methods susceptible to data leakage are prevalent among genomic machine learning research, resulting in inflated reported performance. Best practice guidelines that promote the avoidance and recognition of data leakage may help the field avoid biased results.
Collapse
|
6
|
Pinton P. Impact of artificial intelligence on prognosis, shared decision-making, and precision medicine for patients with inflammatory bowel disease: a perspective and expert opinion. Ann Med 2024; 55:2300670. [PMID: 38163336 PMCID: PMC10763920 DOI: 10.1080/07853890.2023.2300670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Accepted: 12/27/2023] [Indexed: 01/03/2024] Open
Abstract
INTRODUCTION Artificial intelligence (AI) is expected to impact all facets of inflammatory bowel disease (IBD) management, including disease assessment, treatment decisions, discovery and development of new biomarkers and therapeutics, as well as clinician-patient communication. AREAS COVERED This perspective paper provides an overview of the application of AI in the clinical management of IBD through a review of the currently available AI models that could be potential tools for prognosis, shared decision-making, and precision medicine. This overview covers models that measure treatment response based on statistical or machine-learning methods, or a combination of the two. We briefly discuss a computational model that allows integration of immune/biological system knowledge with mathematical modeling and also involves a 'digital twin', which allows measurement of temporal trends in mucosal inflammatory activity for predicting treatment response. A viewpoint on AI-enabled wearables and nearables and their use to improve IBD management is also included. EXPERT OPINION Although challenges regarding data quality, privacy, and security; ethical concerns; technical limitations; and regulatory barriers remain to be fully addressed, a growing body of evidence suggests a tremendous potential for integration of AI into daily clinical practice to enable precision medicine and shared decision-making.
Collapse
Affiliation(s)
- Philippe Pinton
- Clinical and Translational Sciences, Ferring Pharmaceuticals, Kastrup, Denmark
| |
Collapse
|
7
|
Stafford IS, Ashton JJ, Mossotto E, Cheng G, Mark Beattie R, Ennis S. Supervised Machine Learning Classifies Inflammatory Bowel Disease Patients by Subtype Using Whole Exome Sequencing Data. J Crohns Colitis 2023; 17:1672-1680. [PMID: 37205778 PMCID: PMC10637043 DOI: 10.1093/ecco-jcc/jjad084] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/20/2022] [Indexed: 05/21/2023]
Abstract
BACKGROUND Inflammatory bowel disease [IBD] is a chronic inflammatory disorder with two main subtypes: Crohn's disease [CD] and ulcerative colitis [UC]. Prompt subtype diagnosis enables the correct treatment to be administered. Using genomic data, we aimed to assess machine learning [ML] to classify patients according to IBD subtype. METHODS Whole exome sequencing [WES] from paediatric/adult IBD patients was processed using an in-house bioinformatics pipeline. These data were condensed into the per-gene, per-individual genomic burden score, GenePy. Data were split into training and testing datasets [80/20]. Feature selection with a linear support vector classifier, and hyperparameter tuning with Bayesian Optimisation, were performed [training data]. The supervised ML method random forest was utilised to classify patients as CD or UC, using three panels: 1] all available genes; 2] autoimmune genes; 3] 'IBD' genes. ML results were assessed using area under the receiver operating characteristics curve [AUROC], sensitivity, and specificity on the testing dataset. RESULTS A total of 906 patients were included in analysis [600 CD, 306 UC]. Training data included 488 patients, balanced according to the minority class of UC. The autoimmune gene panel generated the best performing ML model [AUROC = 0.68], outperforming an IBD gene panel [AUROC = 0.61]. NOD2 was the top gene for discriminating CD and UC, regardless of the gene panel used. Lack of variation in genes with high GenePy scores in CD patients was the best classifier of a diagnosis of UC. DISCUSSION We demonstrate promising classification of patients by subtype using random forest and WES data. Focusing on specific subgroups of patients, with larger datasets, may result in better classification.
Collapse
Affiliation(s)
- Imogen S Stafford
- Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- NIHR Southampton Biomedical Research, University Hospital Southampton, Southampton, UK
- Institute for Life Sciences, University of Southampton, Southampton, UK
| | - James J Ashton
- Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- Department of Paediatric Gastroenterology, Southampton Children’s Hospital, Southampton, UK
| | - Enrico Mossotto
- Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
| | - Guo Cheng
- Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- NIHR Southampton Biomedical Research, University Hospital Southampton, Southampton, UK
| | - Robert Mark Beattie
- Department of Paediatric Gastroenterology, Southampton Children’s Hospital, Southampton, UK
| | - Sarah Ennis
- Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
| |
Collapse
|
8
|
Verplaetse N, Passemiers A, Arany A, Moreau Y, Raimondi D. Large sample size and nonlinear sparse models outline epistatic effects in inflammatory bowel disease. Genome Biol 2023; 24:224. [PMID: 37798735 PMCID: PMC10552306 DOI: 10.1186/s13059-023-03064-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2023] [Accepted: 09/20/2023] [Indexed: 10/07/2023] Open
Abstract
BACKGROUND Despite clear evidence of nonlinear interactions in the molecular architecture of polygenic diseases, linear models have so far appeared optimal in genotype-to-phenotype modeling. A key bottleneck for such modeling is that genetic data intrinsically suffers from underdetermination ([Formula: see text]). Millions of variants are present in each individual while the collection of large, homogeneous cohorts is hindered by phenotype incidence, sequencing cost, and batch effects. RESULTS We demonstrate that when we provide enough training data and control the complexity of nonlinear models, a neural network outperforms additive approaches in whole exome sequencing-based inflammatory bowel disease case-control prediction. To do so, we propose a biologically meaningful sparsified neural network architecture, providing empirical evidence for positive and negative epistatic effects present in the inflammatory bowel disease pathogenesis. CONCLUSIONS In this paper, we show that underdetermination is likely a major driver for the apparent optimality of additive modeling in clinical genetics today.
Collapse
Affiliation(s)
- Nora Verplaetse
- Department of of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
| | - Antoine Passemiers
- Department of of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
| | - Adam Arany
- Department of of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
| | - Yves Moreau
- Department of of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
| | - Daniele Raimondi
- Department of of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium.
| |
Collapse
|
9
|
Raimondi D, Orlando G, Verplaetse N, Fariselli P, Moreau Y. Editorial: Towards genome interpretation: Computational methods to model the genotype-phenotype relationship. FRONTIERS IN BIOINFORMATICS 2022; 2:1098941. [PMID: 36530385 PMCID: PMC9749061 DOI: 10.3389/fbinf.2022.1098941] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2022] [Accepted: 11/17/2022] [Indexed: 11/12/2023] Open
Affiliation(s)
| | | | | | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
| | | |
Collapse
|
10
|
Li Y, Law HKW. Deciphering the role of autophagy in the immunopathogenesis of inflammatory bowel disease. Front Pharmacol 2022; 13:1070184. [DOI: 10.3389/fphar.2022.1070184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2022] [Accepted: 10/31/2022] [Indexed: 11/16/2022] Open
Abstract
Inflammatory bowel disease (IBD) is a typical immune-mediated chronic inflammatory disorder. Following the industrialization and changes in lifestyle, the incidence of IBD in the world is rising, which makes health concerns and heavy burdens all over the world. However, the pathogenesis of IBD remains unclear, and the current understanding of the pathogenesis involves dysregulation of mucosal immunity, gut microbiome dysbiosis, and gut barrier defect based on genetic susceptibility and environmental triggers. In recent years, autophagy has emerged as a key mechanism in IBD development and progression because Genome-Wide Association Study revealed the complex interactions of autophagy in IBD, especially immunopathogenesis. Besides, autophagy markers are also suggested to be potential biomarkers and target treatment in IBD. This review summarizes the autophagy-related genes regulating immune response in IBD. Furthermore, we explore the evolving evidence that autophagy interacts with intestinal epithelial and immune cells to contribute to the inflammatory changes in IBD. Finally, we discuss how novel discovery could further advance our understanding of the role of autophagy and inform novel therapeutic strategies in IBD.
Collapse
|
11
|
Stafford IS, Gosink MM, Mossotto E, Ennis S, Hauben M. A Systematic Review of Artificial Intelligence and Machine Learning Applications to Inflammatory Bowel Disease, with Practical Guidelines for Interpretation. Inflamm Bowel Dis 2022; 28:1573-1583. [PMID: 35699597 PMCID: PMC9527612 DOI: 10.1093/ibd/izac115] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/03/2022] [Indexed: 12/15/2022]
Abstract
BACKGROUND Inflammatory bowel disease (IBD) is a gastrointestinal chronic disease with an unpredictable disease course. Computational methods such as machine learning (ML) have the potential to stratify IBD patients for the provision of individualized care. The use of ML methods for IBD was surveyed, with an additional focus on how the field has changed over time. METHODS On May 6, 2021, a systematic review was conducted through a search of MEDLINE and Embase databases, with the search structure ("machine learning" OR "artificial intelligence") AND ("Crohn* Disease" OR "Ulcerative Colitis" OR "Inflammatory Bowel Disease"). Exclusion criteria included studies not written in English, no human patient data, publication before 2001, studies that were not peer reviewed, nonautoimmune disease comorbidity research, and record types that were not primary research. RESULTS Seventy-eight (of 409) records met the inclusion criteria. Random forest methods were most prevalent, and there was an increase in neural networks, mainly applied to imaging data sets. The main applications of ML to clinical tasks were diagnosis (18 of 78), disease course (22 of 78), and disease severity (16 of 78). The median sample size was 263. Clinical and microbiome-related data sets were most popular. Five percent of studies used an external data set after training and testing for additional model validation. DISCUSSION Availability of longitudinal and deep phenotyping data could lead to better modeling. Machine learning pipelines that consider imbalanced data and that feature selection only on training data will generate more generalizable models. Machine learning models are increasingly being applied to more complex clinical tasks for specific phenotypes, indicating progress towards personalized medicine for IBD.
Collapse
Affiliation(s)
- Imogen S Stafford
- Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
- Institute for Life Sciences, University Of Southampton, Southampton, UK
- NIHR Southampton Biomedical Research, University HospitalSouthampton, Southampton, UK
| | | | - Enrico Mossotto
- Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
| | - Sarah Ennis
- Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK
| | - Manfred Hauben
- Pfizer Inc, New York, NY, USA
- NYU Langone Health, Department of Medicine, New York, NY, USA
| |
Collapse
|
12
|
Sun S, Miller M, Wang Y, Tyc KM, Cao X, Scott RT, Tao X, Bromberg Y, Schindler K, Xing J. Predicting embryonic aneuploidy rate in IVF patients using whole-exome sequencing. Hum Genet 2022; 141:1615-1627. [PMID: 35347416 PMCID: PMC10095970 DOI: 10.1007/s00439-022-02450-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2021] [Accepted: 03/16/2022] [Indexed: 01/13/2023]
Abstract
Infertility is a major reproductive health issue that affects about 12% of women of reproductive age in the United States. Aneuploidy in eggs accounts for a significant proportion of early miscarriage and in vitro fertilization failure. Recent studies have shown that genetic variants in several genes affect chromosome segregation fidelity and predispose women to a higher incidence of egg aneuploidy. However, the exact genetic causes of aneuploid egg production remain unclear, making it difficult to diagnose infertility based on individual genetic variants in mother's genome. In this study, we evaluated machine learning-based classifiers for predicting the embryonic aneuploidy risk in female IVF patients using whole-exome sequencing data. Using two exome datasets, we obtained an area under the receiver operating curve of 0.77 and 0.68, respectively. High precision could be traded off for high specificity in classifying patients by selecting different prediction score cutoffs. For example, a strict prediction score cutoff of 0.7 identified 29% of patients as high-risk with 94% precision. In addition, we identified MCM5, FGGY, and DDX60L as potential aneuploidy risk genes that contribute the most to the predictive power of the model. These candidate genes and their molecular interaction partners are enriched for meiotic-related gene ontology categories and pathways, such as microtubule organizing center and DNA recombination. In summary, we demonstrate that sequencing data can be mined to predict patients' aneuploidy risk thus improving clinical diagnosis. The candidate genes and pathways we identified are promising targets for future aneuploidy studies.
Collapse
Affiliation(s)
- Siqi Sun
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
| | - Yanran Wang
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
| | - Katarzyna M Tyc
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Current address: Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, VA, USA
| | - Xiaolong Cao
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
| | - Richard T Scott
- Reproductive Medicine Associates of New Jersey, Basking Ridge, NJ, USA
| | - Xin Tao
- Foundation for Embryonic Competence, Basking Ridge, NJ, USA
| | - Yana Bromberg
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
- Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Karen Schindler
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Jinchuan Xing
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA.
- Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.
| |
Collapse
|
13
|
Computational interpretation of human genetic variation. Hum Genet 2022; 141:1545-1548. [PMID: 36149496 DOI: 10.1007/s00439-022-02483-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
|
14
|
Multivariate genome-wide association study models to improve prediction of Crohn’s disease risk and identification of potential novel variants. Comput Biol Med 2022; 145:105398. [DOI: 10.1016/j.compbiomed.2022.105398] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 12/21/2022]
|
15
|
Vadapalli S, Abdelhalim H, Zeeshan S, Ahmed Z. Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine. Brief Bioinform 2022; 23:6590150. [PMID: 35595537 DOI: 10.1093/bib/bbac191] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2022] [Revised: 04/02/2022] [Accepted: 04/26/2022] [Indexed: 12/16/2022] Open
Abstract
Precision medicine uses genetic, environmental and lifestyle factors to more accurately diagnose and treat disease in specific groups of patients, and it is considered one of the most promising medical efforts of our time. The use of genetics is arguably the most data-rich and complex components of precision medicine. The grand challenge today is the successful assimilation of genetics into precision medicine that translates across different ancestries, diverse diseases and other distinct populations, which will require clever use of artificial intelligence (AI) and machine learning (ML) methods. Our goal here was to review and compare scientific objectives, methodologies, datasets, data sources, ethics and gaps of AI/ML approaches used in genomics and precision medicine. We selected high-quality literature published within the last 5 years that were indexed and available through PubMed Central. Our scope was narrowed to articles that reported application of AI/ML algorithms for statistical and predictive analyses using whole genome and/or whole exome sequencing for gene variants, and RNA-seq and microarrays for gene expression. We did not limit our search to specific diseases or data sources. Based on the scope of our review and comparative analysis criteria, we identified 32 different AI/ML approaches applied in variable genomics studies and report widely adapted AI/ML algorithms for predictive diagnostics across several diseases.
Collapse
Affiliation(s)
- Sreya Vadapalli
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
| | - Habiba Abdelhalim
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
| | - Saman Zeeshan
- Rutgers Cancer Institute of New Jersey, Rutgers University, 195 Little Albany St, New Brunswick, NJ, USA
| | - Zeeshan Ahmed
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA.,Department of Medicine, Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, 125 Paterson St, New Brunswick, NJ, USA
| |
Collapse
|
16
|
Machine Learning Modeling from Omics Data as Prospective Tool for Improvement of Inflammatory Bowel Disease Diagnosis and Clinical Classifications. Genes (Basel) 2021; 12:genes12091438. [PMID: 34573420 PMCID: PMC8466305 DOI: 10.3390/genes12091438] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2021] [Revised: 08/21/2021] [Accepted: 09/14/2021] [Indexed: 12/14/2022] Open
Abstract
Research of inflammatory bowel disease (IBD) has identified numerous molecular players involved in the disease development. Even so, the understanding of IBD is incomplete, while disease treatment is still far from the precision medicine. Reliable diagnostic and prognostic biomarkers in IBD are limited which may reduce efficient therapeutic outcomes. High-throughput technologies and artificial intelligence emerged as powerful tools in search of unrevealed molecular patterns that could give important insights into IBD pathogenesis and help to address unmet clinical needs. Machine learning, a subtype of artificial intelligence, uses complex mathematical algorithms to learn from existing data in order to predict future outcomes. The scientific community has been increasingly employing machine learning for the prediction of IBD outcomes from comprehensive patient data-clinical records, genomic, transcriptomic, proteomic, metagenomic, and other IBD relevant omics data. This review aims to present fundamental principles behind machine learning modeling and its current application in IBD research with the focus on studies that explored genomic and transcriptomic data. We described different strategies used for dealing with omics data and outlined the best-performing methods. Before being translated into clinical settings, the developed machine learning models should be tested in independent prospective studies as well as randomized controlled trials.
Collapse
|
17
|
Gubatan J, Levitte S, Patel A, Balabanis T, Wei MT, Sinha SR. Artificial intelligence applications in inflammatory bowel disease: Emerging technologies and future directions. World J Gastroenterol 2021; 27:1920-1935. [PMID: 34007130 PMCID: PMC8108036 DOI: 10.3748/wjg.v27.i17.1920] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/26/2021] [Revised: 03/04/2021] [Accepted: 04/13/2021] [Indexed: 02/06/2023] Open
Abstract
Inflammatory bowel disease (IBD) is a complex and multifaceted disorder of the gastrointestinal tract that is increasing in incidence worldwide and associated with significant morbidity. The rapid accumulation of large datasets from electronic health records, high-definition multi-omics (including genomics, proteomics, transcriptomics, and metagenomics), and imaging modalities (endoscopy and endomicroscopy) have provided powerful tools to unravel novel mechanistic insights and help address unmet clinical needs in IBD. Although the application of artificial intelligence (AI) methods has facilitated the analysis, integration, and interpretation of large datasets in IBD, significant heterogeneity in AI methods, datasets, and clinical outcomes and the need for unbiased prospective validations studies are current barriers to incorporation of AI into clinical practice. The purpose of this review is to summarize the most recent advances in the application of AI and machine learning technologies in the diagnosis and risk prediction, assessment of disease severity, and prediction of clinical outcomes in patients with IBD.
Collapse
Affiliation(s)
- John Gubatan
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
| | - Steven Levitte
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
| | - Akshar Patel
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
| | - Tatiana Balabanis
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
| | - Mike T Wei
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
| | - Sidhartha R Sinha
- Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Redwood City, CA 94063, United States
| |
Collapse
|
18
|
Raimondi D, Simm J, Arany A, Fariselli P, Cleynen I, Moreau Y. An interpretable low-complexity machine learning framework for robust exome-based in- silico diagnosis of Crohn's disease patients. NAR Genom Bioinform 2021; 2:lqaa011. [PMID: 33575557 PMCID: PMC7671306 DOI: 10.1093/nargab/lqaa011] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2019] [Revised: 01/22/2020] [Accepted: 02/05/2020] [Indexed: 02/06/2023] Open
Abstract
Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solve the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems need to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, due to the complexity of the challenge, the relative scarcity of the data and issues such as batch effects and data heterogeneity, which are confounder factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn’s disease (CD) patients which addresses many of the current methodological issues. First, we devise a rational ML-friendly feature representation for WES data based on the gene mutational burden concept, which is suitable for small sample sizes datasets. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on 3 CD case-controls datasets, comparing the performance with the participants of previous CAGI challenges. We show that, notwithstanding the limited NN complexity, it outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and investigating the decision process leading to each prediction.
Collapse
Affiliation(s)
| | - Jaak Simm
- ESAT-STADIUS, KU Leuven, 3001 Leuven, Belgium
| | - Adam Arany
- ESAT-STADIUS, KU Leuven, 3001 Leuven, Belgium
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, 10123 Italy
| | | | - Yves Moreau
- ESAT-STADIUS, KU Leuven, 3001 Leuven, Belgium
| |
Collapse
|
19
|
Zhu C, Miller M, Zeng Z, Wang Y, Mahlich Y, Aptekmann A, Bromberg Y. Computational Approaches for Unraveling the Effects of Variation in the Human Genome and Microbiome. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-030320-041014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
The past two decades of analytical efforts have highlighted how much more remains to be learned about the human genome and, particularly, its complex involvement in promoting disease development and progression. While numerous computational tools exist for the assessment of the functional and pathogenic effects of genome variants, their precision is far from satisfactory, particularly for clinical use. Accumulating evidence also suggests that the human microbiome's interaction with the human genome plays a critical role in determining health and disease states. While numerous microbial taxonomic groups and molecular functions of the human microbiome have been associated with disease, the reproducibility of these findings is lacking. The human microbiome–genome interaction in healthy individuals is even less well understood. This review summarizes the available computational methods built to analyze the effect of variation in the human genome and microbiome. We address the applicability and precision of these methods across their possible uses. We also briefly discuss the exciting, necessary, and now possible integration of the two types of data to improve the understanding of pathogenicity mechanisms.
Collapse
Affiliation(s)
- Chengsheng Zhu
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA;,
| | - Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA;,
| | - Zishuo Zeng
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA;,
| | - Yanran Wang
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA;,
| | - Yannick Mahlich
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA;,
| | - Ariel Aptekmann
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA;,
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA;,
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854, USA
| |
Collapse
|
20
|
Iourov IY, Vorsanova SG, Yurov YB. The variome concept: focus on CNVariome. Mol Cytogenet 2019; 12:52. [PMID: 31890032 PMCID: PMC6924070 DOI: 10.1186/s13039-019-0467-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2019] [Accepted: 12/13/2019] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Variome may be used for designating complex system of interplay between genomic variations specific for an individual or a disease. Despite the recognized complexity of genomic basis for phenotypic traits and diseases, studies of genetic causes of a disease are usually dedicated to the identification of single causative genomic changes (mutations). When such an artificially simplified model is employed, genomic basis of phenotypic outcomes remains elusive in the overwhelming majority of human diseases. Moreover, it is repeatedly demonstrated that multiple genomic changes within an individual genome are likely to underlie the phenome. Probably the best example of cumulative effect of variome on the phenotype is CNV (copy number variation) burden. Accordingly, we have proposed a variome concept based on CNV studies providing the evidence for the existence of a CNVariome (the set of CNV affecting an individual genome), a target for genomic analyses useful for unraveling genetic mechanisms of diseases and phenotypic traits. CONCLUSION Variome (CNVariome) concept suggests that a genomic milieu is determined by the whole set of genomic variations (CNV) within an individual genome. The genomic milieu is likely to result from interplay between these variations. Furthermore, such kind of variome may be either individual or disease-specific. Additionally, such variome may be pathway-specific. The latter is able to affect molecular/cellular pathways of genome stability maintenance leading to occurrence of genomic/chromosome instability and/or somatic mosaicism resulting in somatic variome. This variome type seems to be important for unraveling disease mechanisms, as well. Finally, it appears that bioinformatic analysis of both individual and somatic variomes in the context of diseases- and pathway-specific variomes is the most promising way to determine genomic basis of the phenome and to unravel disease mechanisms for the management and treatment of currently incurable diseases.
Collapse
Affiliation(s)
- Ivan Y. Iourov
- Yurov’s Laboratory of Molecular Genetics and Cytogenomics of the Brain, Mental Health Research Center, 117152 Moscow, Russia
- Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Ministry of Health of Russian Federation, 125412 Moscow, Russia
| | - Svetlana G. Vorsanova
- Yurov’s Laboratory of Molecular Genetics and Cytogenomics of the Brain, Mental Health Research Center, 117152 Moscow, Russia
- Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Ministry of Health of Russian Federation, 125412 Moscow, Russia
| | - Yuri B. Yurov
- Yurov’s Laboratory of Molecular Genetics and Cytogenomics of the Brain, Mental Health Research Center, 117152 Moscow, Russia
- Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Ministry of Health of Russian Federation, 125412 Moscow, Russia
| |
Collapse
|