1
Tafazoli A, Hemmati M, Rafigh M, Alimardani M, Khaghani F, Korostyński M, Karnes JH. Leveraging long-read sequencing technologies for pharmacogenomic testing: applications, analytical strategies, challenges, and future perspectives. Front Genet 2025; 16:1435416. [PMID: 40370700 PMCID: PMC12075302 DOI: 10.3389/fgene.2025.1435416] [Received: 05/20/2024] [Accepted: 04/07/2025] [Indexed: 05/16/2025] Open
Abstract
Long-read sequencing (LRS) was introduced as the third generation of next-generation sequencing technologies with a high accuracy rate in genomic variant identification for some of its platforms. Due to the structural complexity of many pharmacogenes, the presence of rare variants, and the limitations of genotyping and short-read sequencing approaches in detecting pharmacovariants, LRS methods are likely to become increasingly utilized in the near future. In this review, we aim to provide a comprehensive discussion of current and future applications of long-read genotyping methods by introducing the opportunities and advantages as well as the challenges and disadvantages of state-of-the-art LRS platforms for the implementation of pharmacogenomic tests in clinical and research settings. New approaches to data processing, as well as the challenges and pitfalls of performing such tests in daily practice, will be explored in detail. We provide references to resources for those who are interested or intend to employ LRS in pharmacogenomics screening, both in clinical and research settings.
Affiliation(s)
- Alireza Tafazoli
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, Canada
- Mahboobeh Hemmati
  - Department of Medical Genetics and Molecular Medicine, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahboobeh Rafigh
  - Medical Genetics Research Center, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Maliheh Alimardani
  - Department of Medical Genetics and Molecular Medicine, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Faeze Khaghani
  - Department of Pharmaceutical Biotechnology, School of Pharmacy, Guilan University of Medical Sciences, Rasht, Iran
- Michał Korostyński
  - Laboratory of Pharmacogenomics, Department of Molecular Neuropharmacology, Maj Institute of Pharmacology Polish Academy of Sciences, Kraków, Poland
- Jason H. Karnes
  - Department of Pharmacy Practice and Science, University of Arizona R. Ken Coit College of Pharmacy, Tucson, AZ, United States
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
2
Fang H, Eacker SM, Wu Y, Paschal C, Wood M, Nelson B, Muratov A, Liu Y. Evaluation of Genomic Proximity Mapping (GPM) for Detecting Genomic and Chromosomal Structural Variants in Constitutional Disorders. medRxiv 2025:2025.04.23.25326303. [PMID: 40313283 PMCID: PMC12045419 DOI: 10.1101/2025.04.23.25326303] [Indexed: 05/03/2025]
Abstract
Genomic structural variants (SVs) are critical contributors to genetic diversity and disease, yet their detection remains challenging with conventional cytogenetic techniques, such as karyotyping, fluorescence in situ hybridization (FISH), and chromosome microarray analysis (CMA). These methods often lack the resolution and sensitivity needed for comprehensive characterization of chromosomal aberrations. To address these limitations, we implemented genomic proximity mapping (GPM), a genome-wide chromosome conformation capture technology, in a clinical setting. In this study, we applied GPM to a cohort of 123 patients with constitutional disorders, achieving a 100% concordance rate in detecting 411 CNVs and 39 structural rearrangements, in addition to novel findings missed by standard methods. GPM demonstrated unique advantages, such as resolving both balanced and unbalanced chromosomal rearrangements with precise (<5kb) breakpoint resolution, maintaining robust performance with challenging samples, including formalin-fixed, paraffin-embedded (FFPE) tissues, and detecting mosaicism with high sensitivity. Furthermore, GPM reliably provided detailed copy number and loss-of-heterozygosity profiles, streamlining workflows and enhancing diagnostic resolution. GPM represents a transformative tool for genomic diagnostics, offering a high-resolution, comprehensive approach to detecting diverse genomic alterations. Its ability to address limitations of conventional cytogenetics methods positions GPM as a needed advance in the diagnosis, prognosis, and therapeutic management of genetic disorders.
3
Joo JE, Viana-Errasti J, Buchanan DD, Valle L. Genetics, genomics and clinical features of adenomatous polyposis. Fam Cancer 2025; 24:38. [PMID: 40237887 PMCID: PMC12003455 DOI: 10.1007/s10689-025-00460-0] [Received: 02/09/2025] [Accepted: 03/16/2025] [Indexed: 04/18/2025]
Abstract
Adenomatous polyposis syndromes are hereditary conditions characterised by the development of multiple adenomas in the gastrointestinal tract, particularly in the colon and rectum, significantly increasing the risk of colorectal cancer and, in some cases, extra-colonic malignancies. These syndromes are caused by germline pathogenic variants (PVs) in genes involved in Wnt signalling and DNA repair. The main autosomal dominant adenomatous polyposis syndromes include familial adenomatous polyposis (FAP) and polymerase proofreading-associated polyposis (PPAP), caused by germline PVs in APC and the POLE and POLD1 genes, respectively. Autosomal recessive syndromes include those caused by biallelic PVs in the DNA mismatch repair genes MLH1, MSH2, MSH6, PMS2, MSH3 and probably MLH3, and in the base excision repair genes MUTYH, NTHL1 and MBD4. This review provides an in-depth discussion of the genetic and molecular mechanisms underlying hereditary adenomatous polyposis syndromes, their clinical presentations, tumour mutational signatures, and emerging approaches for the treatment of the associated cancers. Considerations for genetic testing are described, including post-zygotic mosaicism, non-coding PVs, the interpretation of variants of unknown significance and cancer risks associated with monoallelic variants in the recessive genes. Despite advances in genetic testing and the recent identification of new adenomatous polyposis genes, many cases of multiple adenomas remain genetically unexplained. Non-genetic factors, including environmental risk factors, prior oncologic treatments, and bacterial genotoxins colonising the intestine - particularly colibactin-producing Escherichia coli - have emerged as alternative pathogenic mechanisms.
Affiliation(s)
- Jihoon E Joo
  - Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, VIC, Australia
  - Collaborative Centre for Genomic Cancer Medicine, Victorian Comprehensive Cancer Centre, Parkville, VIC, Australia
- Julen Viana-Errasti
  - Hereditary Cancer Program, Catalan Institute of Oncology, IDIBELL, Hospitalet de Llobregat, Av. Gran Via 199-203, Hospitalet de Llobregat, 08908, Spain
  - Program in Molecular Mechanisms and Experimental Therapy in Oncology (Oncobell), IDIBELL, Hospitalet de Llobregat, Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), Madrid, Spain
  - Doctoral Program in Biomedicine, University of Barcelona, Hospitalet de Llobregat, Barcelona, Spain
- Daniel D Buchanan
  - Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, VIC, Australia
  - Collaborative Centre for Genomic Cancer Medicine, Victorian Comprehensive Cancer Centre, Parkville, VIC, Australia
  - Genomic Medicine and Family Cancer Clinic, Royal Melbourne Hospital, Parkville, VIC, Australia
- Laura Valle
  - Hereditary Cancer Program, Catalan Institute of Oncology, IDIBELL, Hospitalet de Llobregat, Av. Gran Via 199-203, Hospitalet de Llobregat, 08908, Spain
  - Program in Molecular Mechanisms and Experimental Therapy in Oncology (Oncobell), IDIBELL, Hospitalet de Llobregat, Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), Madrid, Spain
4
Li Q, Keskus AG, Wagner J, Izydorczyk MB, Timp W, Sedlazeck FJ, Klein AP, Zook JM, Kolmogorov M, Schatz MC. Unraveling the hidden complexity of cancer through long-read sequencing. Genome Res 2025; 35:599-620. [PMID: 40113261 PMCID: PMC12047254 DOI: 10.1101/gr.280041.124] [Indexed: 03/22/2025]
Abstract
Cancer is fundamentally a disease of the genome, characterized by extensive genomic, transcriptomic, and epigenomic alterations. Most current studies predominantly use short-read sequencing, gene panels, or microarrays to explore these alterations; however, these technologies can systematically miss or misrepresent certain types of alterations, especially structural variants, complex rearrangements, and alterations within repetitive regions. Long-read sequencing is rapidly emerging as a transformative technology for cancer research by providing a comprehensive view across the genome, transcriptome, and epigenome, including the ability to detect alterations that previous technologies have overlooked. In this Perspective, we explore the current applications of long-read sequencing for both germline and somatic cancer analysis. We provide an overview of the computational methodologies tailored to long-read data and highlight key discoveries and resources within cancer genomics that were previously inaccessible with prior technologies. We also address future opportunities and persistent challenges, including the experimental and computational requirements needed to scale to larger sample sizes, the hurdles in sequencing and analyzing complex cancer genomes, and opportunities for leveraging machine learning and artificial intelligence technologies for cancer informatics. We further discuss how the telomere-to-telomere genome and the emerging human pangenome could enhance the resolution of cancer genome analysis, potentially revolutionizing early detection and disease monitoring in patients. Finally, we outline strategies for transitioning long-read sequencing from research applications to routine clinical practice.
Affiliation(s)
- Qiuhui Li
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Ayse G Keskus
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Michal B Izydorczyk
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77251, USA
- Alison P Klein
  - Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Mikhail Kolmogorov
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, Bethesda, Maryland 20892, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
  - Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins Medicine, Baltimore, Maryland 21031, USA
5
Mahmoud M, Agustinho DP, Sedlazeck FJ. A Hitchhiker's Guide to long-read genomic analysis. Genome Res 2025; 35:545-558. [PMID: 40228901 PMCID: PMC12047252 DOI: 10.1101/gr.279975.124] [Indexed: 04/16/2025]
Abstract
Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant cost efficiency, scalability, and accuracy advancements have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of complex genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.
Affiliation(s)
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Daniel P Agustinho
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
6
Baumann AA, Knol LI, Arlt M, Hutschenreiter T, Richter A, Widmann TJ, Franke M, Hackmann K, Winkler S, Richter D, Spier I, Aretz S, Aust D, Porrmann J, William D, Schröck E, Glimm H, Jahn A. Long-read genome and RNA sequencing resolve a pathogenic intronic germline LINE-1 insertion in APC. NPJ Genom Med 2025; 10:30. [PMID: 40180948 PMCID: PMC11968988 DOI: 10.1038/s41525-025-00485-5] [Received: 09/17/2024] [Accepted: 02/28/2025] [Indexed: 04/05/2025] Open
Abstract
Familial adenomatous polyposis (FAP) is caused by pathogenic germline variants in the tumor suppressor gene APC. Confirmation of diagnosis was not achieved by cancer gene panel and exome sequencing or custom array-CGH in a family with suspected FAP across five generations. Long-read genome sequencing (PacBio), short-read genome sequencing (Illumina), short-read RNA sequencing, and further validations were performed in different tissues of multiple family members. Long-read genome sequencing resolved a 6 kb full-length intronic insertion of a heterozygous LINE-1 element between exons 7 and 8 of APC that could be detected but not fully resolved by short-read genome sequencing. Targeted RNA analysis revealed aberrant splicing resulting in the formation of a pseudo-exon with a premature stop codon. The variant segregated with the phenotype in several family members allowing its evaluation as likely pathogenic. This study supports the utility of long-read DNA sequencing and complementary RNA approaches to tackle unsolved cases of hereditary disease.
Affiliation(s)
- Alexandra A Baumann
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - National Center for Tumor Diseases (NCT), NCT/UCC Dresden, a partnership between DKFZ, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, and Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
- Lisanne I Knol
  - National Center for Tumor Diseases (NCT), NCT/UCC Dresden, a partnership between DKFZ, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, and Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
  - Department of Translational Medical Oncology, NCT Dresden and DKFZ, Dresden, Germany
  - Translational Medical Oncology, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
- Marie Arlt
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
- Tim Hutschenreiter
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
- Anja Richter
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - National Center for Tumor Diseases (NCT), NCT/UCC Dresden, a partnership between DKFZ, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, and Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
- Thomas J Widmann
  - National Center for Tumor Diseases (NCT), NCT/UCC Dresden, a partnership between DKFZ, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, and Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
  - Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research (GENYO), PTS Granada, managed by Fundación Pública Andaluza Progreso y Salud (FPS), Granada, Spain
- Marcus Franke
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
- Karl Hackmann
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
- Sylke Winkler
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Daniela Richter
  - National Center for Tumor Diseases (NCT), NCT/UCC Dresden, a partnership between DKFZ, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, and Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
  - Department of Translational Medical Oncology, NCT Dresden and DKFZ, Dresden, Germany
  - Translational Medical Oncology, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
  - German Cancer Consortium (DKTK), Dresden, Germany
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
- Isabel Spier
  - Institute of Human Genetics, Medical Faculty, University of Bonn, Bonn, Germany
  - National Center for Hereditary Tumor Syndromes, University Hospital Bonn, Bonn, Germany
- Stefan Aretz
  - Institute of Human Genetics, Medical Faculty, University of Bonn, Bonn, Germany
  - National Center for Hereditary Tumor Syndromes, University Hospital Bonn, Bonn, Germany
- Daniela Aust
  - Institute of Pathology, University Hospital Carl Gustav Carus at TUD Dresden University, Dresden, Germany
  - Tumor- and Normal Tissue Bank of the University Cancer Center (UCC), University Hospital Carl Gustav Carus, Medical Faculty, TUD Dresden University of Technology, Dresden, Germany
- Joseph Porrmann
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
- Doreen William
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
- Evelin Schröck
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - National Center for Tumor Diseases (NCT), NCT/UCC Dresden, a partnership between DKFZ, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, and Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - German Cancer Consortium (DKTK), Dresden, Germany
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
- Hanno Glimm
  - National Center for Tumor Diseases (NCT), NCT/UCC Dresden, a partnership between DKFZ, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, and Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
  - Department of Translational Medical Oncology, NCT Dresden and DKFZ, Dresden, Germany
  - Translational Medical Oncology, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
  - German Cancer Consortium (DKTK), Dresden, Germany
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Center for Personalized Oncology, NCT Dresden and University Hospital Carl Gustav Carus, Faculty of Medicine and TUD Dresden University of Technology, Dresden, Germany
  - Translational Functional Cancer Genomics, NCT Heidelberg and DKFZ, Heidelberg, Germany
- Arne Jahn
  - Institute for Clinical Genetics, University Hospital Carl Gustav Carus at TUD Dresden University of Technology and Faculty of Medicine of TUD Dresden University of Technology, Dresden, Germany
  - National Center for Tumor Diseases (NCT), NCT/UCC Dresden, a partnership between DKFZ, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, and Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Dresden, Germany
  - ERN GENTURIS, Hereditary Cancer Syndrome Center Dresden, Dresden, Germany
  - German Cancer Consortium (DKTK), Dresden, Germany
  - German Cancer Research Center (DKFZ), Heidelberg, Germany
7
Fu Y, Timp W, Sedlazeck FJ. Computational analysis of DNA methylation from long-read sequencing. Nat Rev Genet 2025:10.1038/s41576-025-00822-5. [PMID: 40155770 DOI: 10.1038/s41576-025-00822-5] [Accepted: 01/30/2025] [Indexed: 04/01/2025]
Abstract
DNA methylation is a critical epigenetic mechanism in numerous biological processes, including gene regulation, development, ageing and the onset of various diseases such as cancer. Studies of methylation are increasingly using single-molecule long-read sequencing technologies to simultaneously measure epigenetic states such as DNA methylation with genomic variation. These long-read data sets have spurred the continuous development of advanced computational methods to gain insights into the roles of methylation in regulating chromatin structure and gene regulation. In this Review, we discuss the computational methods for calling methylation signals, contrasting methylation between samples, analysing cell-type diversity and gaining additional genomic insights, and then further discuss the challenges and future perspectives of tool development for DNA methylation research.
Affiliation(s)
- Yilei Fu
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
8
Berghöfer J, Khaveh N, Mundlos S, Metzger J. Multi-tool copy number detection highlights common body size-associated variants in miniature pig breeds from different geographical regions. BMC Genomics 2025; 26:285. [PMID: 40121435 PMCID: PMC11929999 DOI: 10.1186/s12864-025-11446-8] [Received: 10/30/2024] [Accepted: 03/05/2025] [Indexed: 03/25/2025] Open
Abstract
BACKGROUND: Copy number variations (CNVs) represent a common and highly specific type of variation in the genome, potentially influencing genetic diversity and mammalian phenotypic development. Structural variants, such as deletions, duplications, and insertions, have frequently been highlighted as key factors influencing traits in high-production pigs. However, comprehensive CNV analyses in miniature pig breeds are limited despite their value in biomedical research.
RESULTS: This study performed whole-genome sequencing in 36 miniature pigs from nine breeds from America, Asia and Oceania, and Europe. By employing a multi-tool approach (CNVpytor, Delly, GATK gCNV, Smoove), the accuracy of CNV identification was improved. In total, 34 homozygous CNVs overlapped with exonic regions in all samples, suggesting a role in expressing specific phenotypes such as uniform growth patterns, fertility, or metabolic function. In addition, 386 copy number variation regions (CNVRs) shared by all breeds were detected, covering 33.6 Mb (1.48% of the autosomal genome). Further, 132 exclusive CNVRs were identified for American breeds, 47 for Asian and Oceanian breeds, and 114 for European breeds. Functional enrichment analysis revealed genes within the common CNVRs involved in body height determination and other growth-related parameters. Exclusive CNVRs were located in the region of genes enriched for lipid metabolism in American minipigs, reproductive traits in Asian and Oceanian breeds, and cardiovascular features and body height in European breeds. In the selected groups, quantitative trait loci associated with body size, meat quality, reproduction, and disease susceptibility were highlighted.
CONCLUSION: This investigation of the CNV landscape of minipigs underlines the impact of selective breeding on structural variants and its role in the development of specific breed phenotypes across geographical areas. The multi-tool approach provides a valuable resource for future studies on the effects of artificial selection on livestock genomes.
Affiliation(s)
- Jan Berghöfer
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Institute of Chemistry and Biochemistry, Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Berlin, Germany
  - Institute of Animal Genomics, University of Veterinary Medicine Hanover, Hanover, Germany
- Nadia Khaveh
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Institute of Animal Genomics, University of Veterinary Medicine Hanover, Hanover, Germany
- Stefan Mundlos
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Institute for Medical and Human Genetics, Charité Universitätsmedizin Berlin, Berlin, Germany
  - Charité - Universitätsmedizin Berlin, BCRT - Berlin Institute of Health Centre for Regenerative Therapies, Berlin, Germany
- Julia Metzger
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Institute of Animal Genomics, University of Veterinary Medicine Hanover, Hanover, Germany
9
Ashley EA. Ambient storage of genomic time capsules. Trends Genet 2025; 41:181-182. [PMID: 39736477 DOI: 10.1016/j.tig.2024.11.012] [Received: 10/14/2024] [Accepted: 11/21/2024] [Indexed: 01/01/2025]
Abstract
While the cost of genome sequencing has decreased, -80°C DNA preservation and raw sequence data archiving remain expensive. Transitioning to room-temperature DNA preservation could reduce costs, lessen researchers' reliance on the electrical grid, and encourage a future proofing strategy of periodical updating with higher quality sequencing instead of long-term storage of raw signal data. A new technology recently described by Prince et al. that could help realize these goals is Thermoset-REinforced Xeropreservation (T-REX).
Affiliation(s)
- Euan A Ashley
  - Department of Medicine (Cardiovascular Medicine), Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
10
Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y, Cao R, Tan J, Wang Q, Ma Y, Li J, Luo J, Wang S, Wang J, Zhang G, Xu S, Qian F, Zhou F, Tang H, Li D, Sedlazeck FJ, Jin L, Guan Y, Fan S. Long-read sequencing of 945 Han individuals identifies structural variants associated with phenotypic diversity and disease susceptibility. Nat Commun 2025; 16:1494. [PMID: 39929826 PMCID: PMC11811171 DOI: 10.1038/s41467-025-56661-9] [Received: 03/05/2024] [Accepted: 01/22/2025] [Indexed: 02/13/2025] Open
Abstract
Genomic structural variants (SVs) are a major source of genetic diversity in humans. Here, through long-read sequencing of 945 Han Chinese genomes, we identify 111,288 SVs, including 24.56% unreported variants, many with predicted functional importance. By integrating human population-level phenotypic and multi-omics data as well as two humanized mouse models, we demonstrate the causal roles of two SVs: one SV that emerges at the common ancestor of modern humans, Neanderthals, and Denisovans in GSDMD for bone mineral density and one modern-human-specific SV in WWP2 impacting height, weight, fat, craniofacial phenotypes and immunity. Our results suggest that the GSDMD SV could serve as a rapid and cost-effective biomarker for assessing the risk of cisplatin-induced acute kidney injury. The functional conservation from human to mouse and widespread signals of positive natural selection suggest that both SVs likely influence local adaptation, phenotypic diversity, and disease susceptibility across diverse human populations.
Affiliation(s)
- Jiao Gong
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Huiru Sun
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Kaiyuan Wang
- Shanghai Frontiers Science Center of Genome Editing and Cell Therapy, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
| | - Yanhui Zhao
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Yechao Huang
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Qinsheng Chen
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Hui Qiao
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Yang Gao
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Jialin Zhao
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Yunchao Ling
- Bio-Med Big Data Center, Chinese Academy of Sciences Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of the Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Ruifang Cao
- Bio-Med Big Data Center, Chinese Academy of Sciences Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of the Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Jingze Tan
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Qi Wang
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Yanyun Ma
- Department of Anthropology and Human Genetics, Institute for Six-sector Economy, and MOE Key Laboratory of Contemporary Anthropology, Fudan University, Shanghai, China
| | - Jing Li
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Jingchun Luo
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Sijia Wang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
| | - Jiucun Wang
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
- Research Unit of dissecting the population genetics and developing new technologies for treatment and prevention of skin phenotypes and dermatological diseases (2019RU058), Chinese Academy of Medical Sciences, Shanghai, China
| | - Guoqing Zhang
- Bio-Med Big Data Center, Chinese Academy of Sciences Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of the Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Shuhua Xu
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Feng Qian
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Fang Zhou
- School of Data Science and Engineering, East China Normal University, Shanghai, China
| | - Huiru Tang
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China
| | - Dali Li
- Shanghai Frontiers Science Center of Genome Editing and Cell Therapy, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
| | - Li Jin
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China.
- Research Unit of dissecting the population genetics and developing new technologies for treatment and prevention of skin phenotypes and dermatological diseases (2019RU058), Chinese Academy of Medical Sciences, Shanghai, China.
| | - Yuting Guan
- Shanghai Frontiers Science Center of Genome Editing and Cell Therapy, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China.
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, Lab for Evolutionary Synthesis, School of Life Sciences, Human Phenome Institute, Fudan University, Shanghai, China.
| |
|
11
|
He Q, Lai Z, Zhai Z, Zou B, Shi Y, Feng C. Advances of research in diabetic cardiomyopathy: diagnosis and the emerging application of sequencing. Front Cardiovasc Med 2025; 11:1501735. [PMID: 39872882 PMCID: PMC11769946 DOI: 10.3389/fcvm.2024.1501735] [Received: 09/25/2024] [Accepted: 12/27/2024] [Indexed: 01/30/2025] Open
Abstract
Diabetic cardiomyopathy (DCM) is one of the most prevalent and severe complications associated with diabetes mellitus (DM). The onset of DCM is insidious, with the symptoms being obvious only in the late stage. Consequently, the early diagnosis of DCM is a formidable challenge which significantly influences the treatment and prognosis of DCM. Thus, it becomes imperative to uncover innovative approaches to facilitate the prompt identification and diagnosis of DCM. On the traditional clinical side, we tend to use serum biomarkers as well as imaging as the most common means of diagnosing diseases because of their convenience as well as affordability. As we delve deeper into the mechanisms of DCM, a wide variety of biomarkers are becoming competitive diagnostic indicators. Meanwhile, the application of multiple imaging techniques has also made efforts to promote the diagnosis of DCM. Besides, the spurt in sequencing technology has made it possible to give hints about disease diagnosis from the genome as well as the transcriptome, making diagnosis less difficult, more sensitive, and more predictive. Overall, sequencing technology is expected to be the superior choice of plasma biomarkers for detecting lesions at an earlier stage than imaging, and its judicious utilization combined with imaging technologies will lead to a more sensitive diagnosis of DCM in the future. Therefore, this review meticulously consolidates the progress and utilization of various biomarkers, imaging methods, and sequencing technologies in the realm of DCM diagnosis, with the aim of furnishing novel theoretical foundation and guide future research endeavors towards enhancing the diagnostic and therapeutic landscape of DCM.
Affiliation(s)
- Qianqian He
- Department of Cardiology, The Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, China
| | - Ze Lai
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
| | - Zhengyao Zhai
- Department of Gynecological Oncology, Fudan University Shanghai Cancer Center, Shanghai, China
| | - Beibei Zou
- Department of Cardiology, The Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, China
| | - Yangkai Shi
- Department of Cardiology, The Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, China
| | - Chao Feng
- Department of Cardiology, The Fourth Affiliated Hospital of School of Medicine, and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, China
| |
|
12
|
Secomandi S, Gallo GR, Rossi R, Rodríguez Fernandes C, Jarvis ED, Bonisoli-Alquati A, Gianfranceschi L, Formenti G. Pangenome graphs and their applications in biodiversity genomics. Nat Genet 2025; 57:13-26. [PMID: 39779953 DOI: 10.1038/s41588-024-02029-6] [Received: 05/23/2024] [Accepted: 11/08/2024] [Indexed: 01/11/2025]
Abstract
Complete datasets of genetic variants are key to biodiversity genomic studies. Long-read sequencing technologies allow the routine assembly of highly contiguous, haplotype-resolved reference genomes. However, even when complete, reference genomes from a single individual may bias downstream analyses and fail to adequately represent genetic diversity within a population or species. Pangenome graphs assembled from aligned collections of high-quality genomes can overcome representation bias by integrating sequence information from multiple genomes from the same population, species or genus into a single reference. Here, we review the available tools and data structures to build, visualize and manipulate pangenome graphs while providing practical examples and discussing their applications in biodiversity and conservation genomics across the tree of life.
Affiliation(s)
- Simona Secomandi
- Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
| | | | - Riccardo Rossi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
| | - Carlos Rodríguez Fernandes
- Centre for Ecology, Evolution and Environmental Changes (CE3C) and CHANGE, Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
| | - Erich D Jarvis
- Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- The Vertebrate Genome Laboratory, New York, NY, USA
| | - Andrea Bonisoli-Alquati
- Department of Biological Sciences, California State Polytechnic University, Pomona, Pomona, CA, USA
| | | | | |
|
13
|
Orozco-Arias S, Sierra P, Durbin R, González J. MCHelper automatically curates transposable element libraries across eukaryotic species. Genome Res 2024; 34:2256-2268. [PMID: 39653419 PMCID: PMC11694758 DOI: 10.1101/gr.278821.123] [Received: 12/05/2023] [Accepted: 09/18/2024] [Indexed: 12/25/2024]
Abstract
The number of species with high-quality genome sequences continues to increase, in part due to the scaling up of multiple large-scale biodiversity sequencing projects. While the need to annotate genic sequences in these genomes is widely acknowledged, the parallel need to annotate transposable element (TE) sequences that have been shown to alter genome architecture, rewire gene regulatory networks, and contribute to the evolution of host traits is becoming ever more evident. However, accurate genome-wide annotation of TE sequences is still technically challenging. Several de novo TE identification tools are now available, but manual curation of the libraries produced by these tools is needed to generate high-quality genome annotations. Manual curation is time-consuming, and thus impractical for large-scale genomic studies, and lacks reproducibility. In this work, we present the Manual Curator Helper tool MCHelper, which automates the TE library curation process. By leveraging MCHelper's fully automated mode with the outputs from three de novo TE identification tools, RepeatModeler2, EDTA, and REPET, in the fruit fly, rice, hooded crow, zebrafish, maize, and human, we show a substantial improvement in the quality of the TE libraries and genome annotations. MCHelper libraries are less redundant, with up to 65% reduction in the number of consensus sequences, have up to 11.4% fewer false positive sequences, and up to ∼48% fewer "unclassified/unknown" TE consensus sequences. Genome-wide TE annotations are also improved, including larger unfragmented insertions. Moreover, MCHelper is an easy-to-install and easy-to-use tool.
Affiliation(s)
| | - Pío Sierra
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
| | - Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
| | - Josefa González
- Institute of Evolutionary Biology, CSIC, UPF, 08003 Barcelona, Spain;
- Institut Botànic de Barcelona (IBB), CSIC-CMCNB, 08038 Barcelona, Spain
| |
|
14
|
Zhu XT, Sanz-Jimenez P, Ning XT, Tahir Ul Qamar M, Chen LL. Direct RNA sequencing in plants: Practical applications and future perspectives. PLANT COMMUNICATIONS 2024; 5:101064. [PMID: 39155503 PMCID: PMC11589328 DOI: 10.1016/j.xplc.2024.101064] [Received: 05/13/2024] [Revised: 07/17/2024] [Accepted: 08/14/2024] [Indexed: 08/20/2024]
Abstract
The transcriptome serves as a bridge that links genomic variation to phenotypic diversity. A vast number of studies using next-generation RNA sequencing (RNA-seq) over the last 2 decades have emphasized the essential roles of the plant transcriptome in response to developmental and environmental conditions, providing numerous insights into the dynamic changes, evolutionary traces, and elaborate regulation of the plant transcriptome. With substantial improvement in accuracy and throughput, direct RNA sequencing (DRS) has emerged as a new and powerful sequencing platform for precise detection of native and full-length transcripts, overcoming many limitations such as read length and PCR bias that are inherent to short-read RNA-seq. Here, we review recent advances in dissecting the complexity and diversity of plant transcriptomes using DRS as the main technological approach, covering many aspects of RNA metabolism, including novel isoforms, poly(A) tails, and RNA modification, and we propose a comprehensive workflow for processing of plant DRS data. Many challenges to the application of DRS in plants, such as the need for machine learning tools tailored to plant transcriptomes, remain to be overcome, and together we outline future biological questions that can be addressed by DRS, such as allele-specific RNA modification. This technology provides convenient support on which the connection of distinct RNA features is tightly built, sustainably refining our understanding of the biological functions of the plant transcriptome.
Affiliation(s)
- Xi-Tong Zhu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China.
| | - Pablo Sanz-Jimenez
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
| | - Xiao-Tong Ning
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
| | - Muhammad Tahir Ul Qamar
- Integrative Omics and Molecular Modeling Laboratory, Department of Bioinformatics and Biotechnology, Government College University Faisalabad (GCUF), Faisalabad 38000, Pakistan
| | - Ling-Ling Chen
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China.
| |
|
15
|
Hu X, Liu J, Xu T, Qin K, Feng Y, Jia Z, Zhao X. Research progress and application of the third-generation sequencing technologies in forensic medicine. Leg Med (Tokyo) 2024; 71:102532. [PMID: 39504855 DOI: 10.1016/j.legalmed.2024.102532] [Received: 07/31/2024] [Revised: 09/18/2024] [Accepted: 09/22/2024] [Indexed: 11/08/2024]
Abstract
Third-generation sequencing technologies, exemplified by single-molecule real-time sequencing and nanopore sequencing, provide a constellation of advantages, including long read lengths, high throughput, real-time sequencing capabilities, and remarkable portability. These cutting-edge methodologies have provided new tools for genomic analysis in forensic medicine. To gain a comprehensive understanding of the current applications and cutting-edge trends of third-generation sequencing technologies in forensic medicine, this study retrieved relevant literature from the China National Knowledge Infrastructure (CNKI) database and the Web of Science (WOS) database. Using bibliometric software CiteSpace 6.1.R6, the study visualized publication volume, countries, and keywords related to the application of third-generation sequencing technologies in forensic medicine from 2014 to 2023. The review then summarized the foundational principles, characteristics, and promising prospects of third-generation sequencing technologies in forensic medicine. Notably, it highlights their remarkable contributions in forensic individual identification, body fluid identification, forensic epigenetic analysis, microbial analysis and forensic species identification.
Affiliation(s)
- Xiaoxin Hu
- School of Investigation, People's Public Security University of China, Beijing 100038, China.
| | - Jinjie Liu
- Criminal Investigation Corps of Beijing Public Security Bureau, Beijing 100054, China
| | - Tingyu Xu
- School of Investigation, People's Public Security University of China, Beijing 100038, China
| | - Kaiyue Qin
- School of Investigation, People's Public Security University of China, Beijing 100038, China
| | - Yunpeng Feng
- School of Investigation, People's Public Security University of China, Beijing 100038, China
| | - Zhenjun Jia
- School of Investigation, People's Public Security University of China, Beijing 100038, China.
| | - Xingchun Zhao
- Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China.
| |
|
16
|
Jiang Z, Peng Z, Wei Z, Sun J, Luo Y, Bie L, Zhang G, Wang Y. A deep learning-based method enables the automatic and accurate assembly of chromosome-level genomes. Nucleic Acids Res 2024; 52:e92. [PMID: 39287126 PMCID: PMC11514472 DOI: 10.1093/nar/gkae789] [Received: 06/28/2024] [Revised: 08/25/2024] [Accepted: 08/30/2024] [Indexed: 09/19/2024] Open
Abstract
The application of high-throughput chromosome conformation capture (Hi-C) technology enables the construction of chromosome-level assemblies. However, the correction of errors and the anchoring of sequences to chromosomes in the assembly remain significant challenges. In this study, we developed a deep learning-based method, AutoHiC, to address the challenges in chromosome-level genome assembly by enhancing contiguity and accuracy. Conventional Hi-C-aided scaffolding often requires manual refinement, but AutoHiC instead utilizes Hi-C data for automated workflows and iterative error correction. When trained on data from 300+ species, AutoHiC demonstrated a robust average error detection accuracy exceeding 90%. The benchmarking results confirmed its significant impact on genome contiguity and error correction. The innovative approach and comprehensive results of AutoHiC constitute a breakthrough in automated error detection, promising more accurate genome assemblies for advancing genomics research.
Affiliation(s)
- Zijie Jiang
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Zhixiang Peng
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Zhaoyuan Wei
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Jiahe Sun
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Yongjiang Luo
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Lingzi Bie
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Guoqing Zhang
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| | - Yi Wang
- Integrative Science Center of Germplasm Creation in Western China (CHONGQING) Science City, Biological Science Research Center, Southwest University, Chongqing, China
| |
|
17
|
Behera S, Catreux S, Rossi M, Truong S, Huang Z, Ruehle M, Visvanath A, Parnaby G, Roddey C, Onuchic V, Finocchio A, Cameron DL, English A, Mehtalia S, Han J, Mehio R, Sedlazeck FJ. Comprehensive genome analysis and variant detection at scale using DRAGEN. Nat Biotechnol 2024:10.1038/s41587-024-02382-1. [PMID: 39455800 PMCID: PMC12022141 DOI: 10.1038/s41587-024-02382-1] [Received: 12/24/2023] [Accepted: 08/08/2024] [Indexed: 10/28/2024]
Abstract
Research and medical genomics require comprehensive, scalable methods for the discovery of novel disease targets, evolutionary drivers and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size or location. Here we present DRAGEN, which uses multigenome mapping with pangenome references, hardware acceleration and machine learning-based variant detection to provide insights into individual genomes, with ~30 min of computation time from raw reads to variant detection. DRAGEN outperforms current state-of-the-art methods in speed and accuracy across all variant types (single-nucleotide variations, insertions or deletions, short tandem repeats, structural variations and copy number variations) and incorporates specialized methods for analysis of medically relevant genes. We demonstrate the performance of DRAGEN across 3,202 whole-genome sequencing datasets by generating fully genotyped multisample variant call format files and demonstrate its scalability, accuracy and innovation to further advance the integration of comprehensive genomics. Overall, DRAGEN marks a major milestone in sequencing data analysis and will provide insights across various diseases, including Mendelian and rare diseases, with a highly comprehensive and scalable platform.
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | | | | | | | | | | | | | | | | | - Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
|
18
|
Dwarshuis N, Kalra D, McDaniel J, Sanio P, Alvarez Jerez P, Jadhav B, Huang WE, Mondal R, Busby B, Olson ND, Sedlazeck FJ, Wagner J, Majidian S, Zook JM. The GIAB genomic stratifications resource for human reference genomes. Nat Commun 2024; 15:9029. [PMID: 39424793 PMCID: PMC11489684 DOI: 10.1038/s41467-024-53260-y] [Received: 11/07/2023] [Accepted: 10/07/2024] [Indexed: 10/21/2024] Open
Abstract
Despite the growing variety of sequencing and variant-calling tools, no workflow performs equally well across the entire human genome. Understanding context-dependent performance is critical for enabling researchers, clinicians, and developers to make informed tradeoffs when selecting sequencing hardware and software. Here we describe a set of "stratifications," which are BED files that define distinct contexts throughout the genome. We define these for GRCh37/38 as well as the new T2T-CHM13 reference, adding many new hard-to-sequence regions which are critical for understanding performance as the field progresses. Specifically, we highlight the increase in hard-to-map and GC-rich stratifications in CHM13 relative to the previous references. We then compare the benchmarking performance with each reference and show the performance penalty brought about by these additional difficult regions in CHM13. Additionally, we demonstrate how the stratifications can track context-specific improvements over different platform iterations, using Oxford Nanopore Technologies as an example. The means to generate these stratifications are available as a snakemake pipeline at https://github.com/usnistgov/giab-stratifications. We anticipate this being useful in enabling precise risk-reward calculations when building sequencing pipelines for any of the commonly-used reference genomes.
Affiliation(s)
- Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD., USA
| | - Divya Kalra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD., USA
| | - Philippe Sanio
- University of Applied Sciences Upper Austria - FH Hagenberg, Hagenberg im Mühlkreis, Austria
| | - Pilar Alvarez Jerez
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Bharati Jadhav
- Department of Genetics and Genomic Sciences and Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, Hess Center for Science and Medicine, New York, NY, USA
| | - Wenyu Eddy Huang
- Department of Computer Science, College of Engineering, Rice University, Houston, TX, USA
| | - Rajarshi Mondal
- Department of Bioinformatics, Pondicherry University, Pondicherry, India
| | | | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD., USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, College of Engineering, Rice University, Houston, TX, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD., USA
| | - Sina Majidian
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD., USA.
| |
|
19
|
Jiang L, Quail MA, Fraser-Govil J, Wang H, Shi X, Oliver K, Mellado Gomez E, Yang F, Ning Z. The Bioinformatic Applications of Hi-C and Linked Reads. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae048. [PMID: 38905513 PMCID: PMC11580686 DOI: 10.1093/gpbjnl/qzae048] [Received: 12/05/2022] [Revised: 05/07/2024] [Accepted: 06/19/2024] [Indexed: 06/23/2024]
Abstract
Long-range sequencing grants insight into additional genetic information beyond what can be accessed by both short reads and modern long-read technology. Several new sequencing technologies, such as "Hi-C" and "Linked Reads", produce long-range datasets for high-throughput and high-resolution genome analyses, which are rapidly advancing the field of genome assembly, genome scaffolding, and more comprehensive variant identification. In this review, we focused on five major long-range sequencing technologies: high-throughput chromosome conformation capture (Hi-C), 10X Genomics Linked Reads, haplotagging, transposase enzyme linked long-read sequencing (TELL-seq), and single-tube long fragment read (stLFR). We detailed the mechanisms and data products of the five platforms and their important applications, evaluated the quality of sequencing data from different platforms, and discussed the currently available bioinformatics tools. This work will benefit the selection of appropriate long-range technology for specific biological studies.
Affiliation(s)
- Libo Jiang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo 255049, China
| | - Michael A Quail
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Jack Fraser-Govil
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Haipeng Wang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo 255049, China
| | - Xuequn Shi
- College of Food Science and Technology, Hainan University, Haikou 570228, China
| | - Karen Oliver
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Esther Mellado Gomez
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Fengtang Yang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo 255049, China
| | - Zemin Ning
- The Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| |
|
20
|
Henglin M, Ghareghani M, Harvey WT, Porubsky D, Koren S, Eichler EE, Ebert P, Marschall T. Graphasing: phasing diploid genome assembly graphs with single-cell strand sequencing. Genome Biol 2024; 25:265. [PMID: 39390579 PMCID: PMC11466045 DOI: 10.1186/s13059-024-03409-1] [Received: 02/16/2024] [Accepted: 09/30/2024] [Indexed: 10/12/2024] Open
Abstract
Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de novo haplotypes for diploid genomes. Graphasing readily integrates with any assembly workflow that both outputs an assembly graph and has a haplotype assembly mode. Graphasing performs comparably to trio phasing in contiguity, phasing accuracy, and assembly quality, outperforms Hi-C in phasing accuracy, and generates human assemblies with over 18 chromosome-spanning haplotypes.
Affiliation(s)
- Mir Henglin
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Maryam Ghareghani
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| |
|
21
|
Amstler S, Streiter G, Pfurtscheller C, Forer L, Di Maio S, Weissensteiner H, Paulweber B, Schönherr S, Kronenberg F, Coassin S. Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex lipoprotein(a) KIV-2 VNTR. Genome Med 2024; 16:117. [PMID: 39380090 PMCID: PMC11462820 DOI: 10.1186/s13073-024-01391-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 10/01/2024] [Indexed: 10/10/2024] Open
Abstract
BACKGROUND: Repetitive genome regions, such as variable number tandem repeats (VNTR) or short tandem repeats (STR), are major constituents of the uncharted dark genome and evade conventional sequencing approaches. The protein-coding LPA kringle IV type-2 (KIV-2) VNTR (5.6 kb per unit, 1-40 units per allele) is a medically highly relevant example with a particularly intricate structure, multiple haplotypes, intragenic homologies, and an intra-VNTR STR. It is the primary regulator of plasma lipoprotein(a) [Lp(a)] concentrations, an important cardiovascular risk factor. Lp(a) concentrations vary widely between individuals and ancestries. Multiple variants and functional haplotypes in the LPA gene and especially in the KIV-2 VNTR strongly contribute to this variance. METHODS: We evaluated the performance of amplicon-based nanopore sequencing with unique molecular identifiers (UMI-ONT-Seq) for SNP detection, haplotype mapping, VNTR unit consensus sequence generation, and copy number estimation via coverage-corrected haplotype quantification in the KIV-2 VNTR. We used 15 human samples and low-level mixtures (0.5 to 5%) of KIV-2 plasmids as a validation set. We then applied UMI-ONT-Seq to extract KIV-2 VNTR haplotypes in 48 multi-ancestry 1000 Genomes samples and analyzed at scale a poorly characterized STR within the KIV-2 VNTR. RESULTS: UMI-ONT-Seq detected KIV-2 SNPs down to 1% variant level with high sensitivity, specificity, and precision (0.977 ± 0.018; 1.000 ± 0.0005; 0.993 ± 0.02) and accurately retrieved the full-length haplotype of each VNTR unit. Human variant levels were highly correlated with next-generation sequencing (R² = 0.983) without bias across the whole variant level range. Six reads per UMI produced sequences of each KIV-2 unit with Q40 quality. The KIV-2 repeat number determined by coverage-corrected unique haplotype counting was in close agreement with droplet digital PCR (ddPCR), with 70% of the samples falling even within the narrow confidence interval of ddPCR. We then analyzed 62,679 intra-KIV-2 STR sequences and explored KIV-2 SNP haplotype patterns across five ancestries. CONCLUSIONS: UMI-ONT-Seq accurately retrieves the SNP haplotype and precisely quantifies the VNTR copy number of each repeat unit of the complex KIV-2 VNTR region across multiple ancestries. This study utilizes the KIV-2 VNTR, presenting a novel and potent tool for comprehensive characterization of medically relevant complex genome regions at scale.
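The sensitivity, specificity, and precision reported in the abstract above are standard confusion-matrix ratios. The following is a minimal sketch of how such figures are computed, not the authors' evaluation code; the function name and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, precision) from confusion-matrix counts.

    tp/fp/tn/fn are counts of true-positive, false-positive, true-negative and
    false-negative variant calls against a reference call set.
    """
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each metric needs a non-zero denominator")
    sensitivity = tp / (tp + fn)   # fraction of true variants that were detected
    specificity = tn / (tn + fp)   # fraction of non-variant sites called correctly
    precision = tp / (tp + fp)     # fraction of reported variants that are real
    return sensitivity, specificity, precision


# Hypothetical counts, not taken from the study.
sens, spec, prec = classification_metrics(tp=90, fp=2, tn=900, fn=8)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} precision={prec:.3f}")
```

When true negatives vastly outnumber true variants, as is typical for per-position SNP calling, specificity stays close to 1 even with some false positives, which is why precision is usually reported alongside it.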
Affiliation(s)
- Stephan Amstler
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Gertraud Streiter
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Cathrin Pfurtscheller
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Lukas Forer
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Silvia Di Maio
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Hansi Weissensteiner
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Bernhard Paulweber
- Department of Internal Medicine I, Paracelsus Medical University, Salzburg, Austria
| | - Sebastian Schönherr
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Florian Kronenberg
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Stefan Coassin
- Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
| |
|
22
|
Koch E, Pardiñas AF, O'Connell KS, Selvaggi P, Camacho Collados J, Babic A, Marshall SE, Van der Eycken E, Angulo C, Lu Y, Sullivan PF, Dale AM, Molden E, Posthuma D, White N, Schubert A, Djurovic S, Heimer H, Stefánsson H, Stefánsson K, Werge T, Sønderby I, O'Donovan MC, Walters JTR, Milani L, Andreassen OA. How Real-World Data Can Facilitate the Development of Precision Medicine Treatment in Psychiatry. Biol Psychiatry 2024; 96:543-551. [PMID: 38185234 PMCID: PMC11758919 DOI: 10.1016/j.biopsych.2024.01.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/20/2023] [Accepted: 01/02/2024] [Indexed: 01/09/2024]
Abstract
Precision medicine has the ambition to improve treatment response and clinical outcomes through patient stratification and holds great potential for the treatment of mental disorders. However, several important factors are needed to transform current practice into a precision psychiatry framework. Most important are 1) the generation of accessible large real-world training and test data including genomic data integrated from multiple sources, 2) the development and validation of advanced analytical tools for stratification and prediction, and 3) the development of clinically useful management platforms for patient monitoring that can be integrated into health care systems in real-life settings. This narrative review summarizes strategies for obtaining the key elements (well-powered samples from large biobanks integrated with electronic health records and health registry data using novel artificial intelligence algorithms) to predict outcomes in severe mental disorders and translate these models into clinical management and treatment approaches. Key elements are massive mental health data and novel artificial intelligence algorithms. For the clinical translation of these strategies, we discuss a precision medicine platform for improved management of mental disorders. We use cases to illustrate how precision medicine interventions could be brought into psychiatry to improve the clinical outcomes of mental disorders.
Affiliation(s)
- Elise Koch
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Antonio F Pardiñas
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - Kevin S O'Connell
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Pierluigi Selvaggi
- Department of Translational Biomedicine and Neuroscience, University of Bari Aldo Moro, Bari, Italy
| | - José Camacho Collados
- CardiffNLP, School of Computer Science and Informatics, Cardiff University, Cardiff, United Kingdom
| | - Erik Van der Eycken
- Global Alliance of Mental Illness Advocacy Networks-Europe, Brussels, Belgium
| | - Cecilia Angulo
- Global Alliance of Mental Illness Advocacy Networks-Europe, Brussels, Belgium
| | - Yi Lu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden
| | - Patrick F Sullivan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden; Departments of Genetics and Psychiatry, University of North Carolina, Chapel Hill, North Carolina
| | - Anders M Dale
- Multimodal Imaging Laboratory, University of California San Diego, La Jolla, California; Departments of Radiology, Psychiatry, and Neurosciences, University of California, San Diego, La Jolla, California
| | - Espen Molden
- Center for Psychopharmacology, Diakonhjemmet Hospital, Oslo, Norway
| | - Danielle Posthuma
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
| | - Nathan White
- CorTechs Laboratories, Inc., San Diego, California
| | - Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway; The Norwegian Centre for Mental Disorders Research Centre, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Hakon Heimer
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Nordic Society of Human Genetics and Precision Medicine, Copenhagen, Denmark
| | - Thomas Werge
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Mental Health Services Copenhagen, Roskilde, Denmark; Lundbeck Foundation Initiative for Integrative Psychiatric Research, Copenhagen, Denmark; Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Ida Sønderby
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Department of Medical Genetics, Oslo University Hospital, Oslo, Norway; KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Michael C O'Donovan
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - James T R Walters
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, United Kingdom
| | - Lili Milani
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia; Genetics and Personalized Medicine Clinic, Tartu University Hospital, Tartu, Estonia
| | - Ole A Andreassen
- Norwegian Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway; KG Jebsen Centre for Neurodevelopmental Disorders, University of Oslo and Oslo University Hospital, Oslo, Norway
| |
|
23
|
Smolka M, Paulin LF, Grochowski CM, Horner DW, Mahmoud M, Behera S, Kalef-Ezra E, Gandhi M, Hong K, Pehlivan D, Scholz SW, Carvalho CMB, Proukakis C, Sedlazeck FJ. Detection of mosaic and population-level structural variants with Sniffles2. Nat Biotechnol 2024; 42:1571-1580. [PMID: 38168980 PMCID: PMC11217151 DOI: 10.1038/s41587-023-02024-y] [Citation(s) in RCA: 92] [Impact Index Per Article: 92.0] [Received: 04/04/2022] [Accepted: 10/11/2023] [Indexed: 01/05/2024]
Abstract
Calling structural variations (SVs) is technically challenging, but using long reads remains the most accurate way to identify complex genomic alterations. Here we present Sniffles2, which improves over current methods by implementing a repeat-aware clustering coupled with a fast consensus sequence and coverage-adaptive filtering. Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers across different coverages (5-50×), sequencing technologies (ONT and HiFi) and SV types. Furthermore, Sniffles2 solves the problem of family-level to population-level SV calling to produce fully genotyped VCF files. Across 11 probands, we accurately identified causative SVs around MECP2, including highly complex alleles with three overlapping SVs. Sniffles2 also enables the detection of mosaic SVs in bulk long-read data. As a result, we identified multiple mosaic SVs in brain tissue from a patient with multiple system atrophy. The identified SV showed a remarkable diversity within the cingulate cortex, impacting both genes involved in neuron function and repetitive elements.
Affiliation(s)
- Moritz Smolka
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Luis F Paulin
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Dominic W Horner
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Sairam Behera
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
| | - Ester Kalef-Ezra
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Mira Gandhi
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Karl Hong
- Bionano Genomics, San Diego, CA, USA
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Division of Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pacific Northwest Research Institute (PNRI), Seattle, WA, USA
| | - Christos Proukakis
- Department of Clinical and Movement Neurosciences, Royal Free Campus, Queen Square Institute of Neurology, University College London, London, UK
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| |
|
24
|
Garcia TX, Matzuk MM. Novel Genes of the Male Reproductive System: Potential Roles in Male Reproduction and as Non-hormonal Male Contraceptive Targets. Mol Reprod Dev 2024; 91:e70000. [PMID: 39422082 DOI: 10.1002/mrd.70000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 09/29/2024] [Accepted: 10/01/2024] [Indexed: 10/19/2024]
Abstract
The development of novel non-hormonal male contraceptives represents a pivotal frontier in reproductive health, driven by the need for safe, effective, and reversible contraceptive methods. This comprehensive review explores the genetic underpinnings of male fertility, emphasizing the crucial roles of specific genes and structural variants (SVs) identified through advanced sequencing technologies such as long-read sequencing (LRS). LRS has revolutionized the detection of structural variants and complex genomic regions, offering unprecedented precision and resolution over traditional next-generation sequencing (NGS). Key genetic targets, including those implicated in spermatogenesis and sperm motility, are highlighted, showcasing their potential as non-hormonal contraceptive targets. The review delves into the systematic identification and validation of male reproductive tract-specific genes, utilizing advanced transcriptomics and genomics studies with validation using novel knockout mouse models. We discuss the innovative application of small molecule inhibitors, developed through platforms like DNA-encoded chemistry technology (DEC-Tec), which have shown significant promise in preclinical models. Notable examples include inhibitors targeting serine/threonine kinase 33 (STK33), soluble adenylyl cyclase (sAC), cyclin-dependent kinase 2 (CDK2), and bromodomain testis associated (BRDT), each demonstrating nanomolar affinity and potential for reversible and specific inhibition of male fertility. This review also honors the contributions of Dr. David L. Garbers whose foundational work has paved the way for these advancements. The integration of genomic, proteomic, and chemical biology approaches, supported by interdisciplinary collaboration, is poised to transform male contraceptive development. 
Future perspectives emphasize the need for continued innovation and rigorous testing to bring these novel contraceptives from the laboratory to clinical application, promising a new era of male reproductive health management.
Affiliation(s)
- Thomas X Garcia
- Center for Drug Discovery, Baylor College of Medicine, Houston, Texas, USA
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, Texas, USA
- Scott Department of Urology, Baylor College of Medicine, Houston, Texas, USA
| | - Martin M Matzuk
- Center for Drug Discovery, Baylor College of Medicine, Houston, Texas, USA
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, Texas, USA
| |
|
25
|
Cui M, Liu Y, Yu X, Guo H, Jiang T, Wang Y, Liu B. miniSNV: accurate and fast single nucleotide variant calling from nanopore sequencing data. Brief Bioinform 2024; 25:bbae473. [PMID: 39331016 PMCID: PMC11428505 DOI: 10.1093/bib/bbae473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 06/18/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024] Open
Abstract
Nanopore sequencing technology offers longer read lengths and can potentially address limitations of short-read sequencing, including long-range haplotype phasing and accurate variant calling. However, there is still room for improvement in terms of the performance of single nucleotide variant (SNV) identification and computing resource usage for the state-of-the-art approaches. In this work, we introduce miniSNV, a lightweight SNV calling algorithm that simultaneously achieves high performance and yield. miniSNV utilizes known common variants in populations as variation backgrounds and leverages read pileup, read-based phasing, and consensus generation to identify and genotype SNVs for Oxford Nanopore Technologies (ONT) long reads. Benchmarks on real and simulated ONT data under various error profiles demonstrate that miniSNV has superior sensitivity and comparable accuracy on SNV detection and runs faster with outstanding scalability and lower memory than most state-of-the-art variant callers. miniSNV is available from https://github.com/CuiMiao-HIT/miniSNV.
Affiliation(s)
- Miao Cui
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, Nangang District, Harbin, Heilongjiang 150001, China
| | - Yadong Liu
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, Nangang District, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, 26 Longyuan East 7th Street, Zhengdong New District, Zhengzhou, Henan 450000, China
| | - Xian Yu
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, Nangang District, Harbin, Heilongjiang 150001, China
| | - Hongzhe Guo
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, Nangang District, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, 26 Longyuan East 7th Street, Zhengdong New District, Zhengzhou, Henan 450000, China
| | - Tao Jiang
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, Nangang District, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, 26 Longyuan East 7th Street, Zhengdong New District, Zhengzhou, Henan 450000, China
| | - Yadong Wang
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, Nangang District, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, 26 Longyuan East 7th Street, Zhengdong New District, Zhengzhou, Henan 450000, China
| | - Bo Liu
- Faculty of Computing, Harbin Institute of Technology, 92 Xidazhi Street, Nangang District, Harbin, Heilongjiang 150001, China
- Zhengzhou Research Institute, Harbin Institute of Technology, 26 Longyuan East 7th Street, Zhengdong New District, Zhengzhou, Henan 450000, China
| |
|
26
|
Höps W, Rausch T, Jendrusch M, Korbel JO, Sedlazeck FJ. Impact and characterization of serial structural variations across humans and great apes. Nat Commun 2024; 15:8007. [PMID: 39266513 PMCID: PMC11393467 DOI: 10.1038/s41467-024-52027-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 08/23/2024] [Indexed: 09/14/2024] Open
Abstract
Modern sequencing technology enables the systematic detection of complex structural variation (SV) across genomes. However, extensive DNA rearrangements arising through a series of mutations, a phenomenon we refer to as serial SV (sSV), remain underexplored, posing a challenge for SV discovery. Here, we present NAHRwhals (https://github.com/WHops/NAHRwhals), a method to infer repeat-mediated series of SVs in long-read genomic assemblies. Applying NAHRwhals to haplotype-resolved human genomes from 28 individuals reveals 37 sSV loci of various length and complexity. These sSVs explain otherwise cryptic variation in medically relevant regions such as the TPSAB1 gene, 8p23.1, 22q11 and Sotos syndrome regions. Comparisons with great ape assemblies indicate that most human sSVs formed recently, after the human-ape split, and involved non-repeat-mediated processes in addition to non-allelic homologous recombination. NAHRwhals reliably discovers and characterizes sSVs at scale and independent of species, uncovering their genomic abundance and suggesting broader implications for disease.
Affiliation(s)
- Wolfram Höps
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Tobias Rausch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, University of Heidelberg, Heidelberg, Germany
| | - Michael Jendrusch
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstr. 1, 69117, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| |
|
27
|
Dorji J, Chamberlain AJ, Reich CM, VanderJagt CJ, Nguyen TV, Daetwyler HD, MacLeod IM. Mitochondrial sequence variants: testing imputation accuracy and their association with dairy cattle milk traits. Genet Sel Evol 2024; 56:62. [PMID: 39266998 PMCID: PMC11391750 DOI: 10.1186/s12711-024-00931-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 08/27/2024] [Indexed: 09/14/2024] Open
Abstract
BACKGROUND: Mitochondrial genomes differ from the nuclear genome and in humans it is known that mitochondrial variants contribute to genetic disorders. Prior to genomics, some livestock studies assessed the role of the mitochondrial genome but these were limited and inconclusive. Modern genome sequencing provides an opportunity to re-evaluate the potential impact of mitochondrial variation on livestock traits. This study first evaluated the empirical accuracy of mitochondrial sequence imputation and then used real and imputed mitochondrial sequence genotypes to study the role of mitochondrial variants on milk production traits of dairy cattle. RESULTS: The empirical accuracy of imputation from Single Nucleotide Polymorphism (SNP) panels to mitochondrial sequence genotypes was assessed in 516 test animals of Holstein, Jersey and Red breeds using Beagle software and a sequence reference of 1883 animals. The overall accuracy estimated as the Pearson's correlation squared (R²) between all imputed and real genotypes across all animals was 0.454. The low accuracy was attributed partly to the majority of variants having low minor allele frequency (MAF < 0.005) but also due to variants in the hypervariable D-loop region showing poor imputation accuracy. Beagle software provides an internal estimate of imputation accuracy (DR2), and 10 percent of the total 1927 imputed positions showed DR2 greater than 0.9 (N = 201). There were 151 sites with empirical R² > 0.9 (of 954 variants segregating in the test animals) and 138 of these overlapped the sites with DR2 > 0.9. This suggests that the DR2 statistic is a reasonable proxy to select sites that are imputed with higher accuracy for downstream analyses. Accordingly, in the second part of the study mitochondrial sequence variants were imputed from real mitochondrial SNP panel genotypes of 9515 Australian Holstein, Jersey and Red dairy cattle. Then, using only sites with DR2 > 0.900 and real genotypes, we undertook a genome-wide association study (GWAS) for milk, fat and protein yields. The GWAS mitochondrial SNP effects were not significant. CONCLUSION: The accuracy of imputation of mitochondrial genotypes from the SNP panel to sequence was generally low. The Beagle DR2 statistic enabled selection of sites imputed with higher empirical accuracy. We recommend building larger reference populations with mitochondrial sequence to improve the accuracy of imputing less common variants and ensuring that SNP panels include common variants in the D-loop region.
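The two accuracy measures in the abstract above can be illustrated with a short sketch: the empirical accuracy is the squared Pearson correlation between imputed and real genotypes, and sites are then filtered on the imputation software's own DR2 estimate. This is an illustrative sketch only, not the authors' pipeline; the function names, the 0/1 coding of haploid mitochondrial genotypes, and the example values are assumptions.

```python
def pearson_r2(imputed, real):
    """Squared Pearson correlation between imputed and real genotype codes."""
    n = len(imputed)
    if n != len(real) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_i = sum(imputed) / n
    mean_r = sum(real) / n
    cov = sum((a - mean_i) * (b - mean_r) for a, b in zip(imputed, real))
    var_i = sum((a - mean_i) ** 2 for a in imputed)
    var_r = sum((b - mean_r) ** 2 for b in real)
    if var_i == 0 or var_r == 0:
        return float("nan")  # correlation is undefined for a monomorphic site
    return cov * cov / (var_i * var_r)


def select_sites(dr2_by_site, threshold=0.9):
    """Keep sites whose estimated accuracy (DR2) strictly exceeds the threshold."""
    return [site for site, dr2 in dr2_by_site.items() if dr2 > threshold]


# Hypothetical genotypes for five animals at one site (0 = reference, 1 = alternate).
print(pearson_r2([0, 1, 1, 0, 1], [0, 1, 1, 0, 0]))
# Hypothetical DR2 values keyed by made-up site names.
print(select_sites({"site_a": 0.95, "site_b": 0.50, "site_c": 0.90}))
```

Returning NaN for monomorphic sites keeps them out of any averaged accuracy rather than silently scoring them as 0 or 1; this matters when most variants are rare, as the abstract notes.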
Affiliation(s)
- Jigme Dorji
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- Agriculture and Food, CSIRO, St Lucia, QLD, 4067, Australia
| | - Amanda J Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
| | - Coralie M Reich
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
| | - Christy J VanderJagt
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
| | - Tuan V Nguyen
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
| | - Hans D Daetwyler
- Global Genomics and Breeding Design Vegetable R&D, Bayer Crop Science, Bergschenhoek, The Netherlands
| | - Iona M MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC, 3083, Australia
| |
|
28
|
Singar S, Nagpal R, Arjmandi BH, Akhavan NS. Personalized Nutrition: Tailoring Dietary Recommendations through Genetic Insights. Nutrients 2024; 16:2673. [PMID: 39203810 PMCID: PMC11357412 DOI: 10.3390/nu16162673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 08/08/2024] [Accepted: 08/09/2024] [Indexed: 09/03/2024] Open
Abstract
Personalized nutrition (PN) represents a transformative approach in dietary science, where individual genetic profiles guide tailored dietary recommendations, thereby optimizing health outcomes and managing chronic diseases more effectively. This review synthesizes key aspects of PN, emphasizing the genetic basis of dietary responses, contemporary research, and practical applications. We explore how individual genetic differences influence dietary metabolisms, thus underscoring the importance of nutrigenomics in developing personalized dietary guidelines. Current research in PN highlights significant gene-diet interactions that affect various conditions, including obesity and diabetes, suggesting that dietary interventions could be more precise and beneficial if they are customized to genetic profiles. Moreover, we discuss practical implementations of PN, including technological advancements in genetic testing that enable real-time dietary customization. Looking forward, this review identifies the robust integration of bioinformatics and genomics as critical for advancing PN. We advocate for multidisciplinary research to overcome current challenges, such as data privacy and ethical concerns associated with genetic testing. The future of PN lies in broader adoption across health and wellness sectors, promising significant advancements in public health and personalized medicine.
Affiliation(s)
- Saiful Singar
- Department of Health, Nutrition, and Food Sciences, College of Education, Health, and Human Sciences, Florida State University, Tallahassee, FL 32306, USA
| | - Ravinder Nagpal
- Department of Health, Nutrition, and Food Sciences, College of Education, Health, and Human Sciences, Florida State University, Tallahassee, FL 32306, USA
| | - Bahram H. Arjmandi
- Department of Health, Nutrition, and Food Sciences, College of Education, Health, and Human Sciences, Florida State University, Tallahassee, FL 32306, USA
| | - Neda S. Akhavan
- Department of Kinesiology and Nutrition Sciences, School of Integrated Health Sciences, University of Nevada, Las Vegas, NV 89154, USA
29
Foltz SM, Li Y, Yao L, Terekhanova NV, Weerasinghe A, Gao Q, Dong G, Schindler M, Cao S, Sun H, Jayasinghe RG, Fulton RS, Fronick CC, King J, Kohnen DR, Fiala MA, Chen K, DiPersio JF, Vij R, Ding L. Somatic mutation phasing and haplotype extension using linked-reads in multiple myeloma. bioRxiv 2024:2024.08.09.607342. [PMID: 39149342 PMCID: PMC11326269 DOI: 10.1101/2024.08.09.607342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Somatic mutation phasing informs our understanding of cancer-related events, like driver mutations. We generated linked-read whole genome sequencing data for 23 samples across disease stages from 14 multiple myeloma (MM) patients and systematically assigned somatic mutations to haplotypes using linked-reads. Here, we report the reconstructed cancer haplotypes and phase blocks from several MM samples and show how phase block length can be extended by integrating samples from the same individual. We also uncover phasing information in genes frequently mutated in MM, including DIS3, HIST1H1E, KRAS, NRAS, and TP53, phasing 79.4% of 20,705 high-confidence somatic mutations. In some cases, this enabled us to interpret clonal evolution models at higher resolution using pairs of phased somatic mutations. For example, our analysis of one patient suggested that two NRAS hotspot mutations occurred on the same haplotype but were independent events in different subclones. Given sufficient tumor purity and data quality, our framework illustrates how haplotype-aware analysis of somatic mutations in cancer can be beneficial for some cancer cases.
Affiliation(s)
- Steven M. Foltz
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Yize Li
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Lijun Yao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Nadezhda V. Terekhanova
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Amila Weerasinghe
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Qingsong Gao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Guanlan Dong
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Moses Schindler
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Song Cao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Hua Sun
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Reyka G. Jayasinghe
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Robert S. Fulton
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Catrina C. Fronick
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
| | - Justin King
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Daniel R. Kohnen
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Mark A. Fiala
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - John F. DiPersio
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Ravi Vij
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
| | - Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63110, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63110, USA
- Department of Genetics, Washington University in St. Louis, St. Louis, MO, 63110, USA
30
Hjelmen CE. Genome size and chromosome number are critical metrics for accurate genome assembly assessment in Eukaryota. Genetics 2024; 227:iyae099. [PMID: 38869251 DOI: 10.1093/genetics/iyae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 04/02/2024] [Accepted: 06/06/2024] [Indexed: 06/14/2024] Open
Abstract
The number of genome assemblies has rapidly increased in recent history, with NCBI databases reaching over 41,000 eukaryotic genome assemblies across about 2,300 species. Increases in read length and improvements in assembly algorithms have led to increased contiguity and larger genome assemblies. While this number of assemblies is impressive, only about a third of these assemblies have corresponding genome size estimations for their respective species on publicly available databases. In this paper, genome assemblies are assessed regarding their total size compared to their respective publicly available genome size estimations. These deviations in size are assessed related to genome size, kingdom, sequencing platform, and standard assembly metrics, such as N50 and BUSCO values. A large proportion of assemblies deviate from their estimated genome size by more than 10%, with increasing deviations in size with increased genome size, suggesting nonprotein coding and structural DNA may be to blame. Modest differences in performance of sequencing platforms are noted as well. While standard metrics of genome assessment are more likely to indicate an assembly approaching the estimated genome size, much of the variation in this deviation in size is not explained with these raw metrics. A new, proportional N50 metric is proposed, in which N50 values are made relative to the average chromosome size of each species. This new metric has a stronger relationship with complete genome assemblies and, due to its proportional nature, allows for a more direct comparison across assemblies for genomes with variation in sizes and architectures.
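The proportional N50 described in this abstract can be illustrated with a minimal sketch. It assumes that "average chromosome size" means the estimated haploid genome size divided by the haploid chromosome number; the abstract does not give the exact formula, so the function names and that definition are illustrative assumptions rather than the author's implementation.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def proportional_n50(lengths, genome_size, chromosome_number):
    """Scale N50 by the average chromosome size of the species.

    Assumption (not stated in the abstract): average chromosome size is
    the estimated haploid genome size divided by the haploid chromosome
    number, both in the same units as `lengths`.
    """
    if genome_size <= 0 or chromosome_number <= 0:
        raise ValueError("genome size and chromosome number must be positive")
    average_chromosome_size = genome_size / chromosome_number
    return n50(lengths) / average_chromosome_size
```

Under this definition, a value near 1 would suggest that the typical sequence in the assembly approaches chromosome scale, which is what makes the metric comparable across genomes of different sizes.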
Affiliation(s)
- Carl E Hjelmen
- Department of Biology, Utah Valley University, 800 W. University Parkway, Orem, UT 84058, USA
31
Sweeten AP, Schatz MC, Phillippy AM. ModDotPlot-rapid and interactive visualization of tandem repeats. Bioinformatics 2024; 40:btae493. [PMID: 39110522 PMCID: PMC11321072 DOI: 10.1093/bioinformatics/btae493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 07/02/2024] [Accepted: 08/05/2024] [Indexed: 08/15/2024] Open
Abstract
Motivation: A common method for analyzing genomic repeats is to produce a sequence similarity matrix visualized via a dot plot. Innovative approaches such as StainedGlass have improved upon this classic visualization by rendering dot plots as a heatmap of sequence identity, enabling researchers to better visualize multi-megabase tandem repeat arrays within centromeres and other heterochromatic regions of the genome. However, computing the similarity estimates for heatmaps requires high computational overhead and can suffer from decreasing accuracy. Results: In this work, we introduce ModDotPlot, an interactive and alignment-free dot plot viewer. By approximating average nucleotide identity via a k-mer-based containment index, ModDotPlot produces accurate plots orders of magnitude faster than StainedGlass. We accomplish this through the use of a hierarchical modimizer scheme that can visualize the full 128 Mb genome of Arabidopsis thaliana in under 5 min on a laptop. ModDotPlot is bundled with a graphical user interface supporting real-time interactive navigation of entire chromosomes. Availability and implementation: ModDotPlot is available at https://github.com/marbl/ModDotPlot.
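The idea of estimating identity from a k-mer containment index over a modimizer sample can be sketched as follows. This is a single-level illustration under stated assumptions, not ModDotPlot's code: the function names, the BLAKE2 hash, and the use of the common approximation identity ≈ containment^(1/k) are choices made here for illustration, and reverse-complement canonicalization and the hierarchical scheme are left out.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def modimizers(seq, k, s):
    """Keep only k-mers whose hash is 0 modulo s, i.e. roughly 1/s of them."""
    return {km for km in kmers(seq, k) if stable_hash(km) % s == 0}


def containment(a, b):
    """Fraction of the k-mers in set a that also occur in set b."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def identity_estimate(c, k):
    """Approximate nucleotide identity from containment c as c ** (1 / k).

    This is a widely used approximation; whether the tool applies exactly
    this formula is an assumption of this sketch.
    """
    return c ** (1.0 / k)
```

Because the same modulus `s` selects the same k-mers in both sequences, containment computed on the sample approximates containment on the full k-mer sets at a fraction of the cost, which is the source of the speed-up the abstract describes.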
Affiliation(s)
- Alexander P Sweeten
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, United States
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, United States
| | - Adam M Phillippy
- Genome Informatics Section, Center for Genomics and Data Science Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, United States
32
Schreiber M, Jayakodi M, Stein N, Mascher M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat Rev Genet 2024; 25:563-577. [PMID: 38378816 PMCID: PMC7616794 DOI: 10.1038/s41576-024-00691-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/14/2023] [Indexed: 02/22/2024]
Abstract
Plant genome sequences catalogue genes and the genetic elements that regulate their expression. Such inventories further research aims as diverse as mapping the molecular basis of trait diversity in domesticated plants or inquiries into the origin of evolutionary innovations in flowering plants millions of years ago. The transformative technological progress of DNA sequencing in the past two decades has enabled researchers to sequence ever more genomes with greater ease. Pangenomes - complete sequences of multiple individuals of a species or higher taxonomic unit - have now entered the geneticists' toolkit. The genomes of crop plants and their wild relatives are being studied with translational applications in breeding in mind. But pangenomes are applicable also in ecological and evolutionary studies, as they help classify and monitor biodiversity across the tree of life, deepen our understanding of how plant species diverged and show how plants adapt to changing environments or new selection pressures exerted by human beings.
Affiliation(s)
- Mona Schreiber
- Department of Biology, University of Marburg, Marburg, Germany
| | - Murukarthick Jayakodi
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
| | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
33
Taylor DJ, Eizenga JM, Li Q, Das A, Jenike KM, Kenny EE, Miga KH, Monlong J, McCoy RC, Paten B, Schatz MC. Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References. Annu Rev Genomics Hum Genet 2024; 25:77-104. [PMID: 38663087 PMCID: PMC11451085 DOI: 10.1146/annurev-genom-021623-081639] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.
Affiliation(s)
- Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Arun Das
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
| | - Katharine M Jenike
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Jean Monlong
- Institut de Recherche en Santé Digestive, Université de Toulouse, INSERM, INRA, ENVT, UPS, Toulouse, France
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
| | - Benedict Paten
- Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
- Genomics Institute, University of California, Santa Cruz, California, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA
- Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
34
Liu C, Wu P, Wu X, Zhao X, Chen F, Cheng X, Zhu H, Wang O, Xu M. AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline. Front Genet 2024; 15:1421565. [PMID: 39130747 PMCID: PMC11310137 DOI: 10.3389/fgene.2024.1421565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 07/05/2024] [Indexed: 08/13/2024] Open
Abstract
Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, it remains methodologically challenging to reconstruct complete haplotypes due to the high sequencing error rates of long reads and the limited capturing efficiency of co-barcoded reads. We here present a pipeline, AsmMix, for generating both contiguous and accurate diploid genomes. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, while the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole genome dataset (HG002), and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
Affiliation(s)
- Chao Liu
- BGI, Tianjin, China
- BGI Research, Shenzhen, China
| | - Pei Wu
- BGI, Tianjin, China
- BGI Research, Shenzhen, China
| | - Xue Wu
- BGI Research, Shenzhen, China
| | - Hongmei Zhu
- BGI, Tianjin, China
- BGI Research, Shenzhen, China
| | - Ou Wang
- BGI Research, Shenzhen, China
| | - Mengyang Xu
- BGI Research, Shenzhen, China
- BGI Research, Qingdao, China
35
Junjun R, Zhengqian Z, Ying W, Jialiang W, Yongzhuang L. A comprehensive review of deep learning-based variant calling methods. Brief Funct Genomics 2024; 23:303-313. [PMID: 38366908 DOI: 10.1093/bfgp/elae003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/14/2024] [Accepted: 01/18/2023] [Indexed: 02/18/2024] Open
Abstract
Genome sequencing data have become increasingly important in the field of personalized medicine and diagnosis. However, accurately detecting genomic variations remains a challenging task. Traditional variation detection methods rely on manual inspection or predefined rules, which can be time-consuming and prone to errors. Consequently, deep learning-based approaches for variation detection have gained attention due to their ability to automatically learn genomic features that distinguish between variants. In our review, we discuss the recent advancements in deep learning-based algorithms for detecting small variations and structural variations in genomic data, as well as their advantages and limitations.
Affiliation(s)
- Ren Junjun
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Zhang Zhengqian
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Wu Ying
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Wang Jialiang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
| | - Liu Yongzhuang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, China
36
Zhang Z, Liu Y, Li X, Liu Y, Wang Y, Jiang T. HapKled: a haplotype-aware structural variant calling approach for Oxford nanopore sequencing data. Front Genet 2024; 15:1435087. [PMID: 39045321 PMCID: PMC11263161 DOI: 10.3389/fgene.2024.1435087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 06/13/2024] [Indexed: 07/25/2024] Open
Abstract
Introduction: Structural variants (SVs) are a type of variation that can significantly influence phenotypes and cause diseases. Thus, the accurate detection of SVs is a vital part of modern genetic analysis. The advent of long-read sequencing technology ushers in a new era of more accurate and comprehensive SV calling, and many tools have been developed to call SVs using long-read data. Haplotype-tagging is a procedure that tags reads with haplotype information and can thus potentially improve SV detection; nevertheless, few methods make use of this information. In this article, we introduce HapKled, a new SV detection tool that can accurately detect SVs from Oxford Nanopore Technologies (ONT) long-read alignment data. Methods: HapKled utilizes the haplotype information underlying alignment data by haplotype-tagging the reads with WhatsHap to improve detection performance, with three unique calling mechanics: altering clustering conditions according to the haplotype information of signatures, determining similar SVs based on haplotype information, and relaxing filtering conditions based on haplotype quality. Results: In our evaluations, HapKled outperformed state-of-the-art tools and delivered better SV detection results on both simulated and real sequencing data. The code and experiments of HapKled can be obtained from https://github.com/CoREse/HapKled. Discussion: With the superb SV detection performance that HapKled can deliver, HapKled could be useful in bioinformatics research, clinical diagnosis, and medical research and development.
Affiliation(s)
- Zhendong Zhang
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Yue Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Xin Li
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Yadong Liu
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, China
| | - Yadong Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, China
| | - Tao Jiang
- Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Zhengzhou Research Institute, Harbin Institute of Technology, Zhengzhou, Henan, China
37
Ji Y, Zhao J, Gong J, Sedlazeck FJ, Fan S. Unveiling novel genetic variants in 370 challenging medically relevant genes using the long read sequencing data of 41 samples from 19 global populations. Mol Genet Genomics 2024; 299:65. [PMID: 38972030 PMCID: PMC11955097 DOI: 10.1007/s00438-024-02158-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 06/16/2024] [Indexed: 07/08/2024]
Abstract
Background: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations. Results: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution. Conclusion: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.
Affiliation(s)
- Yanfeng Ji
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
| | - Junfan Zhao
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
| | - Jiao Gong
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA
| | - Shaohua Fan
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Zhangjiang Fudan International Innovation Center, School of Life Science, Fudan University, Shanghai, 200438, China
38
Fu Y, Aganezov S, Mahmoud M, Beaulaurier J, Juul S, Treangen TJ, Sedlazeck FJ. MethPhaser: methylation-based long-read haplotype phasing of human genomes. Nat Commun 2024; 15:5327. [PMID: 38909018 PMCID: PMC11193733 DOI: 10.1038/s41467-024-49588-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 06/11/2024] [Indexed: 06/24/2024] Open
Abstract
The assignment of variants across haplotypes, known as phasing, is crucial for predicting the consequences, interaction, and inheritance of mutations and is a key step in improving our understanding of phenotype and disease. However, phasing is limited by read length and stretches of homozygosity along the genome. To overcome this limitation, we designed MethPhaser, a method that utilizes methylation signals from Oxford Nanopore Technologies (ONT) data to extend single nucleotide variation (SNV)-based phasing. We demonstrate that haplotype-specific methylation is widespread in human genomes and that long-read technologies enable direct reporting of methylation signals. For ONT R9 and R10 cell line data, we increase the phase length N50 by 78%-151% at a phasing accuracy of 83.4%-98.7%. To assess the impact of tissue purity and random methylation signals due to inactivation, we also applied MethPhaser to blood samples from 4 patients, still showing improvements over SNV-only phasing. MethPhaser further improves phasing across HLA and multiple other medically relevant genes, improving our understanding of how mutations interact across multiple phenotypes. The concept of MethPhaser can also be extended to non-human diploid genomes. MethPhaser is available at https://github.com/treangenlab/methphaser.
Affiliation(s)
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Sissel Juul
- Oxford Nanopore Technologies Inc, New York, NY, USA
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
| | - Fritz J Sedlazeck
- Department of Computer Science, Rice University, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
39
Cumlin T, Karlsson I, Haars J, Rosengren M, Lennerstrand J, Pimushyna M, Feuk L, Ladenvall C, Kaden R. From SARS-CoV-2 to Global Preparedness: A Graphical Interface for Standardised High-Throughput Bioinformatics Analysis in Pandemic Scenarios and Surveillance of Drug Resistance. Int J Mol Sci 2024; 25:6645. [PMID: 38928350 PMCID: PMC11204113 DOI: 10.3390/ijms25126645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 06/04/2024] [Accepted: 06/15/2024] [Indexed: 06/28/2024] Open
Abstract
The COVID-19 pandemic highlighted the need for a rapid, convenient, and scalable diagnostic method for detecting a novel pathogen amidst a global pandemic. While command-line interface tools offer automation for SARS-CoV-2 Oxford Nanopore Technology sequencing data analysis, they are inapplicable to users with limited programming skills. A solution is to establish such automated workflows within a graphical user interface software. We developed two workflows in the software Geneious Prime 2022.1.1, adapted for data obtained from the Midnight and Artic's nCoV-2019 sequencing protocols. Both workflows perform trimming, read mapping, consensus generation, and annotation on SARS-CoV-2 Nanopore sequencing data. Additionally, one workflow includes phylogenetic assignment using the bioinformatic tools pangolin and Nextclade as plugins. The basic workflow was validated in 2020, adhering to the requirements of the European Centre for Disease Prevention and Control for SARS-CoV-2 sequencing and analysis. The enhanced workflow, providing phylogenetic assignment, underwent validation at Uppsala University Hospital by analysing 96 clinical samples. It provided accurate diagnoses matching the original results of the basic workflow while also reducing manual clicks and analysis time. These bioinformatic workflows streamline SARS-CoV-2 Nanopore data analysis in Geneious Prime, saving time and manual work for operators lacking programming knowledge.
Affiliation(s)
- Tomas Cumlin
- Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Ida Karlsson
- Clinical Genomics Uppsala, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden
- Jonathan Haars
- Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Maria Rosengren
- Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Johan Lennerstrand
- Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Maryna Pimushyna
- Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Lars Feuk
- National Genomics Infrastructure Uppsala, Uppsala University, 751 08 Uppsala, Sweden
- Department of Immunology, Genetics and Pathology, Uppsala University, 751 08 Uppsala, Sweden
- Claes Ladenvall
- Clinical Genomics Uppsala, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden
- Department of Immunology, Genetics and Pathology, Uppsala University, 751 08 Uppsala, Sweden
- Rene Kaden
- Department of Medical Sciences, Section for Clinical Microbiology, Uppsala University, Akademiska Sjukhuset Entrance 40, 751 85 Uppsala, Sweden
- Clinical Genomics Uppsala, Science for Life Laboratory, Uppsala University, 751 85 Uppsala, Sweden

40

Pan C, Reinert K. Leaf: an ultrafast filter for population-scale long-read SV detection. Genome Biol 2024; 25:155. [PMID: 38872200 PMCID: PMC11170821 DOI: 10.1186/s13059-024-03297-5] [Received: 11/28/2022] [Accepted: 06/04/2024] [Indexed: 06/15/2024] Open
Abstract
Advances in sequencing technology have facilitated population-scale long-read structural variant (SV) detection. Arguably, one of the main challenges in population-scale analysis is developing effective computational pipelines. Here, we present a new filter-based pipeline for population-scale long-read SV detection. It better captures SV signals at an early stage than conventional assembly-based or alignment-based pipelines. Assessments in this work suggest that the filter-based pipeline helps better resolve intra-read rearrangements. Moreover, it is also more computationally efficient than conventional pipelines and thus may facilitate population-scale long-read applications.
Affiliation(s)
- Chenxu Pan
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany

41

Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods 2024; 21:954-966. [PMID: 38689099 PMCID: PMC11955098 DOI: 10.1038/s41592-024-02262-1] [Received: 09/08/2022] [Accepted: 03/29/2024] [Indexed: 05/02/2024]
Abstract
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are due not only to improvements in sequencing accuracy but also to rapidly evolving analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Affiliation(s)
- Daniel P Agustinho
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
- Vipin K Menon
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Senior research project manager, Human Genetics, Genentech, South San Francisco, CA, USA
- Ginger A Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA

42

Sharei M, Kamal M, Afzali-Kusha A, Pedram M. GEMA: A Genome Exact Mapping Accelerator Based on Learned Indexes. IEEE Trans Biomed Circuits Syst 2024; 18:523-538. [PMID: 38157470 DOI: 10.1109/tbcas.2023.3348152] [Indexed: 01/03/2024]
Abstract
In this article, we introduce GEMA, a genome exact mapping accelerator based on learned indexes, specifically designed for FPGA implementation. GEMA utilizes a machine learning (ML) algorithm to precisely locate the exact position of read sequences within the original sequence. To enhance the accuracy of the trained ML model, we incorporate data augmentation and data-distribution-aware partitioning techniques. Additionally, we present an efficient yet low-overhead error recovery technique. To map long reads more efficiently, we propose a speculative prefetching approach, which reduces the required memory bandwidth. Furthermore, we suggest an FPGA-based architecture for implementing the proposed mapping accelerator, optimizing the accesses to off-chip memory. Our studies demonstrate that GEMA achieves up to 1.36× higher speed for short reads compared to the corresponding results reported in recently published exact mapping accelerators. Moreover, GEMA achieves up to ~22× faster mapping of long reads compared to the available results for the longest mapped reads using these accelerators.

43

Inamo J, Suzuki A, Ueda MT, Yamaguchi K, Nishida H, Suzuki K, Kaneko Y, Takeuchi T, Hatano H, Ishigaki K, Ishihama Y, Yamamoto K, Kochi Y. Long-read sequencing for 29 immune cell subsets reveals disease-linked isoforms. Nat Commun 2024; 15:4285. [PMID: 38806455 PMCID: PMC11133395 DOI: 10.1038/s41467-024-48615-4] [Received: 12/04/2022] [Accepted: 05/02/2024] [Indexed: 05/30/2024] Open
Abstract
Alternative splicing events are a major causal mechanism for complex traits, but they have been understudied due to the limitations of short-read sequencing. Here, we generate a full-length isoform annotation of human immune cells from an individual by long-read sequencing for 29 cell subsets. This annotation contains a number of unannotated transcripts and isoforms, such as a read-through transcript of TOMM40-APOE in the Alzheimer's disease locus. We profile characteristics of isoforms and show that repetitive elements significantly explain the diversity of unannotated isoforms, providing insight into human genome evolution. In addition, some of the isoforms are expressed in a cell-type-specific manner, and their alternative 3'-UTR usage contributes to this specificity. Further, we identify disease-associated isoforms by isoform switch analysis and by integration of several quantitative trait loci analyses with genome-wide association study data. Our findings will promote the elucidation of the mechanism of complex diseases via alternative splicing.
Affiliation(s)
- Jun Inamo
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Akari Suzuki
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Mahoko Takahashi Ueda
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Kensuke Yamaguchi
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Biomedical Engineering Research Innovation Center, Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Hiroshi Nishida
- Department of Molecular Systems Bioanalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, 606-8501, Japan
- Katsuya Suzuki
- Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Yuko Kaneko
- Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Tsutomu Takeuchi
- Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Saitama Medical University, 38 Morohongo, Moroyama, Iruma, Saitama, 350-0495, Japan
- Hiroaki Hatano
- Laboratory for Human Immunogenetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Kazuyoshi Ishigaki
- Laboratory for Human Immunogenetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Yasushi Ishihama
- Department of Molecular Systems Bioanalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, 606-8501, Japan
- Laboratory of Proteomics for Drug Discovery, National Institute of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, 567-0085, Japan
- Kazuhiko Yamamoto
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Yuta Kochi
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan

44

Schrauwen I, Rajendran Y, Acharya A, Öhman S, Arvio M, Paetau R, Siren A, Avela K, Granvik J, Leal SM, Määttä T, Kokkonen H, Järvelä I. Optical genome mapping unveils hidden structural variants in neurodevelopmental disorders. Sci Rep 2024; 14:11239. [PMID: 38755281 PMCID: PMC11099145 DOI: 10.1038/s41598-024-62009-y] [Received: 01/04/2024] [Accepted: 05/13/2024] [Indexed: 05/18/2024] Open
Abstract
While short-read sequencing currently dominates genetic research and diagnostics, it frequently falls short of capturing certain structural variants (SVs), which are often implicated in the etiology of neurodevelopmental disorders (NDDs). Optical genome mapping (OGM) is an innovative technique capable of capturing SVs that are undetectable or challenging to detect via short-read methods. This study aimed to investigate NDDs using OGM, specifically focusing on cases that remained unsolved after standard exome sequencing. OGM was performed in 47 families using ultra-high molecular weight DNA. Single-molecule maps were assembled de novo, followed by SV and copy number variant calling. We identified 7 variants of interest, of which 5 (10.6%) were classified as likely pathogenic or pathogenic, located in BCL11A, OPHN1, PHF8, SON, and NFIA. We also identified an inversion disrupting NAALADL2, a gene previously found to harbor complex rearrangements in two NDD cases. Variants in known NDD genes or candidate variants of interest missed by exome sequencing mainly consisted of larger insertions (> 1 kbp), inversions, and deletions/duplications of a low number of exons (1-4 exons). In conclusion, in addition to improving molecular diagnosis in NDDs, this technique may also reveal novel NDD genes which may harbor complex SVs often missed by standard sequencing techniques.
Affiliation(s)
- Isabelle Schrauwen
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Yasmin Rajendran
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Anushree Acharya
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Maria Arvio
- Päijät-Häme Wellbeing Services, Neurology, Lahti, Finland
- Ritva Paetau
- Department of Child Neurology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Auli Siren
- Kanta-Häme Central Hospital, Hämeenlinna, Finland
- Kristiina Avela
- Institute of Biomedicine, University of Turku, Turku, Finland
- Johanna Granvik
- The Wellbeing Services County of Ostrobothnia, Kokkola, Finland
- Suzanne M Leal
- Department of Neurology, Center for Statistical Genetics, Gertrude H. Sergievsky Center, Columbia University Medical Center, Columbia University, 630 W 168th St, New York, NY, 10032, USA
- Taub Institute for Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, NY, USA
- Tuomo Määttä
- The Wellbeing Services County of Kainuu, Kajaani, Finland
- Hannaleena Kokkonen
- Northern Finland Laboratory Centre NordLab and Medical Research Centre, Oulu University Hospital and University of Oulu, Oulu, Finland
- Irma Järvelä
- Department of Medical Genetics, University of Helsinki, Helsinki, Finland

45

Tunjić-Cvitanić M, García-Souto D, Pasantes JJ, Šatović-Vukšić E. Dominance of transposable element-related satDNAs results in great complexity of "satDNA library" and invokes the extension towards "repetitive DNA library". Mar Life Sci Technol 2024; 6:236-251. [PMID: 38827134 PMCID: PMC11136912 DOI: 10.1007/s42995-024-00218-0] [Received: 08/09/2023] [Accepted: 02/26/2024] [Indexed: 06/04/2024]
Abstract
Research on bivalves is fast-growing, including genome-wide analyses and genome sequencing. Several characteristics qualify oysters as a valuable model to explore repetitive DNA sequences and their genome organization. Here we characterize the satellitomes of five species in the family Ostreidae (Crassostrea angulata, C. virginica, C. hongkongensis, C. ariakensis, Ostrea edulis), revealing a substantial number of satellite DNAs (satDNAs) per genome (ranging between 33 and 61) and peculiarities in the composition of their satellitomes. Numerous satDNAs were either associated with or derived from transposable elements, displaying a scarcity of transposable element-unrelated satDNAs in these genomes. Due to the non-conventional satellitome constitution and dominance of Helitron-associated satDNAs, comparative satellitomics demanded more in-depth analyses than standardly employed. Comparative analyses (including C. gigas, the first bivalve species with a defined satellitome) revealed that 13 satDNAs occur in all six oyster genomes, with Cg170/HindIII satDNA being the most abundant in all of them. Evaluating the "satDNA library model" highlighted the necessity to adjust this term when studying tandem repeat evolution in organisms with such satellitomes. When repetitive sequences with potential variation in the organizational form and repeat-type affiliation are examined across related species, the introduction of the terms "TE library" and "repetitive DNA library" becomes essential.
Affiliation(s)
- Daniel García-Souto
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain
- Department of Zoology, Genetics and Physical Anthropology, Universidade de Santiago de Compostela, 15706 Santiago de Compostela, Spain
- Juan J. Pasantes
- Centro de Investigación Mariña, Dpto de Bioquímica, Xenética e Inmunoloxía, Universidade de Vigo, 36310 Vigo, Spain
- Eva Šatović-Vukšić
- Division of Molecular Biology, Ruđer Bošković Institute, 10000 Zagreb, Croatia

46

Miano-Burkhardt A, Alvarez Jerez P, Daida K, Bandres Ciga S, Billingsley KJ. The Role of Structural Variants in the Genetic Architecture of Parkinson's Disease. Int J Mol Sci 2024; 25:4801. [PMID: 38732020 PMCID: PMC11084710 DOI: 10.3390/ijms25094801] [Received: 03/20/2024] [Revised: 04/17/2024] [Accepted: 04/22/2024] [Indexed: 05/13/2024] Open
Abstract
Parkinson's disease (PD) significantly impacts millions of individuals worldwide. Although our understanding of the genetic foundations of PD has advanced, a substantial portion of the genetic variation contributing to disease risk remains unknown. Current PD genetic studies have primarily focused on one form of genetic variation, single nucleotide variants (SNVs), while other important forms of genetic variation, such as structural variants (SVs), are mostly ignored due to the complexity of detecting these variants with traditional sequencing methods. Yet, these forms of genetic variation play crucial roles in gene expression and regulation in the human brain and are causative of numerous neurological disorders, including forms of PD. This review aims to provide a comprehensive overview of our current understanding of the involvement of coding and noncoding SVs in the genetic architecture of PD.
Affiliation(s)
- Abigail Miano-Burkhardt
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA; (A.M.-B.); (K.D.)
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)
- Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)
- Kensuke Daida
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA; (A.M.-B.); (K.D.)
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)
- Sara Bandres Ciga
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)
- Kimberley J. Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, Bethesda, MD 20892, USA; (A.M.-B.); (K.D.)
- Center for Alzheimer’s and Related Dementias, National Institute on Aging, Bethesda, MD 20892, USA; (P.A.J.); (S.B.C.)

47

Petraccioli A, Maio N, Carotenuto R, Odierna G, Guarino FM. The Satellite DNA PcH-Sat, Isolated and Characterized in the Limpet Patella caerulea (Mollusca, Gastropoda), Suggests the Origin from a Nin-SINE Transposable Element. Genes (Basel) 2024; 15:541. [PMID: 38790169 PMCID: PMC11121367 DOI: 10.3390/genes15050541] [Received: 03/19/2024] [Revised: 04/16/2024] [Accepted: 04/23/2024] [Indexed: 05/26/2024] Open
Abstract
Satellite DNA (sat-DNA) was previously described as junk and selfish DNA in the cellular economy, without a clear functional role. However, during the last two decades, evidence has accumulated about the roles of sat-DNA in different cellular functions and its probable involvement in tumorigenesis and adaptation to environmental changes. In molluscs, studies on sat-DNAs have been performed mainly on bivalve species, especially those of economic interest. Conversely, in Gastropoda (which includes about 80% of the currently described mollusc species), studies on sat-DNA have been largely neglected. In this study, we isolated and characterized a sat-DNA, here named PcH-sat, in the limpet Patella caerulea using the restriction enzyme method, particularly HaeIII. Monomeric units of PcH-sat are 179 bp long, AT-rich (58.7%), and with an identity among monomers ranging from 91.6 to 99.8%. Southern blot showed that PcH-sat is conserved in P. depressa and P. ulyssiponensis, while a smeared signal of hybridization was present in the other three investigated limpets (P. ferruginea, P. rustica and P. vulgata). Dot blot showed that PcH-sat represents about 10% of the genome of P. caerulea, 5% of that of P. depressa, and 0.3% of that of P. ulyssiponensis. FISH showed that PcH-sat was mainly localized on pericentromeric regions of chromosome pairs 2 and 4-7 of P. caerulea (2n = 18). A database search showed that PcH-sat contains a large segment (of 118 bp) showing high identity with a homologous trait of the Nin-SINE transposable element (TE) of the patellogastropod Lottia gigantea, supporting the hypothesis that TEs are involved in the origin and tandemization processes of sat-DNAs.
Affiliation(s)
- Gaetano Odierna
- Department of Biology, University of Naples Federico II, Via Cinthia, I-80126 Naples, Italy; (A.P.); (N.M.); (R.C.); (F.M.G.)

48

Buthasane W, Shotelersuk V, Chetruengchai W, Srichomthong C, Assawapitaksakul A, Tangphatsornruang S, Pootakham W, Sonthirod C, Tongsima S, Wangkumhang P, Wilantho A, Thongphakdee A, Sanannu S, Poksawat C, Nipanunt T, Kasorndorkbua C, Koepfli KP, Pukazhenthi BS, Suriyaphol P, Wongsurawat T, Jenjaroenpun P, Suriyaphol G. Comprehensive genome assembly reveals genetic diversity and carcass consumption insights in critically endangered Asian king vultures. Sci Rep 2024; 14:9455. [PMID: 38658744 PMCID: PMC11043450 DOI: 10.1038/s41598-024-59990-9] [Received: 11/23/2023] [Accepted: 04/17/2024] [Indexed: 04/26/2024] Open
Abstract
The Asian king vulture (AKV), a vital forest scavenger, is facing globally critical endangerment. This study aimed to construct a reference genome to unveil the mechanisms underlying its scavenger abilities and to assess the genetic relatedness of the captive population in Thailand. A reference genome of a female AKV was assembled from sequencing reads obtained from both PacBio long-read and MGI short-read sequencing platforms. Comparative genomics with New World vultures (NWVs) and other birds in the Family Accipitridae revealed unique gene families in AKV associated with retroviral genome integration and feather keratin, contrasting with NWVs' genes related to olfactory reception. Expanded gene families in AKV were linked to inflammatory response, iron regulation and spermatogenesis. Positively selected genes included those associated with anti-apoptosis, immune response and muscle cell development, shedding light on adaptations for carcass consumption and high-altitude soaring. Using restriction site-associated DNA sequencing (RADseq)-based genome-wide single nucleotide polymorphisms (SNPs), genetic relatedness and inbreeding status of five captive AKVs were determined, revealing high genomic inbreeding in two females. In conclusion, the AKV reference genome was established, providing insights into its unique characteristics. Additionally, the potential of RADseq-based genome-wide SNPs for selecting AKV breeders was demonstrated.
Affiliation(s)
- Wannapol Buthasane
- Biochemistry Unit, Department of Physiology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, 10330, Thailand
- Vorasuk Shotelersuk
- Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Henri Dunant Road, Pathumwan, Bangkok, 10330, Thailand
- Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, The Thai Red Cross Society, Bangkok, 10330, Thailand
- Wanna Chetruengchai
- Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Henri Dunant Road, Pathumwan, Bangkok, 10330, Thailand
- Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, The Thai Red Cross Society, Bangkok, 10330, Thailand
- Chalurmpon Srichomthong
- Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Henri Dunant Road, Pathumwan, Bangkok, 10330, Thailand
- Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, The Thai Red Cross Society, Bangkok, 10330, Thailand
- Adjima Assawapitaksakul
- Center of Excellence for Medical Genomics, Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Henri Dunant Road, Pathumwan, Bangkok, 10330, Thailand
- Excellence Center for Genomics and Precision Medicine, King Chulalongkorn Memorial Hospital, The Thai Red Cross Society, Bangkok, 10330, Thailand
- Sithichoke Tangphatsornruang
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Wirulda Pootakham
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Chutima Sonthirod
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Sissades Tongsima
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Pongsakorn Wangkumhang
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Alisa Wilantho
- National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathum Thani, 12120, Thailand
- Ampika Thongphakdee
- Animal Conservation and Research Institute, The Zoological Park Organization of Thailand under the Royal Patronage of H.M. The King, Bangkok, 10300, Thailand
- Saowaphang Sanannu
- Animal Conservation and Research Institute, The Zoological Park Organization of Thailand under the Royal Patronage of H.M. The King, Bangkok, 10300, Thailand
- Chaianan Poksawat
- Animal Conservation and Research Institute, The Zoological Park Organization of Thailand under the Royal Patronage of H.M. The King, Bangkok, 10300, Thailand
- Tarasak Nipanunt
- Huai Kha Khaeng Wildlife Breeding Center, Department of National Parks, Wildlife and Plant Conservation, Uthai Thani, 61160, Thailand
- Chaiyan Kasorndorkbua
- Laboratory of Raptor Research and Conservation Medicine, Department of Pathology, Faculty of Veterinary Medicine, Kasetsart University, Bangkok, 10900, Thailand
- Klaus-Peter Koepfli
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA, 22630, USA
- Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, VA, 22630, USA
- Budhan S Pukazhenthi
- Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, VA, 22630, USA
- Prapat Suriyaphol
- Division of Medical Bioinformatics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Thidathip Wongsurawat
- Division of Medical Bioinformatics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Piroon Jenjaroenpun
- Division of Medical Bioinformatics, Department of Research and Development, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, 10700, Thailand
- Gunnaporn Suriyaphol
- Biochemistry Unit, Department of Physiology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, 10330, Thailand

49

Chen Z, Ain NU, Zhao Q, Zhang X. From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief Bioinform 2024; 25:bbae138. [PMID: 38581418 PMCID: PMC10998533 DOI: 10.1093/bib/bbae138] [Received: 12/01/2023] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024] Open
Abstract
Following the milestone success of the Human Genome Project, the 'Encyclopedia of DNA Elements (ENCODE)' initiative was launched in 2003 to unearth information about the numerous functional elements within the genome. This endeavor coincided with the emergence of numerous novel technologies, accompanied by the provision of vast amounts of whole-genome sequences and high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from this massive dataset has become a critical aspect of many recent studies, particularly in annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and various functional elements within the genome sequence and infer their biological functions. Traditional wet-lab experimental methods still rely on extensive efforts for functional verification. Early bioinformatics algorithms and software, however, primarily employed shallow learning techniques, so their capacity for data characterization and feature learning was limited. With the widespread adoption of RNA-Seq technology, scientists from the biological community began to harness the potential of machine learning and deep learning approaches for gene structure prediction and functional annotation. In this context, we reviewed both conventional methods and contemporary deep learning frameworks, and highlighted novel perspectives on the challenges arising during annotation, underscoring the dynamic nature of this evolving scientific landscape.
Affiliation(s)
- Zhaojia Chen
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
- College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong 030600, China
- Noor ul Ain
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
- Qian Zhao
- State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Xingtan Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
50
Ángeles-Argáiz RE, Aguirre-Beltrán LFL, Hernández-Oaxaca D, Quintero-Corrales C, Trujillo-Roldán MA, Castillo-Ramírez S, Garibay-Orijel R. Assembly collapsing versus heterozygosity oversizing: detection of homokaryotic and heterokaryotic Laccaria trichodermophora strains by hybrid genome assembly. Microb Genom 2024; 10:001218. [PMID: 38529901 PMCID: PMC10995626 DOI: 10.1099/mgen.0.001218] [Received: 10/05/2023] [Accepted: 03/01/2024] [Indexed: 03/27/2024]
Abstract
Genome assembly and annotation using short paired reads is challenging for eukaryotic organisms due to their large genome size, variable ploidy and large number of repetitive elements. The use of single-molecule long reads improves assembly quality (completeness and contiguity), but haplotype duplications still pose assembly challenges. To address the effect of read length on genome assembly quality, gene prediction and annotation, we compared genome assemblers and sequencing technologies with four strains of the ectomycorrhizal fungus Laccaria trichodermophora. By analysing the predicted repertoire of carbohydrate enzymes, we investigated the effects of assembly quality on functional inferences. Libraries were generated using three different sequencing platforms (Illumina Next-Seq, Mi-Seq and PacBio Sequel), and genomes were assembled using single and hybrid assemblies/libraries. Long reads or hybrid assembly resolved the collapsing of repeated regions, but the nuclear heterozygous versions remained unresolved. In dikaryotic fungi, each cell includes two nuclei, and the nuclei differ not only in allelic gene versions but also in gene composition and synteny. These heterokaryotic cells produce fragmentation and size overestimation of the genome assembly of each nucleus. Hybrid assembly revealed a wider functional diversity of genomes. Here, several predicted oxidizing activities on glycosyl residues of oligosaccharides and several chitooligosaccharide acetylase activities would have passed unnoticed in short-read assemblies. Also, the size and fragmentation of the genome assembly, in combination with heterozygosity analysis, allowed us to distinguish homokaryotic and heterokaryotic strains isolated from L. trichodermophora fruit bodies.
Affiliation(s)
- Rodolfo Enrique Ángeles-Argáiz
- Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Circuito de los Posgrados s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Red de Manejo Biotecnológico de Recursos, Instituto de Ecología A. C. Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz, México, C.P. 91612, Mexico
- Luis Fernando Lozano Aguirre-Beltrán
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México, C.P. 62210, Mexico
- Diana Hernández-Oaxaca
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México, C.P. 62210, Mexico
- Red de Biodiversidad y Sistemática, Instituto de Ecología A. C. Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz, México, C.P. 91073, Mexico
- Christian Quintero-Corrales
- Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Circuito de los Posgrados s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Mauricio A. Trujillo-Roldán
- Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México, Km 107 carretera Tijuana-Ensenada, Ensenada, Baja California, Mexico, C.P. 22860, Mexico
- Santiago Castillo-Ramírez
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México, C.P. 62210, Mexico
- Roberto Garibay-Orijel
- Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico