1. Lu B, Guo Z, Liu X, Ni Y, Xu L, Huang J, Li T, Feng T, Li R, Deng X. Comprehensive comparison of the third-generation sequencing tools for bacterial 6mA profiling. Nat Commun 2025; 16:3982. [PMID: 40295502 PMCID: PMC12037826 DOI: 10.1038/s41467-025-59187-2] [Received: 06/18/2024] [Accepted: 04/11/2025] [Indexed: 04/30/2025] [Open access]
Abstract
DNA N6-methyladenine (6mA) serves as an intrinsic and principal epigenetic marker in prokaryotes, impacting various biological processes. To date, few advanced sequencing technologies and analysis tools are available for bacterial DNA 6mA. Here, we evaluate eight tools designed for 6mA identification or de novo methylation detection. This assessment includes Nanopore (R9 and R10) and Single-Molecule Real-Time (SMRT) sequencing, cross-referenced with 6mA-IP-seq and DR-6mA-seq. Our multi-dimensional evaluation encompasses motif discovery, site-level accuracy, single-molecule accuracy, and outlier detection across six bacterial strains. While most tools correctly identify motifs, their performance varies at single-base resolution, with SMRT and Dorado consistently delivering strong performance. Our study indicates that existing tools cannot accurately detect low-abundance methylation sites. Additionally, we introduce an optimized method for advancing 6mA prediction, which substantially improves the detection performance of Dorado. Overall, our study provides a robust and detailed examination of computational tools for bacterial 6mA profiling, highlighting insights for further tool enhancement and epigenetic research.
Grants
- Shenzhen Science and Technology Fund, JCYJ20210324134000002, recipient: Xin Deng
- Guangdong Major Project of Basic and Applied Basic Research, 2020B0301030005, recipient: Xin Deng
- National Natural Science Foundation of China, 32172358, recipient: Xin Deng
- General Research Funds of Hong Kong, 11103221, recipient: Xin Deng
- General Research Funds of Hong Kong, 11102223, recipient: Xin Deng
- General Research Funds of Hong Kong, 11101722, recipient: Xin Deng
Affiliation(s)
- Beifang Lu
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
- Zhihao Guo
  - Department of Infectious Diseases and Public Health, City University of Hong Kong, Hong Kong SAR, China
- Xudong Liu
  - Department of Infectious Diseases and Public Health, City University of Hong Kong, Hong Kong SAR, China
- Ying Ni
  - Department of Infectious Diseases and Public Health, City University of Hong Kong, Hong Kong SAR, China
- Letong Xu
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
- Jiadai Huang
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
- Tianmin Li
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
- Tongtong Feng
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
- Runsheng Li
  - Department of Infectious Diseases and Public Health, City University of Hong Kong, Hong Kong SAR, China
  - Tung Biomedical Sciences Center, City University of Hong Kong, Hong Kong, China
- Xin Deng
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
  - Tung Biomedical Sciences Center, City University of Hong Kong, Hong Kong, China
  - Shenzhen Research Institute, City University of Hong Kong, Shenzhen, Guangdong, China
2. Hegedüs B, Sahu N, Bálint B, Haridas S, Bense V, Merényi Z, Virágh M, Wu H, Liu XB, Riley R, Lipzen A, Koriabine M, Savage E, Guo J, Barry K, Ng V, Urbán P, Gyenesei A, Freitag M, Grigoriev IV, Nagy LG. Morphogenesis, starvation, and light responses in a mushroom-forming fungus revealed by long-read sequencing and extensive expression profiling. Cell Genomics 2025:100853. [PMID: 40262612 DOI: 10.1016/j.xgen.2025.100853] [Received: 07/01/2024] [Revised: 12/19/2024] [Accepted: 03/24/2025] [Indexed: 04/24/2025]
Abstract
Mushroom-forming fungi (Agaricomycetes) are emerging as pivotal players in several fields of science and industry. Genomic data for Agaricomycetes are accumulating rapidly; however, this is not paralleled by improvements of gene annotations, which leave gene function notoriously poorly understood. We set out to improve our functional understanding of the model mushroom Coprinopsis cinerea by integrating a new, chromosome-level assembly, high-quality gene predictions, and functional information derived from broad gene-expression profiling data. The new annotation includes 5' and 3' untranslated regions (UTRs), polyadenylation sites (PASs), upstream open reading frames (uORFs), splicing isoforms, and microexons, as well as core gene sets corresponding to carbon starvation, light response, and hyphal differentiation. As a result, the genome of C. cinerea has now become the most comprehensively annotated genome among mushroom-forming fungi, which will contribute to multiple rapidly expanding fields, including research on their life history, light and stress responses, as well as multicellular development.
Affiliation(s)
- Botond Hegedüs
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Temesvári krt. 62, 6726 Szeged, Hungary
- Neha Sahu
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Temesvári krt. 62, 6726 Szeged, Hungary
- Balázs Bálint
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Temesvári krt. 62, 6726 Szeged, Hungary
- Sajeet Haridas
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Viktória Bense
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Temesvári krt. 62, 6726 Szeged, Hungary
- Zsolt Merényi
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Temesvári krt. 62, 6726 Szeged, Hungary
- Máté Virágh
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Temesvári krt. 62, 6726 Szeged, Hungary
- Hongli Wu
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Temesvári krt. 62, 6726 Szeged, Hungary
- Xiao-Bin Liu
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Temesvári krt. 62, 6726 Szeged, Hungary
- Robert Riley
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Anna Lipzen
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Maxim Koriabine
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Emily Savage
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jie Guo
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Kerrie Barry
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Vivian Ng
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Péter Urbán
  - János Szentágothai Research Center, University of Pécs, Ifjúság útja 20, 7624 Pécs, Hungary
- Attila Gyenesei
  - János Szentágothai Research Center, University of Pécs, Ifjúság útja 20, 7624 Pécs, Hungary
- Michael Freitag
  - Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR 97331, USA
- Igor V Grigoriev
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- László G Nagy
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, HUN-REN Biological Research Center, Temesvári krt. 62, 6726 Szeged, Hungary
3. Dakal TC, Xu C, Kumar A. Advanced computational tools, artificial intelligence and machine-learning approaches in gut microbiota and biomarker identification. Frontiers in Medical Technology 2025; 6:1434799. [PMID: 40303946 PMCID: PMC12037385 DOI: 10.3389/fmedt.2024.1434799] [Received: 07/23/2024] [Accepted: 10/16/2024] [Indexed: 05/02/2025] [Open access]
Abstract
The microbiome of the gut is a complex ecosystem that contains a wide variety of microbial species and functional capabilities. The microbiome has a significant impact on health and disease by affecting endocrinology, physiology, and neurology. It can change the progression of certain diseases and enhance treatment responses and tolerance. The gut microbiota plays a pivotal role in human health, influencing a wide range of physiological processes. Recent advances in computational tools and artificial intelligence (AI) have revolutionized the study of gut microbiota, enabling the identification of biomarkers that are critical for diagnosing and treating various diseases. This review surveys cutting-edge computational methodologies that integrate multi-omics data, such as metagenomics, metaproteomics, and metabolomics, providing a comprehensive understanding of the gut microbiome's composition and function. Additionally, machine learning (ML) approaches, including deep learning and network-based methods, are explored for their ability to uncover complex patterns within microbiome data, offering unprecedented insights into microbial interactions and their link to host health. By highlighting the synergy between traditional bioinformatics tools and advanced AI techniques, this review underscores the potential of these approaches in enhancing biomarker discovery and developing personalized therapeutic strategies. The convergence of computational advancements and microbiome research marks a significant step forward in precision medicine, paving the way for novel diagnostics and treatments tailored to individual microbiome profiles. Investigators can discover connections between the composition of microorganisms, the expression of genes, and the profiles of metabolites. Individual responses to medicines that target gut microbes can be predicted by AI-driven models. Personalized and precision medicine can be achieved by first understanding the impact that the gut microbiota has on the development of disease. The application of machine learning allows treatments to be customized to the specific microbial environment of an individual.
Affiliation(s)
- Tikam Chand Dakal
  - Genome and Computational Biology Lab, Department of Biotechnology, Mohanlal Sukhadia University, Udaipur, India
- Caiming Xu
  - Beckman Research Institute of City of Hope, Monrovia, CA, United States
  - Department of General Surgery, The First Affiliated Hospital of Dalian Medical University, Dalian, China
- Abhishek Kumar
  - Manipal Academy of Higher Education (MAHE), Manipal, India
  - Institute of Bioinformatics, International Technology Park, Bangalore, India
4. Monzó C, Frankish A, Conesa A. Notable challenges posed by long-read sequencing for the study of transcriptional diversity and genome annotation. Genome Res 2025; 35:583-592. [PMID: 40032585 PMCID: PMC12047247 DOI: 10.1101/gr.279865.124] [Received: 07/30/2024] [Accepted: 01/30/2025] [Indexed: 03/05/2025]
Abstract
Long-read sequencing (LRS) technologies have revolutionized transcriptomic research by enabling the comprehensive sequencing of full-length transcripts. Using these technologies, researchers have reported tens of thousands of novel transcripts, even in well-annotated genomes, while developing new algorithms and experimental approaches to handle the noisy data. The Long-read RNA-seq Genome Annotation Assessment Project community effort benchmarked LRS methods in transcriptomics and validated many novel, lowly expressed, often sample-specific transcripts identified by long reads. These molecules represent deviations from the major transcriptional program that were overlooked by short-read sequencing methods but are now captured by the full-length, single-molecule approach. This Perspective discusses the challenges and opportunities associated with LRS' capacity to unravel this fraction of the transcriptome, in terms of both transcriptome biology and genome annotation. For transcriptome biology, we need to develop novel experimental and computational methods to effectively differentiate technology errors from rare but real molecules. For genome annotation, we must agree on the strategy to capture molecular variability while still defining reference annotations that are useful for the genomics community.
Affiliation(s)
- Carolina Monzó
  - Institute for Integrative Systems Biology (I2SysBio), Spanish National Research Council (CSIC), Paterna 46980, Spain
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus Hinxton, Cambridge CB10 1SA, United Kingdom
- Ana Conesa
  - Institute for Integrative Systems Biology (I2SysBio), Spanish National Research Council (CSIC), Paterna 46980, Spain
5. Monzó C, Liu T, Conesa A. Transcriptomics in the era of long-read sequencing. Nat Rev Genet 2025:10.1038/s41576-025-00828-z. [PMID: 40155769 DOI: 10.1038/s41576-025-00828-z] [Accepted: 02/20/2025] [Indexed: 04/01/2025]
Abstract
Transcriptome sequencing revolutionized the analysis of gene expression, providing an unbiased approach to gene detection and quantification that enabled the discovery of novel isoforms, alternative splicing events and fusion transcripts. However, although short-read sequencing technologies have surpassed the limited dynamic range of previous technologies such as microarrays, they have limitations, for example, in resolving full-length transcripts and complex isoforms. Over the past 5 years, long-read sequencing technologies have matured considerably, with improvements in instrumentation and analytical methods, enabling their application to RNA sequencing (RNA-seq). Benchmarking studies are beginning to identify the strengths and limitations of long-read RNA-seq, although there remains a need for comprehensive resources to guide newcomers through the intricacies of this approach. In this Review, we provide a comprehensive overview of the long-read RNA-seq workflow, from library preparation and sequencing challenges to core data processing, downstream analyses and emerging developments. We present an extensive inventory of experimental and analytical methods and discuss current challenges and prospects.
Affiliation(s)
- Carolina Monzó
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Tianyuan Liu
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
6. Ament IH, DeBruyne N, Wang F, Lin L. Long-read RNA sequencing: A transformative technology for exploring transcriptome complexity in human diseases. Mol Ther 2025; 33:883-894. [PMID: 39563027 PMCID: PMC11897757 DOI: 10.1016/j.ymthe.2024.11.025] [Received: 08/29/2024] [Revised: 10/30/2024] [Accepted: 11/15/2024] [Indexed: 11/21/2024] [Open access]
Abstract
Long-read RNA sequencing (RNA-seq) is emerging as a powerful and versatile technology for studying human transcriptomes. By enabling the end-to-end sequencing of full-length transcripts, long-read RNA-seq opens up avenues for investigating various RNA species and features that cannot be reliably interrogated by standard short-read RNA-seq methods. In this review, we present an overview of long-read RNA-seq, delineating its strengths over short-read RNA-seq, as well as summarizing recent advances in experimental and computational approaches to boost the power of long-read-based transcriptomics. We describe a wide range of applications of long-read RNA-seq, and highlight its expanding role as a foundational technology for exploring transcriptome variations in human diseases.
Affiliation(s)
- Nicole DeBruyne
  - Graduate Group in Cell and Molecular Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
- Feng Wang
  - Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Lan Lin
  - Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
7. Islam SI, Taweethavonsawat P. Advanced genomic research in understanding fish-borne zoonotic parasitic infection. Microb Pathog 2025; 200:107367. [PMID: 39924092 DOI: 10.1016/j.micpath.2025.107367] [Received: 09/24/2024] [Revised: 01/31/2025] [Accepted: 02/07/2025] [Indexed: 02/11/2025]
Abstract
Fish-borne zoonotic parasites pose substantial risks to human health and global aquaculture, primarily through raw or undercooked fish consumption. The rapid expansion of aquaculture, increasing global fish trade, and rising human populations have amplified these concerns. Despite widespread awareness of meat-borne zoonoses, fish-borne parasitic infections remain underrecognized, especially in developed countries. Traditional morphological and molecular methods have provided critical foundations for studying these parasites, yet recent genomic advances have revolutionized our understanding of their genetic diversity, biology, and host-pathogen dynamics. This review underscores the significance of integrating genomic approaches with conventional methods to enhance disease surveillance, risk assessment, and control strategies. Harnessing genomic tools will enable the development of effective interventions to mitigate zoonotic parasite impacts, protect human health, and promote sustainable aquaculture. A comprehensive, genomics-driven approach is essential to overcoming the global challenges of fish-borne zoonotic infections.
Affiliation(s)
- Sk Injamamul Islam
  - Pathobiology Program, Department of Pathology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, 10330, Thailand
- Piyanan Taweethavonsawat
  - Biomarkers in Animal Parasitology Research Unit and Parasitology Unit, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, Thailand
8. Fan J, Ma D, Zhu H, Lin M, Zhong Z, Tian Y. Full-Length Transcriptome Sequencing and Comparative Transcriptomics Reveal the Molecular Mechanisms Underlying Gonadal Development in Sleepy Cod (Oxyeleotris lineolata). Biology 2025; 14:232. [PMID: 40136489 PMCID: PMC11940265 DOI: 10.3390/biology14030232] [Received: 01/23/2025] [Revised: 02/19/2025] [Accepted: 02/22/2025] [Indexed: 03/27/2025]
Abstract
Sleepy cod (Oxyeleotris lineolata) is native to Australia and is now an economically valuable fish cultured in China and Southern Asian countries. Its growth is sexually dimorphic, with males generally growing more rapidly and attaining a larger body size than females. Thus, the effective development of sex-control breeding can significantly contribute to increased yields and output value. Nevertheless, due to the lack of genomic and transcriptomic data, the molecular mechanisms underlying sex determination and gonadal differentiation in sleepy cod remain poorly understood. In this study, long-read PacBio isoform sequencing (Iso-Seq) was performed to obtain a full-length transcriptome from a pooled sample of eight tissues (kidney, brain, liver, muscle, heart, spleen, ovary and testis). A total of 30.41 G subread bases were generated and 49,113 non-redundant full-length transcripts with an average length of 2948 bp were produced. Using the full-length transcriptome as a reference, short-read Illumina sequencing was performed to investigate the differences in gene expression at the transcriptome level between ovaries and testes from 12-month-old individuals. A total of 19,102 differentially expressed transcripts (DETs) were identified, of which 8510 (44.55%) were up-regulated in the ovary and 10,592 (55.45%) were up-regulated in the testis. The DETs were mainly clustered into 241 KEGG pathways, in which oocyte meiosis and arachidonic acid metabolism were the most relevant pathways involved in gonadal differentiation. To verify the validity of the transcriptomic data, 20 DETs were selected to investigate the gonad expression profiles based on qPCR. The expression levels of all 20 screened genes were consistent with the transcriptome sequencing results. The present study provides new genetic resources, including full-length transcriptome sequences and annotation information, as a coding genomic-level reference for sleepy cod, yielding valuable insights into the genetic mechanisms of sex determination and gonadal differentiation in this economically important species.
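As a quick consistency check on the DET figures quoted in this abstract, the following minimal Python sketch recomputes the share of transcripts up-regulated in each gonad. Only the three counts come from the abstract; the constant and function names are illustrative assumptions and are not taken from the paper.

```python
# Counts quoted in the abstract; every name below is hypothetical.
TOTAL_DETS = 19_102   # differentially expressed transcripts
OVARY_UP = 8_510      # up-regulated in the ovary
TESTIS_UP = 10_592    # up-regulated in the testis


def percent_of_total(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`, rounded to two decimals."""
    return round(100 * part / total, 2)


ovary_pct = percent_of_total(OVARY_UP, TOTAL_DETS)
testis_pct = percent_of_total(TESTIS_UP, TOTAL_DETS)
# The two groups should partition the full DET set.
counts_add_up = OVARY_UP + TESTIS_UP == TOTAL_DETS

print(ovary_pct, testis_pct, counts_add_up)
```

The recomputed percentages can be compared directly with the 44.55% and 55.45% reported above.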
Affiliation(s)
- Jiajia Fan
  - Key Laboratory of Tropical & Subtropical Fishery Resource Application & Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
  - Key Laboratory of Aquatic Animal Immunology and Sustainable Aquaculture, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
- Dongmei Ma
  - Key Laboratory of Tropical & Subtropical Fishery Resource Application & Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
  - Key Laboratory of Aquatic Animal Immunology and Sustainable Aquaculture, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
- Huaping Zhu
  - Key Laboratory of Tropical & Subtropical Fishery Resource Application & Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
  - Key Laboratory of Aquatic Animal Immunology and Sustainable Aquaculture, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
- Minghui Lin
  - Key Laboratory of Tropical & Subtropical Fishery Resource Application & Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
  - Key Laboratory of Aquatic Animal Immunology and Sustainable Aquaculture, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
- Zaixuan Zhong
  - Key Laboratory of Tropical & Subtropical Fishery Resource Application & Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
  - Key Laboratory of Aquatic Animal Immunology and Sustainable Aquaculture, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
- Yuanyuan Tian
  - Key Laboratory of Tropical & Subtropical Fishery Resource Application & Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
  - Key Laboratory of Aquatic Animal Immunology and Sustainable Aquaculture, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou 510380, China
9. Akzhunis I, Dinara Z, Nurzhaugan D, Aidyn O, Nazerke T, Gulmira T. Unravelling the Chloroplast Genome of the Kazakh Apricot (Prunus armeniaca L.) Through MinION Long-Read Sequencing. Plants (Basel, Switzerland) 2025; 14:638. [PMID: 40094543 PMCID: PMC11902206 DOI: 10.3390/plants14050638] [Received: 12/31/2024] [Revised: 02/07/2025] [Accepted: 02/18/2025] [Indexed: 03/19/2025]
Abstract
The study of the genetic diversity and adaptation mechanisms of the Kazakh apricot (Prunus armeniaca L.) is essential for breeding programs and the conservation of plant genetic resources in arid environments. Despite this species' ecological and agricultural significance, its chloroplast genome remains poorly studied due to its complex repetitive structure and secondary metabolites that hinder high-molecular-weight DNA (HMW-DNA) extraction and long-read sequencing. To address this gap, our study aims to develop and optimise sequencing protocols for P. armeniaca under arid conditions using Oxford Nanopore's MinION technology. We successfully extracted HMW-DNA with 50-100 ng/μL concentrations and purity (A260/A280) between 1.8 and 2.0, ensuring high sequencing quality. A total of 10 GB of sequencing data was generated, comprising 155,046 reads, of which 74,733 (48.2%) had a Q-score ≥ 8. The average read length was 1679 bp, with a maximum of 31,144 bp. Chloroplast genome assembly resulted in 33,000 contigs with a total length of 1.1 Gb and a BUSCO completeness score of 97.3%. Functional annotation revealed key genes (nalC, AcrE, and mecC-type BlaZ) associated with stress tolerance and a substantial proportion (≈40%) of hypothetical proteins requiring further investigation. GC content analysis (40.25%) and GC skew data suggest the presence of specific regulatory elements linked to environmental adaptation. This study demonstrates the feasibility of using third-generation sequencing technologies to analyse complex plant genomes and highlights the genetic resilience of P. armeniaca to extreme conditions. The findings provide a foundation for breeding programs to improve drought tolerance and conservation strategies to protect Kazakhstan's unique arid ecosystems.
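To put the read-quality figures in this abstract into perspective, here is a minimal Python sketch. It assumes the reported Q-score is Phred-scaled, so that Q = -10 log10(p) for a per-base error probability p; the abstract does not state this explicitly. The function and constant names are hypothetical, and only the two read counts and the Q8 threshold come from the abstract.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled quality score into an error probability."""
    return 10 ** (-q / 10)


# Read counts quoted in the abstract; names are illustrative.
TOTAL_READS = 155_046
READS_AT_OR_ABOVE_Q8 = 74_733

# Error probability implied by the Q >= 8 threshold (assumed Phred-scaled).
q8_error = phred_to_error_probability(8)

# Share of reads meeting the threshold, rounded to one decimal place.
passing_pct = round(100 * READS_AT_OR_ABOVE_Q8 / TOTAL_READS, 1)

print(f"Q8 error probability: {q8_error:.4f}")
print(f"Reads with Q >= 8: {passing_pct}%")
```

Under that assumption, a Q8 threshold corresponds to an expected error rate of roughly 16%, which is a lenient cutoff.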
Affiliation(s)
- Orazov Aidyn
  - Laboratory of Natural Flora and Dendrology, Mangyshlak Experimental Botanical Garden, Aktau 130000, Kazakhstan
10. Marceca GP, Romano G, Acunzo M, Nigita G. ncRNA Editing: Functional Characterization and Computational Resources. Methods Mol Biol 2025; 2883:455-495. [PMID: 39702721 DOI: 10.1007/978-1-0716-4290-0_20] [Indexed: 12/21/2024]
Abstract
Non-coding RNAs (ncRNAs) play crucial roles in gene expression regulation, translation, and disease development, including cancer. They are classified by size into short and long non-coding RNAs. This chapter focuses on the functional implications of adenosine-to-inosine (A-to-I) RNA editing in both short (e.g., miRNAs) and long ncRNAs. RNA editing dynamically alters the sequence and structure of primary transcripts, impacting ncRNA biogenesis and function. Notable findings include the role of miRNA editing in promoting glioblastoma invasiveness, characterizing RNA editing hotspots across cancers, and its implications in thyroid cancer and ischemia. This chapter also highlights bioinformatics resources and next-generation sequencing (NGS) technologies that enable comprehensive ncRNAome studies and genome-wide RNA editing detection. Dysregulation of RNA editing machinery has been linked to various human diseases, emphasizing the potential of RNA editing as a biomarker and therapeutic target. This overview integrates current knowledge and computational tools for studying ncRNA editing, providing insights into its biological significance and clinical applications.
Affiliation(s)
- Giulia Romano
  - Division of Pulmonary Diseases and Critical Care Medicine, Virginia Commonwealth University, Richmond, VA, USA
- Mario Acunzo
  - Division of Pulmonary Diseases and Critical Care Medicine, Virginia Commonwealth University, Richmond, VA, USA
- Giovanni Nigita
  - Department of Cancer Biology and Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, OH, USA
  - Center for RNA Biology, The Ohio State University, Columbus, OH, USA
11. Zhang Z, Wei M, Jia B, Yuan Y. Recent Advances in Antimicrobial Resistance: Insights from Escherichia coli as a Model Organism. Microorganisms 2024; 13:51. [PMID: 39858819 PMCID: PMC11767524 DOI: 10.3390/microorganisms13010051] [Received: 12/14/2024] [Revised: 12/26/2024] [Accepted: 12/28/2024] [Indexed: 01/27/2025] [Open access]
Abstract
Antimicrobial resistance (AMR) represents a critical global health threat, and a thorough understanding of resistance mechanisms in Escherichia coli is needed to guide effective treatment interventions. This review explores recent advances in investigating AMR in E. coli, including machine learning for resistance pattern analysis, laboratory evolution to generate resistant mutants, mutant library construction, and genome sequencing for in-depth characterization. Key resistance mechanisms are discussed, including drug inactivation, target modification, altered transport, and metabolic adaptation. Additionally, we highlight strategies to mitigate the spread of AMR, such as dynamic resistance monitoring, innovative therapies like phage therapy and CRISPR-Cas technology, and tighter regulation of antibiotic use in animal production systems. This review provides actionable insights into E. coli resistance mechanisms and identifies promising directions for future antibiotic development and AMR management.
Affiliation(s)
| | | | - Bin Jia
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; (Z.Z.); (M.W.); (Y.Y.)
|
12
|
Dyshlovoy SA, Paigin S, Afflerbach AK, Lobermeyer A, Werner S, Schüller U, Bokemeyer C, Schuh AH, Bergmann L, von Amsberg G, Joosse SA. Applications of Nanopore sequencing in precision cancer medicine. Int J Cancer 2024; 155:2129-2140. [PMID: 39031959 DOI: 10.1002/ijc.35100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 04/25/2024] [Accepted: 06/25/2024] [Indexed: 07/22/2024]
Abstract
Oxford Nanopore Technologies sequencing, also referred to as Nanopore sequencing, stands at the forefront of a revolution in clinical genetics, offering the potential for rapid, long read, and real-time DNA and RNA sequencing. This technology is currently making sequencing more accessible and affordable. In this comprehensive review, we explore its potential regarding precision cancer diagnostics and treatment. We encompass a critical analysis of clinical cases where Nanopore sequencing was successfully applied to identify point mutations, splice variants, gene fusions, epigenetic modifications, non-coding RNAs, and other pivotal biomarkers that defined subsequent treatment strategies. Additionally, we address the challenges of clinical applications of Nanopore sequencing and discuss the current efforts to overcome them.
Affiliation(s)
- Sergey A Dyshlovoy
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Oxford, UK
- Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Stefanie Paigin
- Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Institute of Pathology and Neuropathology, University Hospital Tübingen, Tübingen, Germany
| | - Ann-Kristin Afflerbach
- Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Annabelle Lobermeyer
- Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Stefan Werner
- Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Ulrich Schüller
- Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Institute for Neuropathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Department of Paediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Carsten Bokemeyer
- Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Anna H Schuh
- Department of Oncology, Oxford Molecular Diagnostics Centre, University of Oxford, Level 4, John Radcliffe Hospital, Oxford, UK
| | - Lina Bergmann
- Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Gunhild von Amsberg
- Department of Oncology, Hematology and Bone Marrow Transplantation with Section Pneumology, University Cancer Center Hamburg (UCCH), University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Martini-Klinik, Prostate Cancer Center, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Simon A Joosse
- Department of Tumor Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Mildred Scheel Cancer Career Center HaTriCS4, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
|
13
|
Mishin AA, Groth T, Green RE, Troll CJ. Inert splint-driven oligonucleotide assembly. Synth Biol (Oxf) 2024; 9:ysae019. [PMID: 39734808 PMCID: PMC11671690 DOI: 10.1093/synbio/ysae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Revised: 11/06/2024] [Accepted: 12/11/2024] [Indexed: 12/31/2024] Open
Abstract
In this study, we introduce a new in vitro method for oligonucleotide fragment assembly. Unlike polymerase chain assembly and ligase chain assembly that rely on short, highly purified oligonucleotides, our method, named Splynthesis, uses a one-tube, splint-driven assembly reaction. Splynthesis connects standard-desalted "contig" oligos (∼150 nt in length) via shorter "splint" oligos harboring 5' and 3' blocking modifications to prevent off-target ligation and amplification events. We demonstrate the Splynthesis method to assemble a 741-bp gene fragment. We verify the assembled polymerase chain reaction product using standard molecular biology techniques, as well as long-read Oxford Nanopore sequencing, and confirm that the product is cloneable via molecular means, as well as Sanger sequencing. This approach is applicable for synthetic biology, directed evolution, functional protein assays, and potentially even splint-based ligase chain reaction assays.
Affiliation(s)
- Andrew A Mishin
- Claret Bioscience LLC, 100 Enterprise Way, Suite A102, Scotts Valley, CA 95066, United States
| | - Tobin Groth
- Claret Bioscience LLC, 100 Enterprise Way, Suite A102, Scotts Valley, CA 95066, United States
| | - Richard E Green
- Claret Bioscience LLC, 100 Enterprise Way, Suite A102, Scotts Valley, CA 95066, United States
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, United States
| | - Christopher J Troll
- Claret Bioscience LLC, 100 Enterprise Way, Suite A102, Scotts Valley, CA 95066, United States
|
14
|
Maimaiti M, Kong L, Yu Q, Wang Z, Liu Y, Yang C, Guo W, Jin L, Yi J. Analytical Performance of a Novel Nanopore Sequencing for SARS-CoV-2 Genomic Surveillance. J Med Virol 2024; 96:e70108. [PMID: 39639823 PMCID: PMC11621993 DOI: 10.1002/jmv.70108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Revised: 10/14/2024] [Accepted: 10/31/2024] [Indexed: 12/07/2024]
Abstract
The genomic analysis of SARS-CoV-2 has served as a crucial tool for generating invaluable data that fulfils both epidemiological and clinical necessities. Long-read sequencing technology (e.g., ONT) has been widely used, providing a faster, real-time response when needed. A novel nanopore-based long-read sequencing platform named QNome nanopore has been successfully used for bacterial genome sequencing and assembly; however, its performance in SARS-CoV-2 genomic surveillance has not yet been established. Synthetic SARS-CoV-2 controls and 120 nasopharyngeal swab (NPS) samples that tested positive by real-time polymerase chain reaction were sequenced on both QNome and MGI platforms in parallel. The analytical performance of QNome was compared to the short-read sequencing on MGI. For the synthetic SARS-CoV-2 controls, despite the increased error rates observed in QNome nanopore sequencing reads, accurate consensus-level sequence determination was achieved with an average mapping quality score of approximately 60 (i.e., a mapping accuracy of 99.9999%). For the NPS samples, the average genomic coverage was 89.35% on the QNome nanopore platform compared with 90.39% for MGI. In addition, fewer consensus genomes from QNome were determined to be good by Nextclade compared with MGI (p < 0.05). A total of 9004 mutations were identified using QNome sequencing, with substitutions being the most prevalent; in contrast, 10 997 mutations were detected on MGI (p < 0.05). Furthermore, 23 large deletions (i.e., deletions ≥ 10 bp) were identified by QNome, of which 19 were supported by evidence from short-read sequencing. Phylogenetic analysis revealed that the Pango lineage assignments of consensus genomes sequenced by QNome were 83.04% concordant with those from MGI. QNome nanopore sequencing, though challenged by read quality and accuracy compared to MGI, is overcoming these issues through bioinformatics and computational advances. Its structural variation (SV) detection capabilities and real-time data analysis render it a promising alternative nanopore platform for the surveillance of SARS-CoV-2.
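The abstract's equivalence between a mapping quality of about 60 and a mapping accuracy of 99.9999% follows from the Phred scale used for MAPQ in the SAM format, where MAPQ = -10 log10(P(mapping is wrong)). The following is a minimal sketch of that conversion; the function names are hypothetical and are not taken from the cited study.

```python
import math


def mapq_to_accuracy(mapq: float) -> float:
    """Probability that a read is mapped correctly, given a Phred-scaled MAPQ."""
    error_probability = 10 ** (-mapq / 10)
    return 1.0 - error_probability


def accuracy_to_mapq(accuracy: float) -> float:
    """Inverse conversion: Phred-scaled MAPQ for a given mapping accuracy."""
    return -10 * math.log10(1.0 - accuracy)


# MAPQ 60 corresponds to an error probability of 1e-6.
print(f"{mapq_to_accuracy(60):.4%}")  # prints 99.9999%
```

The same relation gives 99% for MAPQ 20 and 99.9% for MAPQ 30, which is why each additional 10 points adds one "nine" of accuracy.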
Affiliation(s)
- Mulatijiang Maimaiti
- Department of Clinical Laboratory, Peking Union Medical College Hospital, Beijing, China
| | - Lingjun Kong
- Department of Clinical Laboratory, Peking Union Medical College Hospital, Beijing, China
| | - Qi Yu
- Department of Clinical Laboratory, Peking Union Medical College Hospital, Beijing, China
| | - Ziyi Wang
- Department of Clinical Laboratory, Peking Union Medical College Hospital, Beijing, China
| | - Yiwei Liu
- Department of Clinical Laboratory, Peking Union Medical College Hospital, Beijing, China
| | - Chenglin Yang
- Department of Clinical Laboratory, Peking Union Medical College Hospital, Beijing, China
| | - Wenhu Guo
- R&D Center, Fuzhou Agenmic Biotechnology Co. Ltd., Fuzhou, China
- School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, China
| | - Lijun Jin
- Department of Bioinformatics, Fuzhou Ji'Ang Medical Laboratory, Fuzhou, China
| | - Jie Yi
- Department of Clinical Laboratory, Peking Union Medical College Hospital, Beijing, China
|
15
|
Belchikov N, Hsu J, Li XJ, Jarroux J, Hu W, Joglekar A, Tilgner HU. Understanding isoform expression by pairing long-read sequencing with single-cell and spatial transcriptomics. Genome Res 2024; 34:1735-1746. [PMID: 39567235 DOI: 10.1101/gr.279640.124] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/22/2024]
Abstract
RNA isoform diversity, produced via alternative splicing, and alternative usage of transcription start and poly(A) sites, results in varied transcripts being derived from the same gene. Distinct isoforms can play important biological roles, including by changing the sequences or expression levels of protein products. The first single-cell approaches to RNA sequencing (and, later, spatial approaches), which are now widely used for the identification of differentially expressed genes, rely on short reads and offer the ability to transcriptomically compare different cell types but are limited in their ability to measure differential isoform expression. More recently, long-read sequencing methods have been combined with single-cell and spatial technologies in order to characterize isoform expression. In this review, we provide an overview of the emergence of single-cell and spatial long-read sequencing and discuss the challenges associated with the implementation of these technologies and interpretation of these data. We discuss the opportunities they offer for understanding the relationships between the distinct variable elements of transcript molecules and highlight some of the ways in which they have been used to characterize isoforms' roles in development and pathology. Single-nucleus long-read sequencing, a special case of the single-cell approach, is also discussed. We attempt to cover both the limitations of these technologies and their significant potential for expanding our still-limited understanding of the biological roles of RNA isoforms.
Affiliation(s)
- Natan Belchikov
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, New York 10021, USA
- Physiology, Biophysics, and Systems Biology Program, Weill Cornell Medicine, New York, New York 10065, USA
| | - Justine Hsu
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, New York 10021, USA
| | - Xiang Jennie Li
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, New York 10021, USA
- Computational Biology Master's Program, Weill Cornell Medicine, New York, New York 10065, USA
| | - Julien Jarroux
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, New York 10021, USA
| | - Wen Hu
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, New York 10021, USA
| | - Anoushka Joglekar
- New York Genome Center, New York, New York 10013, USA
- Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA
| | - Hagen U Tilgner
- Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York 10065, USA
- Center for Neurogenetics, Weill Cornell Medicine, New York, New York 10021, USA
|
16
|
Shoaran M, Sabaie H, Mostafavi M, Rezazadeh M. A comprehensive review of the applications of RNA sequencing in celiac disease research. Gene 2024; 927:148681. [PMID: 38871036 DOI: 10.1016/j.gene.2024.148681] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/02/2024] [Revised: 06/06/2024] [Accepted: 06/10/2024] [Indexed: 06/15/2024]
Abstract
RNA sequencing (RNA-seq) has undergone substantial advancements in recent decades and has emerged as a vital technique for profiling the transcriptome. The transition from bulk sequencing to single-cell and spatial approaches has enabled higher precision at cellular resolution. It provides valuable biological knowledge about individual immune cells and aids in the discovery of the molecular mechanisms that contribute to the development of autoimmune diseases. Celiac disease (CeD) is an autoimmune disorder characterized by a strong immune response to gluten consumption. RNA-seq has significantly advanced research in multiple fields, particularly in CeD research. It has been instrumental in studies involving comparative transcriptomics, nutritional genomics and wheat research, cancer research in the context of CeD, genetic and noncoding RNA-mediated epigenetic insights, disease monitoring and biomarker discovery, regulation of mitochondrial functions, therapeutic target identification and drug mechanism of action, dietary factors, immune cell profiling and the immune landscape. This review offers a comprehensive examination of recent RNA-seq technology research in the field of CeD, highlighting future challenges and opportunities for its application.
Affiliation(s)
- Maryam Shoaran
- Pediatric Health Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Hani Sabaie
- Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Mehrnaz Mostafavi
- Faculty of Allied Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Maryam Rezazadeh
- Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
|
17
|
Smith GJ, van Alen TA, van Kessel MA, Lücker S. Simple, reference-independent assessment to empirically guide correction and polishing of hybrid microbial community metagenomic assembly. PeerJ 2024; 12:e18132. [PMID: 39529629 PMCID: PMC11552494 DOI: 10.7717/peerj.18132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 08/29/2024] [Indexed: 11/16/2024] Open
Abstract
Hybrid metagenomic assembly of microbial communities, leveraging both long- and short-read sequencing technologies, is becoming an increasingly accessible approach, yet its widespread application faces several challenges. High-quality references may not be available for assembly accuracy comparisons common for benchmarking, and certain aspects of hybrid assembly may benefit from dataset-dependent, empiric guidance rather than the application of a uniform approach. In this study, several simple, reference-free characteristics, particularly coding gene content and read recruitment profiles, were hypothesized to be reliable indicators of assembly quality improvement during iterative error-fixing processes. These characteristics were compared to reference-dependent genome- and gene-centric analyses common for microbial community metagenomic studies. Two laboratory-scale bioreactors were sequenced with short- and long-read platforms, and assembled with commonly used software packages. Following long-read assembly, long-read correction and short-read polishing were iterated up to ten times to resolve errors. These iterative processes were shown to have a substantial effect on gene- and genome-centric community compositions. Simple, reference-free assembly characteristics, specifically changes in gene fragmentation and short-read recruitment, were robustly correlated with advanced analyses common in published comparative studies, and therefore are suitable proxies for hybrid metagenome assembly quality to simplify the identification of the optimal number of correction and polishing iterations. As hybrid metagenomic sequencing approaches will likely remain relevant due to the low added cost of short-read sequencing for differential coverage binning or the ability to access lower abundance community members, it is imperative that users are equipped to estimate assembly quality prior to downstream analyses.
Affiliation(s)
- Garrett J. Smith
- Department of Microbiology, The Ohio State University, Columbus, OH, United States of America
- Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, Nijmegen, Netherlands
| | - Theo A. van Alen
- Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, Nijmegen, Netherlands
| | - Maartje A.H.J. van Kessel
- Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, Nijmegen, Netherlands
| | - Sebastian Lücker
- Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, Nijmegen, Netherlands
|
18
|
Li W, Huang Y, Yuan H, Han J, Li Z, Tong A, Li Y, Li H, Liu Y, Jia L, Wang X, Li J, Zhang B, Li L. Characterizing transcripts of HIV-1 different substrains using direct RNA sequencing. Heliyon 2024; 10:e39474. [PMID: 39512311 PMCID: PMC11541491 DOI: 10.1016/j.heliyon.2024.e39474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Revised: 10/07/2024] [Accepted: 10/15/2024] [Indexed: 11/15/2024] Open
Abstract
Post-transcriptional processing and modification of viral RNA, including alternative splicing, polyadenylation, and methylation, play crucial roles in regulating viral gene expression, enhancing genomic stability, and increasing replication efficiency. These processes have significant implications for viral biology and antiviral therapies. In this study, using Oxford Nanopore Technology (ONT) direct RNA sequencing (DRS), we provided a comprehensive analysis of the transcriptome and epitranscriptome features of the HIV-1 B (NL4-3) subtype strain and, for the first time, characterized these features in the CRF01_AE (GX2005002) subtype strain. We identified 11 novel splicing sites among the 61 RNA isoforms in NL4-3 and defined the splicing sites for GX2005002 based on its 63 RNA isoforms. Furthermore, we identified 74 and 79 chemically modified sites in the transcripts of NL4-3 and GX2005002, respectively. Although differences in poly(A) tail length were observed between the two HIV-1 strains, no specific correlation was detected between poly(A) tail length and the number of modification sites. Additionally, three distinct N6-methyladenosine (m6A) modification sites were identified in both NL4-3 and GX2005002 transcripts. This study provides a detailed analysis of post-transcriptional processing modifications in HIV-1 and suggests promising avenues for future research that could potentially be applied as new therapeutic targets in HIV treatment.
Affiliation(s)
- Weizhen Li
- School of Public Health and Health Management, Gannan Medical University, Ganzhou, Jiangxi, 341000, China
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Yong Huang
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Haowen Yuan
- Department of Microbiological Laboratory Technology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, 250012, China
| | - Jingwan Han
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Zhengyang Li
- School of Public Health and Health Management, Gannan Medical University, Ganzhou, Jiangxi, 341000, China
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Aiping Tong
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, 610041, China
| | - Yating Li
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Hanping Li
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Yongjian Liu
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Lei Jia
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Xiaolin Wang
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Jingyun Li
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Bohan Zhang
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
| | - Lin Li
- School of Public Health and Health Management, Gannan Medical University, Ganzhou, Jiangxi, 341000, China
- State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, Beijing, 100071, China
|
19
|
Woo S, Hossain MI, Jung S, Yeo D, Yoon D, Hwang S, Do HJ, Eyun SI, Choi C. Whole genome sequencing and genome characterization of Aichivirus isolated from Korean adults. J Med Virol 2024; 96:e29902. [PMID: 39228345 DOI: 10.1002/jmv.29902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 08/14/2024] [Accepted: 08/24/2024] [Indexed: 09/05/2024]
Abstract
The whole-genome sequence (WGS) analysis of Aichivirus (AiV) identified in Korea was performed in this study. Using Sanger and Nanopore sequencing, the 8228-nucleotide-long genomic sequence of AiV (OQ121963) was determined and confirmed to belong to genotype A. The full-length genome of OQ121963 consisted of a 7296 nt open reading frame (ORF) that encodes a single polyprotein, flanked by a 5' UTR (676 nt) and a 3' UTR (256 nt) at the 5' and 3' ends, respectively. The ORF consisted of leader protein (L), structural protein P1 (VP0, VP1, and VP3), and nonstructural proteins P2 (2A, 2B, and 2C) and P3 (3A, 3B, 3C, and 3D). The secondary structure analysis of the 5' UTR identified only stem-loop C (SL-C) and not SL-A and SL-B. The variable region of the AiV genome was analyzed by MegAlign Pro and reconfirmed by SimPlot analysis using 16 AiV whole genomes known to date. Across all regions, structural protein region P1 showed the lowest amino acid identity (96.07%) with reference sequence AB040749 (originated in Japan; genotype A), while the highest amino acid identity (98.26%) was confirmed in the 3D region among nonstructural protein regions P2 and P3. Moreover, phylogenetic analysis of the WGS of OQ121963 showed the highest homology (96.96%) with JX564249 (originated in Taiwan; genotype A) and the lowest homology (90.14%) with DQ028632 (originated in Brazil; genotype B). Therefore, the complete genome characterization of OQ121963 and the phylogenetic analysis of AiV conducted in this study provide useful information for improving diagnostic tools and epidemiological studies of AiVs.
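The segment lengths reported in this abstract can be checked against the stated genome length with simple arithmetic. The sketch below uses only the numbers given in the abstract; the variable names are illustrative.

```python
# Segment lengths in nucleotides, as reported in the abstract for OQ121963.
segments = {"5' UTR": 676, "ORF": 7296, "3' UTR": 256}

# The segments should add up to the reported 8228 nt genome.
genome_length = sum(segments.values())

# A protein-coding ORF should be a whole number of 3-nt codons.
orf_codons, orf_remainder = divmod(segments["ORF"], 3)

print(genome_length)              # prints 8228
print(orf_codons, orf_remainder)  # prints 2432 0
```

The lengths are internally consistent: 676 + 7296 + 256 = 8228 nt, and the ORF divides evenly into 2432 codons.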
Affiliation(s)
- Seoyoung Woo
- Department of Food and Nutrition, College of Biotechnology and Natural Resources, Chung-Ang University, Anseong, Republic of Korea
| | - Md Iqbal Hossain
- Department of Food and Nutrition, College of Biotechnology and Natural Resources, Chung-Ang University, Anseong, Republic of Korea
| | - Soontag Jung
- Department of Food and Nutrition, College of Biotechnology and Natural Resources, Chung-Ang University, Anseong, Republic of Korea
| | - Daseul Yeo
- Department of Food and Nutrition, College of Biotechnology and Natural Resources, Chung-Ang University, Anseong, Republic of Korea
| | - Danbi Yoon
- Department of Food and Nutrition, College of Biotechnology and Natural Resources, Chung-Ang University, Anseong, Republic of Korea
| | - Seongwon Hwang
- Department of Food and Nutrition, College of Biotechnology and Natural Resources, Chung-Ang University, Anseong, Republic of Korea
| | - Hee-Jung Do
- Department of Life Science, College of Natural Sciences, Chung-Ang University, Seoul, Republic of Korea
| | - Seong-Il Eyun
- Department of Life Science, College of Natural Sciences, Chung-Ang University, Seoul, Republic of Korea
| | - Changsun Choi
- Department of Food and Nutrition, College of Biotechnology and Natural Resources, Chung-Ang University, Anseong, Republic of Korea
|
20
|
Kitsou K, Katzourakis A, Magiorkinis G. Limitations of current high-throughput sequencing technologies lead to biased expression estimates of endogenous retroviral elements. NAR Genom Bioinform 2024; 6:lqae081. [PMID: 38984066 PMCID: PMC11231582 DOI: 10.1093/nargab/lqae081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 04/09/2024] [Accepted: 06/27/2024] [Indexed: 07/11/2024] Open
Abstract
Human endogenous retroviruses (HERVs), the remnants of ancient germline retroviral integrations, comprise almost 8% of the human genome. The elucidation of their biological roles is hampered by our inability to link HERV mRNA and protein production with specific HERV loci. To solve the riddle of the integration-specific RNA expression of HERVs, several bioinformatics approaches have been proposed; however, no single process seems to yield optimal results due to the repetitiveness of HERV integrations. The performance of existing data-bioinformatics pipelines has been evaluated against real-world datasets whose true expression profile is unknown; thus, the accuracy of widely used approaches remains unclear. Here, we simulated mRNA production from specific HERV integrations to evaluate second- and third-generation sequencing technologies along with widely used bioinformatic approaches to estimate the accuracy in describing integration-specific expression. We demonstrate that, while a HERV-family approach offers accurate results, per-integration analyses of HERV expression suffer from substantial expression bias, which is only partially mitigated by algorithms developed for calculating the per-integration HERV expression, and is more pronounced in recent integrations. Hence, this bias could erroneously lead to inferences that appear biologically meaningful. Finally, we demonstrate the merits of accurate long-read high-throughput sequencing technologies in the resolution of per-locus HERV expression.
Affiliation(s)
- Konstantina Kitsou
- Department of Hygiene, Epidemiology and Medical Statistics, National and Kapodistrian University of Athens, Athens 11527, Greece
| | | | - Gkikas Magiorkinis
- Department of Hygiene, Epidemiology and Medical Statistics, National and Kapodistrian University of Athens, Athens 11527, Greece
|
21
|
Saville L, Wu L, Habtewold J, Cheng Y, Gollen B, Mitchell L, Stuart-Edwards M, Haight T, Mohajerani M, Zovoilis A. NERD-seq: a novel approach of Nanopore direct RNA sequencing that expands representation of non-coding RNAs. Genome Biol 2024; 25:233. [PMID: 39198865 PMCID: PMC11351768 DOI: 10.1186/s13059-024-03375-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 08/20/2024] [Indexed: 09/01/2024] Open
Abstract
Non-coding RNAs (ncRNAs) are frequently documented RNA modification substrates. Nanopore Technologies enables the direct sequencing of RNAs and the detection of modified nucleobases. Ordinarily, direct RNA sequencing uses polyadenylation selection, studying primarily mRNA gene expression. Here, we present NERD-seq, which enables detection of multiple non-coding RNAs, excluded by the standard approach, alongside natively polyadenylated transcripts. Using neural tissues as a proof of principle, we show that NERD-seq expands representation of frequently modified non-coding RNAs, such as snoRNAs, snRNAs, scRNAs, srpRNAs, tRNAs, and rRFs. NERD-seq represents an RNA-seq approach to simultaneously study mRNA and ncRNA epitranscriptomes in brain tissues and beyond.
Affiliation(s)
- Luke Saville
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
| | - Li Wu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
| | - Jemaneh Habtewold
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
| | - Yubo Cheng
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
| | - Babita Gollen
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
| | - Liam Mitchell
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
| | - Matthew Stuart-Edwards
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
| | - Travis Haight
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
| | - Majid Mohajerani
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada
| | - Athanasios Zovoilis
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E3N4, Canada.
- Paul Albrechtsen Research Institute, CCMB, Winnipeg, MB, R3E3N4, Canada.
- Southern Alberta Genome Sciences Centre, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada.
- Canadian Centre for Behavioral Neuroscience, University of Lethbridge, Lethbridge, AB, T1K3M4, Canada.
| |
|
22
|
Abebe JS, Alwie Y, Fuhrmann E, Leins J, Mai J, Verstraten R, Schreiner S, Wilson AC, Depledge DP. Nanopore guided annotation of transcriptome architectures. mSystems 2024; 9:e0050524. [PMID: 38953320 PMCID: PMC11265410 DOI: 10.1128/msystems.00505-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 06/11/2024] [Indexed: 07/04/2024] Open
Abstract
Nanopore direct RNA sequencing (DRS) enables the capture and full-length sequencing of native RNAs, without recoding or amplification bias. Resulting data sets may be interrogated to define the identity and location of chemically modified ribonucleotides, as well as the length of poly(A) tails, on individual RNA molecules. The success of these analyses is highly dependent on the provision of high-resolution transcriptome annotations in combination with workflows that minimize misalignments and other analysis artifacts. Existing software solutions for generating high-resolution transcriptome annotations are poorly suited to small gene-dense genomes of viruses due to the challenge of identifying distinct transcript isoforms where alternative splicing and overlapping RNAs are prevalent. To resolve this, we identified key characteristics of DRS data sets that inform resulting read alignments and developed the nanopore guided annotation of transcriptome architectures (NAGATA) software package (https://github.com/DepledgeLab/NAGATA). We demonstrate, using a combination of synthetic and original DRS data sets derived from adenoviruses, herpesviruses, coronaviruses, and human cells, that NAGATA outperforms existing transcriptome annotation software and yields a consistently high level of precision and recall when reconstructing both gene sparse and gene-dense transcriptomes. Finally, we apply NAGATA to generate the first high-resolution transcriptome annotation of the neglected pathogen human adenovirus type F41 (HAdV-41) for which we identify 77 distinct transcripts encoding at least 23 different proteins. IMPORTANCE The transcriptome of an organism denotes the full repertoire of encoded RNAs that may be expressed. This is critical to understanding the biology of an organism and for accurate transcriptomic and epitranscriptomic-based analyses. 
Annotating transcriptomes remains a complex task, particularly in small gene-dense organisms such as viruses, which maximize their coding capacity through overlapping RNAs. To resolve this, we have developed a new software package, nanopore guided annotation of transcriptome architectures (NAGATA), which uses nanopore direct RNA sequencing (DRS) datasets to rapidly produce high-resolution transcriptome annotations for diverse viruses and other organisms.
Affiliation(s)
- Jonathan S. Abebe
- Department of Microbiology, New York University School of Medicine, New York, New York, USA
| | - Yasmine Alwie
- Institute of Virology, Hannover Medical School, Hannover, Germany
| | - Erik Fuhrmann
- Institute of Virology, Hannover Medical School, Hannover, Germany
| | - Jonas Leins
- Institute of Virology, Hannover Medical School, Hannover, Germany
| | - Julia Mai
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute of Virology, University Medical Center, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Ruth Verstraten
- Institute of Virology, Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Hannover, Germany
| | - Sabrina Schreiner
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute of Virology, University Medical Center, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
| | - Angus C. Wilson
- Department of Microbiology, New York University School of Medicine, New York, New York, USA
| | - Daniel P. Depledge
- Department of Microbiology, New York University School of Medicine, New York, New York, USA
- Institute of Virology, Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Hannover, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
| |
|
23
|
Huang A, Feng S, Ye Z, Zhang T, Chen S, Chen C, Chen S. Genome Assembly and Structural Variation Analysis of Luffa acutangula Provide Insights on Flowering Time and Ridge Development. PLANTS (BASEL, SWITZERLAND) 2024; 13:1828. [PMID: 38999668 PMCID: PMC11243878 DOI: 10.3390/plants13131828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 06/21/2024] [Accepted: 06/21/2024] [Indexed: 07/14/2024]
Abstract
Luffa spp. are widely cultivated vegetable and medicinal plants of the Cucurbitaceae family. In this study, we report a high-quality chromosome-level genome of the high-generation inbred line SG261 of Luffa acutangula. The genomic sequence was determined by PacBio long reads, Hi-C sequencing reads, and 10× Genomics sequencing, with an assembly size of 739.82 Mb, contig N50 of 18.38 Mb, and scaffold N50 of 56.08 Mb. The genome of L. acutangula SG261 was predicted to contain 27,312 protein-coding genes and 72.56% repetitive sequences, of which long terminal repeats (LTRs) were a major class, accounting for 67.84% of the genome. Phylogenetic analysis reveals that L. acutangula evolved later than Luffa cylindrica, and Luffa is closely related to Momordica charantia. Comparing the genomes of L. acutangula SG261 and L. cylindrica with PacBio data, 67,128 high-quality structural variations (SVs) and 55,978 presence-absence variations (PAVs) were identified in SG261, resulting in 2424 and 1094 genes with variation in the CDS region, respectively; 287 genes were affected in both analyses. In addition, we found that the transcription factor FY (FLOWERING LOCUS Y) families had a large expansion in L. acutangula SG261 (flowering in the morning) compared to L. cylindrica (flowering in the afternoon), which may result in the early flowering time in L. acutangula SG261. This study provides a valuable reference for the breeding of and pan-genome research into Luffa species.
Affiliation(s)
- Aizheng Huang
- Institute of Agricultural Science Research of Jiangmen, Jiangmen 529060, China
| | - Shuo Feng
- College of Horticulture, South China Agricultural University, Guangzhou 510642, China
| | - Zhuole Ye
- Dongguan Agricultural Scientific Research Center, Dongguan 523086, China
| | - Ting Zhang
- College of Horticulture, South China Agricultural University, Guangzhou 510642, China
| | - Shenglong Chen
- Dongguan Agricultural Scientific Research Center, Dongguan 523086, China
| | - Changming Chen
- College of Horticulture, South China Agricultural University, Guangzhou 510642, China
| | - Shijun Chen
- Institute of Agricultural Science Research of Jiangmen, Jiangmen 529060, China
| |
|
24
|
Jansz N, Faulkner GJ. Viral genome sequencing methods: benefits and pitfalls of current approaches. Biochem Soc Trans 2024; 52:1431-1447. [PMID: 38747720 PMCID: PMC11346438 DOI: 10.1042/bst20231322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 04/30/2024] [Accepted: 05/02/2024] [Indexed: 06/27/2024]
Abstract
Whole genome sequencing of viruses provides high-resolution molecular insights, enhancing our understanding of viral genome function and phylogeny. Beyond fundamental research, viral sequencing is increasingly vital for pathogen surveillance, epidemiology, and clinical applications. As sequencing methods rapidly evolve, the diversity of viral genomics applications and catalogued genomes continues to expand. Advances in long-read, single molecule, real-time sequencing methodologies present opportunities to sequence contiguous, haplotype resolved viral genomes in a range of research and applied settings. Here we present an overview of nucleic acid sequencing methods and their applications in studying viral genomes. We emphasise the advantages of different viral sequencing approaches, with a particular focus on the benefits of third-generation sequencing technologies in elucidating viral evolution, transmission networks, and pathogenesis.
Affiliation(s)
- Natasha Jansz
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
| | - Geoffrey J. Faulkner
- Mater Research Institute - University of Queensland, TRI Building, Woolloongabba, QLD 4102, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, QLD 4072, Australia
| |
|
25
|
Goussarov G, Mysara M, Cleenwerck I, Claesen J, Leys N, Vandamme P, Van Houdt R. Benchmarking short-, long- and hybrid-read assemblers for metagenome sequencing of complex microbial communities. MICROBIOLOGY (READING, ENGLAND) 2024; 170:001469. [PMID: 38916949 PMCID: PMC11261854 DOI: 10.1099/mic.0.001469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 05/23/2024] [Indexed: 06/26/2024]
Abstract
Metagenome community analysis, driven by the continued development of sequencing technology, is rapidly providing insights into many aspects of microbiology and becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but their length is too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short, long or both read types on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handle such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.
Affiliation(s)
- Gleb Goussarov
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Mohamed Mysara
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Bioinformatics group, Information Technology & Computer Science, Nile University, Giza, Egypt
| | - Ilse Cleenwerck
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Jürgen Claesen
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
| | - Natalie Leys
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
| | - Peter Vandamme
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Rob Van Houdt
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
| |
|
26
|
Inamo J, Suzuki A, Ueda MT, Yamaguchi K, Nishida H, Suzuki K, Kaneko Y, Takeuchi T, Hatano H, Ishigaki K, Ishihama Y, Yamamoto K, Kochi Y. Long-read sequencing for 29 immune cell subsets reveals disease-linked isoforms. Nat Commun 2024; 15:4285. [PMID: 38806455 PMCID: PMC11133395 DOI: 10.1038/s41467-024-48615-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Accepted: 05/02/2024] [Indexed: 05/30/2024] Open
Abstract
Alternative splicing events are a major causal mechanism for complex traits, but they have been understudied due to the limitations of short-read sequencing. Here, we generate a full-length isoform annotation of human immune cells from an individual by long-read sequencing for 29 cell subsets. This contains a number of unannotated transcripts and isoforms such as a read-through transcript of TOMM40-APOE in the Alzheimer's disease locus. We profile characteristics of isoforms and show that repetitive elements significantly explain the diversity of unannotated isoforms, providing insight into human genome evolution. In addition, some of the isoforms are expressed in a cell-type specific manner, with alternative 3'-UTR usage contributing to their specificity. Further, we identify disease-associated isoforms by isoform switch analysis and by integration of several quantitative trait loci analyses with genome-wide association study data. Our findings will promote the elucidation of the mechanism of complex diseases via alternative splicing.
Affiliation(s)
- Jun Inamo
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
| | - Akari Suzuki
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
| | - Mahoko Takahashi Ueda
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
| | - Kensuke Yamaguchi
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
- Biomedical Engineering Research Innovation Center, Institute of Biomaterials and Bioengineering, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
| | - Hiroshi Nishida
- Department of Molecular Systems Bioanalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, 606-8501, Japan
| | - Katsuya Suzuki
- Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
| | - Yuko Kaneko
- Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
| | - Tsutomu Takeuchi
- Division of Rheumatology, Department of Internal Medicine, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Saitama Medical University, 38 Morohongo, Moroyama, Iruma, Saitama, 350-0495, Japan
| | - Hiroaki Hatano
- Laboratory for Human Immunogenetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
| | - Kazuyoshi Ishigaki
- Laboratory for Human Immunogenetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
| | - Yasushi Ishihama
- Department of Molecular Systems Bioanalysis, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, 606-8501, Japan
- Laboratory of Proteomics for Drug Discovery, National Institute of Biomedical Innovation, Health and Nutrition, Ibaraki, Osaka, 567-0085, Japan
| | - Kazuhiko Yamamoto
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan
| | - Yuta Kochi
- Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan.
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan.
| |
|
27
|
Tang T, Liu Y, Zheng B, Li R, Zhang X, Liu Y. Integration of hybrid and self-correction method improves the quality of long-read sequencing data. Brief Funct Genomics 2024; 23:249-255. [PMID: 37340778 DOI: 10.1093/bfgp/elad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 06/04/2023] [Accepted: 06/05/2023] [Indexed: 06/22/2023] Open
Abstract
Third-generation sequencing (TGS) technologies have revolutionized genome science in the past decade. However, the long-read data produced by TGS platforms suffer from a much higher error rate than that of the previous technologies, thus complicating the downstream analysis. Several error correction tools for long-read data have been developed; these tools can be categorized into hybrid and self-correction tools. So far, these two types of tools are separately investigated, and their interplay remains understudied. Here, we integrate hybrid and self-correction methods for high-quality error correction. Our procedure leverages the inter-similarity between long-read data and high-accuracy information from short reads. We compare the performance of our method and state-of-the-art error correction tools on Escherichia coli and Arabidopsis thaliana datasets. The result shows that the integration approach outperformed the existing error correction methods and holds promise for improving the quality of downstream analyses in genomic research.
Affiliation(s)
- Tao Tang
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210023, Jiangsu, China
| | - Yiping Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
| | - Binshuang Zheng
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210023, Jiangsu, China
| | - Rong Li
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210023, Jiangsu, China
| | - Xiaocai Zhang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), 138632, Singapore, Singapore
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Changsha, China
| |
|
28
|
Rehman A, Tian C, He S, Li H, Lu S, Du X, Peng Z. Transcriptome dynamics of Gossypium purpurascens in response to abiotic stresses by Iso-seq and RNA-seq data. Sci Data 2024; 11:477. [PMID: 38724643 PMCID: PMC11081948 DOI: 10.1038/s41597-024-03334-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 04/30/2024] [Indexed: 05/12/2024] Open
Abstract
Gossypium purpurascens, a member of the Malvaceae family, holds immense economic significance as a fiber crop worldwide. Abiotic stresses harm cotton crops, reduce yields, and cause economic losses. Generating high-quality reference genomes and large-scale transcriptomic datasets across diverse conditions can offer valuable insights into identifying preferred agronomic traits for crop breeding. The present research used leaf tissues to conduct PacBio Iso-seq and RNA-seq analysis. We carried out an in-depth analysis of DEGs using both correlations with cluster analysis and principal component analysis. The study also involved the identification of both lncRNAs and CDS. We have prepared RNA-seq libraries from 75 RNA samples to study the effects of drought, salinity, alkali, and saline-alkali stress, as well as control conditions. A total of 454.06 Gigabytes of transcriptome data were effectively validated through the identification of differentially expressed genes and KEGG and GO analysis. Overall, gene expression profiles and full-length transcripts from cotton tissues will aid in understanding the genetic mechanism of abiotic stress tolerance in G. purpurascens.
Affiliation(s)
- Abdul Rehman
- Zhengzhou Research Base, State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, China
| | - Chunyan Tian
- Zhengzhou Research Base, State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, China
| | - Shoupu He
- Zhengzhou Research Base, State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, China
- State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (ICR, CAAS), Anyang, Henan, 455000, China
| | - Hongge Li
- Zhengzhou Research Base, State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, China
- State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (ICR, CAAS), Anyang, Henan, 455000, China
| | - Shuai Lu
- National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou, 450001, China
| | - Xiongming Du
- Zhengzhou Research Base, State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, China.
- State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (ICR, CAAS), Anyang, Henan, 455000, China.
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, 572024, China.
| | - Zhen Peng
- Zhengzhou Research Base, State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, School of Agricultural Sciences, Zhengzhou University, Zhengzhou, 450001, China.
- State Key Laboratory of Cotton Bio-breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (ICR, CAAS), Anyang, Henan, 455000, China.
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan, 572024, China.
| |
|
29
|
Pardo-Palacios FJ, Arzalluz-Luque A, Kondratova L, Salguero P, Mestre-Tomás J, Amorín R, Estevan-Morió E, Liu T, Nanni A, McIntyre L, Tseng E, Conesa A. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nat Methods 2024; 21:793-797. [PMID: 38509328 PMCID: PMC11093726 DOI: 10.1038/s41592-024-02229-2] [Citation(s) in RCA: 51] [Impact Index Per Article: 51.0] [Received: 06/01/2023] [Accepted: 03/01/2024] [Indexed: 03/22/2024]
Abstract
SQANTI3 is a tool designed for the quality control, curation and annotation of long-read transcript models obtained with third-generation sequencing technologies. Leveraging its annotation framework, SQANTI3 calculates quality descriptors of transcript models, junctions and transcript ends. With this information, potential artifacts can be identified and replaced with reliable sequences. Furthermore, the integrated functional annotation feature enables subsequent functional iso-transcriptomics analyses.
Affiliation(s)
- Francisco J Pardo-Palacios
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Department of Applied Statistics and Operational Research, and Quality, Universitat Politècnica de València, Valencia, Valencia, Spain
| | - Angeles Arzalluz-Luque
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
- Department of Applied Statistics and Operational Research, and Quality, Universitat Politècnica de València, Valencia, Valencia, Spain
| | - Liudmyla Kondratova
- Horticultural Sciences Department, University of Florida, Gainesville, FL, USA
- Genetics Institute, University of Florida, Gainesville, FL, USA
| | - Pedro Salguero
- Department of Applied Statistics and Operational Research, and Quality, Universitat Politècnica de València, Valencia, Valencia, Spain
| | - Jorge Mestre-Tomás
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
| | - Rocío Amorín
- Genetics Institute, University of Florida, Gainesville, FL, USA
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
| | - Eva Estevan-Morió
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
| | - Tianyuan Liu
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain
| | - Adalena Nanni
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
| | - Lauren McIntyre
- Genetics Institute, University of Florida, Gainesville, FL, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
| | | | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Valencia, Spain.
| |
|
30
|
Leblanc J, Boulle O, Roux E, Nicolas J, Lavenier D, Audic Y. Fully in vitro iterative construction of a 24 kb-long artificial DNA sequence to store digital information. Biotechniques 2024; 76:203-215. [PMID: 38573592 DOI: 10.2144/btn-2023-0109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2024] Open
Abstract
In the absence of a DNA template, the ab initio production of long double-stranded DNA molecules of predefined sequences is particularly challenging. The DNA synthesis step remains a bottleneck for many applications such as functional assessment of ancestral genes, analysis of alternative splicing or DNA-based data storage. In this report we propose a fully in vitro protocol to generate very long double-stranded DNA molecules starting from commercially available short DNA blocks in less than 3 days using Golden Gate assembly. This innovative application allowed us to streamline the process to produce a 24 kb-long DNA molecule storing part of the Declaration of the Rights of Man and of the Citizen of 1789. The DNA molecule produced can be readily cloned into a suitable host/vector system for amplification and selection.
Affiliation(s)
- Julien Leblanc
- University Rennes, Inria, CNRS, IRISA, Campus de Beaulieu, Rennes, France
| | - Olivier Boulle
- University Rennes, Inria, CNRS, IRISA, Campus de Beaulieu, Rennes, France
| | - Emeline Roux
- Institut NuMeCan, INRAE, INSERM, University Rennes, France
| | - Jacques Nicolas
- University Rennes, Inria, CNRS, IRISA, Campus de Beaulieu, Rennes, France
| | | | - Yann Audic
- CNRS, University Rennes, Institut de Génétique et Développement de Rennes (IGDR) UMR 6290, Rennes, France
| |
|
31
|
Espinosa E, Bautista R, Larrosa R, Plata O. Advancements in long-read genome sequencing technologies and algorithms. Genomics 2024; 116:110842. [PMID: 38608738 DOI: 10.1016/j.ygeno.2024.110842] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 10/31/2023] [Revised: 04/01/2024] [Accepted: 04/06/2024] [Indexed: 04/14/2024]
Abstract
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial improvements in accuracy and computational cost in sequencing genomes. However, de novo whole-genome assembly still presents significant challenges related to the quality of the results. Pursuing de novo whole-genome assembly remains a formidable challenge, underscored by intricate considerations surrounding computational demands and result quality. As sequencing accuracy and throughput steadily advance, a continuous stream of innovative assembly tools floods the field. Navigating this dynamic landscape necessitates a reasonable choice of sequencing platform, depth, and assembly tools to orchestrate high-quality genome reconstructions. This comprehensive review delves into the intricate interplay between cutting-edge long-read sequencing technologies, assembly methodologies, and the ever-evolving field of genomics. With a focus on addressing the pivotal challenges and harnessing the opportunities presented by these advancements, we provide an in-depth exploration of the crucial factors influencing the selection of optimal strategies for achieving robust and insightful genome assemblies.
Affiliation(s)
- Elena Espinosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
| | - Rocio Bautista
- Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
| | - Rafael Larrosa
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain; Supercomputing and Bioinnovation Center, University of Malaga, C. Severo Ochoa, 34, Malaga 29590, Spain.
| | - Oscar Plata
- Department of Computer Architecture, University of Malaga, Louis Pasteur, 35, Campus de Teatinos, Malaga 29071, Spain.
32
Nie F, Ni P, Huang N, Zhang J, Wang Z, Xiao C, Luo F, Wang J. De novo diploid genome assembly using long noisy reads. Nat Commun 2024; 15:2964. [PMID: 38580638 PMCID: PMC10997618 DOI: 10.1038/s41467-024-47349-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 03/25/2024] [Indexed: 04/07/2024]
Abstract
The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers fail to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygote alleles while correcting sequencing errors. We combine a corrected read SNP caller and a raw read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using Nanopore R9, PacBio CLR or Nanopore R10 reads only. PECAT generates more contiguous haplotype-specific contigs compared to other assemblers. In particular, PECAT achieves a nearly haplotype-resolved assembly of B. taurus (Bison×Simmental) using Nanopore R9 reads and phase block NG50s of 59.4/58.0 Mb for HG002 using Nanopore R10 reads.
Affiliation(s)
- Fan Nie
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- National Center for Applied Mathematics in Hunan and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, 411105, China
- Peng Ni
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Neng Huang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Jun Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xiangjiang Laboratory, Changsha, 410205, China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China
- Zhenyu Wang
- Institute of Nanfan & Seed Industry, Guangdong Academy of Sciences, Guangdong, 510316, China
- Chuanle Xiao
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University #7 Jinsui Road, Tianhe District, Guangzhou, China.
- Feng Luo
- School of Computing, Clemson University, Clemson, SC, 29634-0974, USA.
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
- Xiangjiang Laboratory, Changsha, 410205, China.
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, 410083, China.
33
Abebe JS, Alwie Y, Fuhrmann E, Leins J, Mai J, Verstraten R, Schreiner S, Wilson AC, Depledge DP. Nanopore Guided Annotation of Transcriptome Architectures. bioRxiv 2024:2024.04.02.587744. [PMID: 38617228 PMCID: PMC11014626 DOI: 10.1101/2024.04.02.587744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
High-resolution annotations of transcriptomes from all domains of life are essential for many sequencing-based RNA analyses, including Nanopore direct RNA sequencing (DRS), which would otherwise be hindered by misalignments and other analysis artefacts. DRS allows the capture and full-length sequencing of native RNAs, without recoding or amplification bias, and resulting data may be interrogated to define the identity and location of chemically modified ribonucleotides, as well as the length of poly(A) tails on individual RNA molecules. Existing software solutions for generating high-resolution transcriptome annotations are poorly suited to small, gene-dense organisms such as viruses due to the challenge of identifying distinct transcript isoforms where alternative splicing and overlapping RNAs are prevalent. To resolve this, we identified key characteristics of DRS datasets and developed a novel approach to transcriptome annotation. We demonstrate, using a combination of synthetic and original datasets, that our novel approach yields a high level of precision and recall when reconstructing both gene-sparse and gene-dense transcriptomes from DRS datasets. We further apply this approach to generate a new high-resolution transcriptome annotation of the neglected pathogen human adenovirus type F 41 for which we identify 77 distinct transcripts encoding at least 23 different proteins.
Affiliation(s)
- Jonathan S. Abebe
- Department of Microbiology, New York University School of Medicine, New York, NY, USA
- Yasmine Alwie
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Erik Fuhrmann
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Jonas Leins
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Julia Mai
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute of Virology, University Medical Center, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Ruth Verstraten
- Institute of Virology, Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Hannover, Germany
- Sabrina Schreiner
- Institute of Virology, Hannover Medical School, Hannover, Germany
- Institute of Virology, University Medical Center, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
- Angus C. Wilson
- Department of Microbiology, New York University School of Medicine, New York, NY, USA
- Daniel P. Depledge
- Department of Microbiology, New York University School of Medicine, New York, NY, USA
- Institute of Virology, Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Hannover, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
34
Cook R, Telatin A, Hsieh SY, Newberry F, Tariq MA, Baker DJ, Carding SR, Adriaenssens EM. Nanopore and Illumina sequencing reveal different viral populations from human gut samples. Microb Genom 2024; 10:001236. [PMID: 38683195 PMCID: PMC11092197 DOI: 10.1099/mgen.0.001236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 03/18/2024] [Indexed: 05/01/2024]
Abstract
The advent of viral metagenomics, or viromics, has improved our knowledge and understanding of global viral diversity. High-throughput sequencing technologies enable explorations of the ecological roles, contributions to host metabolism, and the influence of viruses in various environments, including the human intestinal microbiome. However, bacterial metagenomic studies frequently have the advantage. The adoption of advanced technologies like long-read sequencing has the potential to be transformative in refining viromics and metagenomics. Here, we examined the effectiveness of long-read and hybrid sequencing by comparing Illumina short-read and Oxford Nanopore Technology (ONT) long-read sequencing technologies and different assembly strategies on recovering viral genomes from human faecal samples. Our findings showed that if a single sequencing technology is to be chosen for virome analysis, Illumina is preferable due to its superior ability to recover fully resolved viral genomes and minimise erroneous genomes. While ONT assemblies were effective in recovering viral diversity, the challenges related to input requirements and the necessity for amplification made it less ideal as a standalone solution. However, using a combined, hybrid approach enabled a more authentic representation of viral diversity to be obtained within samples.
Affiliation(s)
- Ryan Cook
- Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
- Fiona Newberry
- Department of Biosciences, Nottingham Trent University, Nottingham, NG11 8NS, UK
| | - Mohammad A. Tariq
- Faculty of Health and Life Sciences, University of Northumbria, Newcastle upon Tyne, NE1 8ST, UK
- Simon R. Carding
- Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
- Norwich Medical School, University of East Anglia, Norwich, NR4 7TJ, UK
35
Xie L, Gong X, Yang K, Huang Y, Zhang S, Shen L, Sun Y, Wu D, Ye C, Zhu QH, Fan L. Technology-enabled great leap in deciphering plant genomes. Nat Plants 2024; 10:551-566. [PMID: 38509222 DOI: 10.1038/s41477-024-01655-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 09/22/2023] [Accepted: 02/20/2024] [Indexed: 03/22/2024]
Abstract
Plant genomes provide essential and vital basic resources for studying many aspects of plant biology and applications (for example, breeding). From 2000 to 2020, 1,144 genomes of 782 plant species were sequenced. In the past three years (2021-2023), 2,373 genomes of 1,031 plant species, including 793 newly sequenced species, have been assembled, representing a great leap. The 2,373 newly assembled genomes, of which 63 are telomere-to-telomere assemblies and 921 have been generated in pan-genome projects, cover the major phylogenetic clades. Substantial advances in read length, throughput, accuracy and cost-effectiveness have notably simplified the achievement of high-quality assemblies. Moreover, the development of multiple software tools using different algorithms offers the opportunity to generate more complete and complex assemblies. A database named N3: plants, genomes, technologies has been developed to accommodate the metadata associated with the 3,517 genomes that have been sequenced from 1,575 plant species since 2000. We also provide an outlook for emerging opportunities in plant genome sequencing.
Affiliation(s)
- Lingjuan Xie
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Yazhou Bay, Sanya, China
- Xiaojiao Gong
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Kun Yang
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Yujie Huang
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Shiyu Zhang
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Leti Shen
- Hainan Institute of Zhejiang University, Yazhou Bay, Sanya, China
- Yanqing Sun
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Dongya Wu
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Chuyu Ye
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Qian-Hao Zhu
- CSIRO Agriculture and Food, Black Mountain Laboratories, Canberra, Australia
- Longjiang Fan
- Institute of Crop Sciences & Institute of Bioinformatics, Zhejiang University, Hangzhou, China.
- Hainan Institute of Zhejiang University, Yazhou Bay, Sanya, China.
36
Geo JA, Ameen R, Al Shemmari S, Thomas J. Advancements in HLA Typing Techniques and Their Impact on Transplantation Medicine. Med Princ Pract 2024; 33:215-231. [PMID: 38442703 PMCID: PMC11175610 DOI: 10.1159/000538176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 02/28/2024] [Indexed: 03/07/2024]
Abstract
HLA typing serves as a standard practice in hematopoietic stem cell transplantation to ensure compatibility between donors and recipients, preventing the occurrence of allograft rejection and graft-versus-host disease. Conventional laboratory methods that have been widely employed in the past few years, including sequence-specific primer PCR and sequencing-based typing (SBT), currently face the risk of becoming obsolete. This risk stems not only from the extensive diversity within HLA genes but also from the rapid advancement of next-generation sequencing and third-generation sequencing technologies. Third-generation sequencing systems like single-molecule real-time (SMRT) sequencing and Oxford Nanopore (ONT) sequencing have the capability to analyze long-read sequences that span entire intronic-exonic regions of HLA genes, effectively addressing challenges related to HLA ambiguity and the phasing of multiple short-read fragments. The growing dominance of these advanced sequencers in HLA typing is expected to solidify further through ongoing refinements, cost reduction, and error rate minimization. This review focuses on hematopoietic stem cell transplantation (HSCT) and explores prospective advancements and application of HLA DNA typing techniques. It explores how the adoption of third-generation sequencing technologies can revolutionize the field by offering improved accuracy, reduced ambiguity, and enhanced assessment of compatibility in HSCT. Embracing these cutting-edge technologies is essential to advancing the success rates and outcomes of hematopoietic stem cell transplantation. This review underscores the importance of staying at the forefront of HLA typing techniques to ensure the best possible outcomes for patients undergoing HSCT.
Affiliation(s)
- Jeethu Anu Geo
- Medical Laboratory Sciences Department, Health Sciences Center, Kuwait University, Kuwait City, Kuwait
- Department of Biotechnology, Karunya Institute of Technology and Sciences, Coimbatore, India
- Reem Ameen
- Medical Laboratory Sciences Department, Health Sciences Center, Kuwait University, Kuwait City, Kuwait
- Salem Al Shemmari
- Department of Medicine, Health Sciences Center, Kuwait University, Kuwait City, Kuwait
- Jibu Thomas
- Department of Biotechnology, Karunya Institute of Technology and Sciences, Coimbatore, India
37
Westfall DH, Deng W, Pankow A, Murrell H, Chen L, Zhao H, Williamson C, Rolland M, Murrell B, Mullins JI. Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations-Application to HIV-1 quasispecies. Virus Evol 2024; 10:veae019. [PMID: 38765465 PMCID: PMC11099545 DOI: 10.1093/ve/veae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 12/19/2023] [Accepted: 02/20/2024] [Indexed: 05/22/2024]
Abstract
Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence polymerase-chain reaction (PCR) amplicons derived from cDNA templates tagged with unique molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR. The use of UMI allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing to produce a highly accurate consensus sequence from each template. Production of highly accurate sequences from the large datasets produced from SMRT-UMI sequencing is facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline). PORPIDpipeline automatically filters and parses circular consensus reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination, heteroduplex formation, or early cycle PCR errors. The optimized SMRT-UMI sequencing and PORPIDpipeline methods presented here represent a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus quasispecies in a virus transmitter-recipient pair of individuals.
Affiliation(s)
- Dylan H Westfall
- Department of Microbiology, University of Washington School of Medicine, 960 Republican Street, Seattle, WA 98195-8070, USA
- Wenjie Deng
- Department of Microbiology, University of Washington School of Medicine, 960 Republican Street, Seattle, WA 98195-8070, USA
- Alec Pankow
- Department of Microbiology, University of Washington School of Medicine, 960 Republican Street, Seattle, WA 98195-8070, USA
- Hugh Murrell
- Department of Pathology, Division of Medical Virology, University of Cape Town and National Health Laboratory Services, Observatory, Cape Town 7925, South Africa
- Lennie Chen
- Department of Microbiology, University of Washington School of Medicine, 960 Republican Street, Seattle, WA 98195-8070, USA
- Hong Zhao
- Department of Microbiology, University of Washington School of Medicine, 960 Republican Street, Seattle, WA 98195-8070, USA
- Carolyn Williamson
- Department of Pathology, Division of Medical Virology, University of Cape Town and National Health Laboratory Services, Observatory, Cape Town 7925, South Africa
- Morgane Rolland
- US Military HIV Research Program, Walter Reed Army Institute of Research, 503 Robert Grant Avenue, Silver Spring, MD 20910, USA
- The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., 6720A Rockledge Drive, Bethesda, MD 20817, USA
- Ben Murrell
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Solnavägen 9, Stockholm 171 65, Sweden
- James I Mullins
- Department of Microbiology, University of Washington School of Medicine, 960 Republican Street, Seattle, WA 98195-8070, USA
- Department of Medicine, University of Washington School of Medicine, 960 Republican Street, Seattle, WA 98195-8070, USA
- Department of Global Health, University of Washington Schools of Medicine and Public Health, 960 Republican Street, Seattle, WA 98195-8070, USA
38
Wang Z, Liu C, Liu W, Lv X, Hu T, Yang F, Yang W, He L, Huang X. Long-read sequencing reveals the structural complexity of genomic integration of HPV DNA in cervical cancer cell lines. BMC Genomics 2024; 25:198. [PMID: 38378450 PMCID: PMC10877919 DOI: 10.1186/s12864-024-10101-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 02/08/2024] [Indexed: 02/22/2024]
Abstract
BACKGROUND: Cervical cancer (CC) causes more than 311,000 deaths annually worldwide. The integration of human papillomavirus (HPV) is a crucial genetic event that contributes to cervical carcinogenesis. Although HPV DNA integration is known to disrupt the genomic architecture of both the host and viral genomes in CC, the complexity of this process remains largely unexplored. RESULTS: In this study, we conducted whole-genome sequencing (WGS) at 55-65X coverage utilizing the PacBio long-read sequencing platform in SiHa and HeLa cells, followed by comprehensive analyses of the sequence data to elucidate the complexity of HPV integration. Firstly, our results demonstrated that PacBio long-read sequencing effectively identifies HPV integration breakpoints with comparable accuracy to targeted-capture next-generation sequencing (NGS) methods. Secondly, we constructed detailed models of complex integrated genome structures that included both the HPV genome and nearby regions of the human genome by utilizing PacBio long-read WGS. Thirdly, our sequencing results revealed the occurrence of a wide variety of genome-wide structural variations (SVs) in SiHa and HeLa cells. Additionally, our analysis further revealed a potential correlation between changes in gene expression levels and SVs on chromosome 13 in the genome of SiHa cells. CONCLUSIONS: Using PacBio long-read sequencing, we have successfully constructed complex models illustrating HPV integrated genome structures in SiHa and HeLa cells. This accomplishment serves as a compelling demonstration of the valuable capabilities of long-read sequencing in detecting and characterizing HPV genomic integration structures within human cells. Furthermore, these findings offer critical insights into the complex process of HPV16 and HPV18 integration and their potential contribution to the development of cervical cancer.
Affiliation(s)
- Zhijie Wang
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Chen Liu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Wanxin Liu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Xinyi Lv
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Ting Hu
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Fan Yang
- Wuhan Kandwise Biotechnology, Inc. Wuhan, Hubei, China
- Wenhui Yang
- Wuhan Kandwise Biotechnology, Inc. Wuhan, Hubei, China
- Liang He
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
- Xiaoyuan Huang
- Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
- Cancer Biology Research Center (Key Laboratory of the Ministry of Education), Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China.
39
Kang JN, Hur M, Kim CK, Yang SH, Lee SM. Enhancing transcriptome analysis in medicinal plants: multiple unigene sets in Astragalus membranaceus. Front Plant Sci 2024; 15:1301526. [PMID: 38384760 PMCID: PMC10879423 DOI: 10.3389/fpls.2024.1301526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024]
Abstract
Astragalus membranaceus is a medicinal plant mainly used in East Asia and contains abundant secondary metabolites. Despite the importance of this plant, the available genomic and genetic information is still limited. De novo transcriptome construction is recognized as an essential method for transcriptome research when reference genome information is incomplete. In this study, we constructed three individual transcriptome sets (unigene sets) for detailed analysis of the phenylpropanoid biosynthesis pathway, a major metabolite of A. membranaceus. Set-1 was a circular consensus sequence (CCS) generated using PacBio sequencing (PacBio-seq). Set-2 consisted of hybridized assembled unigenes with Illumina sequencing (Illumina-seq) reads and PacBio CCS using rnaSPAdes. Set-3 unigenes were assembled from Illumina-seq reads using the Trinity software. Construction of multiple unigene sets provides several advantages for transcriptome analysis. First, it provides an appropriate expression filtering threshold for assembly-based unigenes: a threshold transcripts per million (TPM) ≥ 5 removed more than 88% of assembly-based unigenes, which were mostly short and low-expressing unigenes. Second, assembly-based unigenes compensated for the incomplete length of PacBio CCSs: the ends of the 5′/3′ untranslated regions of phenylpropanoid-related unigenes derived from set-1 were incomplete, which suggests that PacBio CCSs are unlikely to be full-length transcripts. Third, more isoform unigenes could be obtained from multiple unigene sets; isoform unigenes missing in set-1 were detected in set-2 and set-3. Finally, gene ontology and Kyoto Encyclopedia of Genes and Genomes analyses showed that phenylpropanoid biosynthesis and carbohydrate metabolism were highly activated in A. membranaceus roots. Various sequencing technologies and assemblers have been developed for de novo transcriptome analysis. However, no technique is perfect for de novo transcriptome analysis, suggesting the need to construct multiple unigene sets. This method enables efficient transcript filtering and detection of longer and more diverse transcripts.
Affiliation(s)
- Ji-Nam Kang
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- Mok Hur
- Department of Herbal Crop Resources, National Institute of Horticultural & Herbal Science, Eumseong-gun, Chungcheongbuk-do, Republic of Korea
- Chang-Kug Kim
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- So-Hee Yang
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- Si-Myung Lee
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
40
Ma J, Zhao X, Qi E, Han R, Yu T, Li G. Highly efficient clustering of long-read transcriptomic data with GeLuster. Bioinformatics 2024; 40:btae059. [PMID: 38310330 PMCID: PMC10881092 DOI: 10.1093/bioinformatics/btae059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 01/08/2024] [Accepted: 01/30/2024] [Indexed: 02/05/2024]
Abstract
MOTIVATION: The advancement of long-read RNA sequencing technologies leads to a bright future for transcriptome analysis, in which clustering long reads according to their gene family of origin is of great importance. However, existing de novo clustering algorithms require plenty of computing resources. RESULTS: We developed a new algorithm, GeLuster, for clustering long RNA-seq reads. Based on our tests on one simulated dataset and nine real datasets, GeLuster exhibited superior performance. On the tested Nanopore datasets it ran 2.9-17.5 times as fast as the second-fastest method with less than one-seventh of memory consumption, while achieving higher clustering accuracy. On the PacBio data, GeLuster showed similar performance. It sets the stage for large-scale transcriptome study in the future. AVAILABILITY AND IMPLEMENTATION: GeLuster is freely available at https://github.com/yutingsdu/GeLuster.
Affiliation(s)
- Junchi Ma
- Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
- School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Xiaoyu Zhao
- School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Enfeng Qi
- School of Mathematics and Statistics, Guangxi Normal University, Guilin 541000, China
- Renmin Han
- Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
- Ting Yu
- Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
- Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences (Frontiers Science Center for Nonlinear Expectations), Shandong University, Qingdao 266237, China
41
Kim C, Pongpanich M, Porntaveetus T. Unraveling metagenomics through long-read sequencing: a comprehensive review. J Transl Med 2024; 22:111. [PMID: 38282030 PMCID: PMC10823668 DOI: 10.1186/s12967-024-04917-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 10/15/2023] [Accepted: 01/21/2024] [Indexed: 01/30/2024]
Abstract
The study of microbial communities has undergone significant advancements, starting from the initial use of 16S rRNA sequencing to the adoption of shotgun metagenomics. However, a new era has emerged with the advent of long-read sequencing (LRS), which offers substantial improvements over its predecessor, short-read sequencing (SRS). LRS produces reads that are several kilobases long, enabling researchers to obtain more complete and contiguous genomic information, characterize structural variations, and study epigenetic modifications. The current leaders in LRS technologies are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), each offering a distinct set of advantages. This review covers the workflow of long-read metagenomics sequencing, including sample preparation (sample collection, sample extraction, and library preparation), sequencing, processing (quality control, assembly, and binning), and analysis (taxonomic annotation and functional annotation). Each section provides a concise outline of the key concept of the methodology, presenting the original concept as well as how it is challenged or modified in the context of LRS. Additionally, the section introduces a range of tools that are compatible with LRS and can be utilized to execute the LRS process. This review aims to present the workflow of metagenomics, highlight the transformative impact of LRS, and provide researchers with a selection of tools suitable for this task.
Affiliation(s)
- Chankyung Kim
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Bioinformatics and Computational Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Monnat Pongpanich
  - Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
  - Center of Excellence for Cancer and Inflammation, Chulalongkorn University, Bangkok, Thailand
- Thantrira Porntaveetus
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Geriatric and Special Patients Care, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
42
Khan AM, Singh H, Ranganathan S, Gojobori T, Gao X. Editorial: 21st International Conference on Bioinformatics (InCoB 2022)-accelerating innovation to meet biological challenges: the role of bioinformatics. Front Genet 2024; 15:1365223. [PMID: 38333618 PMCID: PMC10850372 DOI: 10.3389/fgene.2024.1365223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 01/10/2024] [Indexed: 02/10/2024] Open
Affiliation(s)
- Asif M. Khan
  - APBioNET.org, Singapore, Singapore
  - School of Data Sciences, Perdana University, Kuala Lumpur, Malaysia
  - Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Istanbul, Türkiye
  - College of Computing and Information Technology, University of Doha for Science and Technology, Doha, Qatar
- Harpreet Singh
  - APBioNET.org, Singapore, Singapore
  - Hans Raj Mahila Maha Vidyalaya, Jalandhar, India
  - Bioclues.org, Hyderabad, India
- Shoba Ranganathan
  - APBioNET.org, Singapore, Singapore
  - Applied BioSciences, Macquarie University, Sydney, NSW, Australia
  - National Supercomputing Centre, Singapore, Singapore
- Takashi Gojobori
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Xin Gao
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
43
Jia X, Kang Z, Wang G, Zhang K, Fu X, Li C, Lai S, Chen SY. Long-read sequencing-based transcriptomic landscape in longissimus dorsi and transcriptome-wide association studies for growth traits of meat rabbits. Front Vet Sci 2024; 11:1320484. [PMID: 38318148 PMCID: PMC10839001 DOI: 10.3389/fvets.2024.1320484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 01/08/2024] [Indexed: 02/07/2024] Open
Abstract
Rabbits are an attractive meat livestock species that can efficiently convert human-indigestible plant biomass, and have been commonly used in biological and medical researches. Yet, transcriptomic landscape in muscle tissue and association between gene expression level and growth traits have not been specially studied in meat rabbits. In this study Oxford Nanopore Technologies (ONT) long-read sequencing technology was used for comprehensively exploring transcriptomic landscape in Longissimus dorsi for 115 rabbits at 84 days of age, and transcriptome-wide association studies (TWAS) were performed for growth traits, including body weight at 84 days of age and average daily gain during three growth periods. The statistical analysis of TWAS was performed using a mixed linear model, in which polygenic effect was fitted as a random effect according to gene expression level-based relationships. A total of 18,842 genes and 42,010 transcripts were detected, among which 35% of genes and 47% of transcripts were novel in comparison with the reference genome annotation. Furthermore, 45% of genes were widely expressed among more than 90% of individuals. The proportions (±SE) of phenotype variance explained by genome-wide gene expression level ranged from 0.501 ± 0.216 to 0.956 ± 0.209, and the similar results were obtained when explained by transcript expression level. In contrast, neither gene nor transcript was detected by TWAS to be statistically significantly associated with these growth traits. In conclusion, these novel genes and transcripts that have been extensively profiled in a single muscle tissue using long-read sequencing technology will greatly improve our understanding on transcriptional diversity in rabbits. 
Our results with a relatively small sample size further revealed the important contribution of global gene expression to phenotypic variation on growth performance, but it seemed that no single gene has an outstanding effect; this knowledge is helpful to include intermediate omics data for implementing genetic evaluation of growth traits in meat rabbits.
Affiliation(s)
- Xianbo Jia
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, China
- Zhe Kang
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, China
- Guozhi Wang
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, China
- Kai Zhang
  - Sichuan Academy of Grassland Sciences, Chengdu, China
- Xiangchao Fu
  - Sichuan Academy of Grassland Sciences, Chengdu, China
- Congyan Li
  - Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu, China
- Songjia Lai
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, China
- Shi-Yi Chen
  - Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, China
44
Mestre-Tomás J, Liu T, Pardo-Palacios F, Conesa A. SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark. Genome Biol 2023; 24:286. [PMID: 38082294 PMCID: PMC10712166 DOI: 10.1186/s13059-023-03127-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 11/27/2023] [Indexed: 12/18/2023] Open
Abstract
Long-read RNA sequencing has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile tool that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field.
Affiliation(s)
- Jorge Mestre-Tomás
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
  - Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Camino de Vera, Valencia, 46022, Spain
- Tianyuan Liu
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Francisco Pardo-Palacios
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedrátic Agustín Escardino Benlloch, Paterna, 46980, Spain
45
Zhang M, Huang X, Wu H. Application of Biological Nanopore Sequencing Technology in the Detection of Microorganisms. Chin J Chem 2023; 41:3473-3483. [DOI: 10.1002/cjoc.202300255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 07/14/2023] [Indexed: 01/05/2025]
Abstract
Comprehensive Summary: Environmental pollution and the spread of pathogenic microorganisms pose a significant threat to the health of humans and the planet. Thus, understanding and detecting microorganisms is crucial for maintaining a healthy living environment. Nanopore sequencing is a single-molecule detection method developed in the 1990s that has revolutionized various research fields. It offers several advantages over traditional sequencing methods, including low cost, label-free, time-saving detection speed, long sequencing reading, real-time monitoring, convenient carrying, and other significant advantages. In this review, we summarize the technical principles and characteristics of nanopore sequencing and discuss its applications in amplicon sequencing, metagenome sequencing, and whole-genome sequencing of environmental microorganisms, as well as its in situ application under some special circumstances. We also analyze the advantages and challenges of nanopore sequencing in microbiology research. Overall, nanopore sequencing has the potential to greatly enhance the detection and understanding of microorganisms in environmental research, but further developments are needed to overcome the current challenges.
Affiliation(s)
- Ming-Qian Zhang
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Analytical Chemistry for Living Biosystems, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xiao-Bin Huang
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Analytical Chemistry for Living Biosystems, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Hai-Chen Wu
  - Beijing National Laboratory for Molecular Sciences, Key Laboratory of Analytical Chemistry for Living Biosystems, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
46
Liehrmann A, Delannoy E, Launay-Avon A, Gilbault E, Loudet O, Castandet B, Rigaill G. DiffSegR: an RNA-seq data driven method for differential expression analysis using changepoint detection. NAR Genom Bioinform 2023; 5:lqad098. [PMID: 37954572 PMCID: PMC10632193 DOI: 10.1093/nargab/lqad098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 09/27/2023] [Accepted: 10/23/2023] [Indexed: 11/14/2023] Open
Abstract
To fully understand gene regulation, it is necessary to have a thorough understanding of both the transcriptome and the enzymatic and RNA-binding activities that shape it. While many RNA-Seq-based tools have been developed to analyze the transcriptome, most only consider the abundance of sequencing reads along annotated patterns (such as genes). These annotations are typically incomplete, leading to errors in the differential expression analysis. To address this issue, we present DiffSegR - an R package that enables the discovery of transcriptome-wide expression differences between two biological conditions using RNA-Seq data. DiffSegR does not require prior annotation and uses a multiple changepoints detection algorithm to identify the boundaries of differentially expressed regions in the per-base log2 fold change. In a few minutes of computation, DiffSegR could rightfully predict the role of chloroplast ribonuclease Mini-III in rRNA maturation and chloroplast ribonuclease PNPase in (3'/5')-degradation of rRNA, mRNA and tRNA precursors as well as intron accumulation. We believe DiffSegR will benefit biologists working on transcriptomics as it allows access to information from a layer of the transcriptome overlooked by the classical differential expression analysis pipelines widely used today. DiffSegR is available at https://aliehrmann.github.io/DiffSegR/index.html.
Affiliation(s)
- Arnaud Liehrmann
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
  - Laboratoire de Mathématiques et de Modélisation d’Evry (LaMME), Université d’Evry-Val-d’Essonne, UMR CNRS 8071, ENSIIE, USC INRAE, Evry, 91037, France
- Etienne Delannoy
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
- Alexandra Launay-Avon
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
- Elodie Gilbault
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000, Versailles, France
- Olivier Loudet
  - Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), 78000, Versailles, France
- Benoît Castandet
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
- Guillem Rigaill
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris-Saclay, CNRS, INRAE, Université Evry, Gif sur Yvette, 91190, France
  - Institute of Plant Sciences Paris-Saclay (IPS2), Université Paris Cité, CNRS, INRAE, Gif sur Yvette, 91190, France
  - Laboratoire de Mathématiques et de Modélisation d’Evry (LaMME), Université d’Evry-Val-d’Essonne, UMR CNRS 8071, ENSIIE, USC INRAE, Evry, 91037, France
47
Lin D, Zou Y, Li X, Wang J, Xiao Q, Gao X, Lin F, Zhang N, Jiao M, Guo Y, Teng Z, Li S, Wei Y, Zhou F, Yin R, Zhang S, Xing L, Xu W, Wu X, Yang B, Xiao K, Wu C, Tao Y, Yang X, Zhang J, Hu S, Dong S, Li X, Ye S, Hong Z, Pan Y, Yang Y, Sun H, Cao G. MGA-seq: robust identification of extrachromosomal DNA and genetic variants using multiple genetic abnormality sequencing. Genome Biol 2023; 24:247. [PMID: 37904244 PMCID: PMC10614391 DOI: 10.1186/s13059-023-03081-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2023] [Accepted: 10/04/2023] [Indexed: 11/01/2023] Open
Abstract
Genomic abnormalities are strongly associated with cancer and infertility. In this study, we develop a simple and efficient method - multiple genetic abnormality sequencing (MGA-Seq) - to simultaneously detect structural variation, copy number variation, single-nucleotide polymorphism, homogeneously staining regions, and extrachromosomal DNA (ecDNA) from a single tube. MGA-Seq directly sequences proximity-ligated genomic fragments, yielding a dataset with concurrent genome three-dimensional and whole-genome sequencing information, enabling approximate localization of genomic structural variations and facilitating breakpoint identification. Additionally, by utilizing MGA-Seq, we map focal amplification and oncogene coamplification, thus facilitating the exploration of ecDNA's transcriptional regulatory function.
Affiliation(s)
- Da Lin
  - Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yanyan Zou
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, China
- Xinyu Li
  - Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jinyue Wang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
  - College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Qin Xiao
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
  - College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xiaochen Gao
  - Precision Research Center for Refractory Diseases, Institute for Clinical Research, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Fei Lin
  - Reproductive Medical Center, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Ningyuan Zhang
  - Reproductive Medical Center, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Ming Jiao
  - Department of Laboratory Animal Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yu Guo
  - Department of Laboratory Animal Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhaowei Teng
  - The First People's Hospital of Yunnan Province, Affiliated Hospital of Kunming University of Science and Technology, Kunming, China
- Shiyi Li
  - Baylor College of Medicine, Houston, TX, USA
  - Department of Radiation & Medical Oncology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Yongchang Wei
  - Department of Radiation & Medical Oncology, Zhongnan Hospital of Wuhan University, Wuhan, China
  - Hubei Key Laboratory of Tumor Biological Behaviors, Zhongnan Hospital of Wuhan University, Wuhan, China
- Fuling Zhou
  - Department of Hematology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Rong Yin
  - Department of Hematology, Zhongnan Hospital of Wuhan University, Wuhan, China
- Siheng Zhang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Lingyu Xing
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Weize Xu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Xiaofeng Wu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Bing Yang
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Ke Xiao
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Chengchao Wu
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Yingfeng Tao
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Xiaoqing Yang
  - Hospital of Huazhong Agricultural University, Wuhan, China
- Jing Zhang
  - Department of Medical Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Sheng Hu
  - Department of Medical Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Shuang Dong
  - Department of Medical Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Xiaoyu Li
  - Department of Medical Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Shengwei Ye
  - Department of Gastrointestinal Surgery, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Zhidan Hong
  - Department of Reproductive Medicine Center, Zhongnan Hospital of Wuhan University, Wuhan, China
- Yihang Pan
  - Precision Medicine Center, Scientific Research Center, School of Medicine, The Seventh Affiliated Hospital, Sun Yat-Sen University, Shenzhen, China
- Yuqin Yang
  - Department of Laboratory Animal Center, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Haixiang Sun
  - Reproductive Medical Center, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Gang Cao
  - State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
  - College of Bio-Medicine and Health, Huazhong Agricultural University, Wuhan, China
  - College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
48
Akaçin İ, Ersoy Ş, Doluca O, Güngörmüşler M. Using custom-built primers and nanopore sequencing to evaluate CO-utilizer bacterial and archaeal populations linked to bioH2 production. Sci Rep 2023; 13:17025. [PMID: 37813931 PMCID: PMC10562470 DOI: 10.1038/s41598-023-44357-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 10/06/2023] [Indexed: 10/11/2023] Open
Abstract
The microbial community composition of five distinct thermophilic hot springs was effectively described in this work, using broad-coverage nanopore sequencing (ONT MinION sequencer). By examining environmental samples from the same source, but from locations with different temperatures, bioinformatic analysis revealed dramatic changes in microbial diversity and archaeal abundance. More specifically, no archaeal presence was reported with universal bacterial primers, whereas a significant archaea presence and also a wider variety of bacterial species were reported. These results revealed the significance of primer preference for microbiomes in extreme environments. Bioinformatic analysis was performed by aligning the reads to 16S microbial databases for identification using three different alignment methods, Epi2Me (Fastq 16S workflow), Kraken, and an in-house BLAST tool, including comparison at the genus and species levels. As a result, this approach to data analysis had a significant impact on the genera identified, and thus, it is recommended that use of multiple analysis tools to support findings on taxonomic identification using the 16S region until more precise bioinformatics tools become available. This study presents the first compilation of the ONT-based inventory of the hydrogen producers in the designated hot springs in Türkiye.
Affiliation(s)
- İlayda Akaçin
  - Division of Bioengineering, Graduate School, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
- Şeymanur Ersoy
  - Division of Bioengineering, Graduate School, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
- Osman Doluca
  - Division of Bioengineering, Graduate School, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
  - Department of Biomedical Engineering, Faculty of Engineering, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
- Mine Güngörmüşler
  - Division of Bioengineering, Graduate School, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
  - Department of Genetics and Bioengineering, Faculty of Engineering, Izmir University of Economics, Sakarya Caddesi No: 156, 35330, Balçova, Izmir, Türkiye
49
Liu Z, Zhang X, Huang L, Huo H, Wang P, Li W, Dai H, Yang F, Fu G, Zhao G, Sun YH, Huo J. Long- and short-read RNA sequencing from five reproductive organs of boar. Sci Data 2023; 10:678. [PMID: 37798273 PMCID: PMC10556096 DOI: 10.1038/s41597-023-02595-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/20/2023] [Accepted: 09/25/2023] [Indexed: 10/07/2023] Open
Abstract
The production of semen in boars involves multiple reproductive glands, including the testis (Tes), epididymis (Epi), vesicular gland (VG), prostate gland (PG), and bulbourethral gland (BG). However, previous studies on boar reproduction primarily focused on the testis, with little attention paid to the other glands. Here, we integrated single-molecule long-read sequencing with short-read sequencing to characterize the RNA landscape from five glands of Banna mini-pig inbred line (BMI) and Diannan small-ear pigs (DSE). We identified 110,996 full-length isoforms from 22,298 genes, and classified the alternative splicing (AS) events in these five glands. Transcriptome-wide variation analysis indicated that the number of single nucleotide polymorphisms (SNPs) in five tissues of BMI was significantly lower than that in the non-inbred pig, DSE, revealing the effect of inbreeding on BMI. Additionally, we performed small-RNA sequencing and identified 299 novel miRNAs across all glands. Overall, our findings provide a comprehensive overview of the RNA landscape within these five glands, paving the path for future investigations on reproductive biology and the impact of inbreeding on pig transcriptome.
Affiliation(s)
- Zhipeng Liu
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Xia Zhang
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
  - College of Life Science, Lyuliang University, Lvliang, 033001, Shanxi, China
- Libin Huang
  - Department of Biology, College of Science, Northeastern University, Boston, Massachusetts, 02115, USA
- Hailong Huo
  - Yunnan Open University, Kunming, 650500, Yunnan, China
- Pei Wang
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Weizhen Li
  - College of Veterinary Medicine, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Hongmei Dai
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Fuhua Yang
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Guowen Fu
  - College of Veterinary Medicine, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Guiying Zhao
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Yu H Sun
  - Department of Biology, University of Rochester, Rochester, New York, 14627, USA
- Jinlong Huo
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
  - Department of Biology, University of Rochester, Rochester, New York, 14627, USA
50
Kainth AS, Haddad GA, Hall JM, Ruthenburg AJ. Merging short and stranded long reads improves transcript assembly. PLoS Comput Biol 2023; 19:e1011576. [PMID: 37883581 PMCID: PMC10629667 DOI: 10.1371/journal.pcbi.1011576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 11/07/2023] [Accepted: 10/05/2023] [Indexed: 10/28/2023] Open
Abstract
Long-read RNA sequencing has arisen as a counterpart to short-read sequencing, with the potential to capture full-length isoforms, albeit at the cost of lower depth. Yet this potential is not fully realized due to inherent limitations of current long-read assembly methods and underdeveloped approaches to integrate short-read data. Here, we critically compare the existing methods and develop a new integrative approach to characterize a particularly challenging pool of low-abundance long noncoding RNA (lncRNA) transcripts from short- and long-read sequencing in two distinct cell lines. Our analysis reveals severe limitations in each of the sequencing platforms. For short-read assemblies, coverage declines at transcript termini resulting in ambiguous ends, and uneven low coverage results in segmentation of a single transcript into multiple transcripts. Conversely, long-read sequencing libraries lack depth and strand-of-origin information in cDNA-based methods, culminating in erroneous assembly and quantitation of transcripts. We also discover a cDNA synthesis artifact in long-read datasets that markedly impacts the identity and quantitation of assembled transcripts. Towards remediating these problems, we develop a computational pipeline to "strand" long-read cDNA libraries that rectifies inaccurate mapping and assembly of long-read transcripts. Leveraging the strengths of each platform and our computational stranding, we also present and benchmark a hybrid assembly approach that drastically increases the sensitivity and accuracy of full-length transcript assembly on the correct strand and improves detection of biological features of the transcriptome. When applied to a challenging set of under-annotated and cell-type variable lncRNA, our method resolves the segmentation problem of short-read sequencing and the depth problem of long-read sequencing, resulting in the assembly of coherent transcripts with precise 5' and 3' ends. 
Our workflow can be applied to existing datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.
Affiliation(s)
- Amoldeep S. Kainth
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Gabriela A. Haddad
  - Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, United States of America
- Johnathon M. Hall
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
- Alexander J. Ruthenburg
  - Department of Molecular Genetics and Cell Biology, The University of Chicago, Chicago, Illinois, United States of America
  - Committee on Genetics, Genomics and Systems Biology, The University of Chicago, Chicago, Illinois, United States of America
  - Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois, United States of America