1
Naghinejad M, Parvizpour S, Khaniani MS, Mehri M, Derakhshan SM, Amirfiroozy A. The known structural variations in hearing loss and their diagnostic approaches: a comprehensive review. Mol Biol Rep 2025; 52:131. [PMID: 39821465 DOI: 10.1007/s11033-025-10231-w] [Received: 11/18/2024] [Accepted: 01/07/2025] [Indexed: 01/19/2025]
Abstract
Hearing loss (HL) is the most common sensory disorder and has a wide range of causes, including both environmental and genetic factors. While single-nucleotide variants (SNVs) and small insertions/deletions have been extensively studied, the role of structural variations (SVs) in hearing impairment has gained increasing recognition. This review provides a comprehensive overview of the importance of SVs in HL by exploring the SVs associated with HL and their underlying pathogenic mechanisms. Diagnostic methods for SVs are also briefly evaluated and compared. Three major mechanisms by which SVs can lead to HL are gene disruption, gene dosage imbalance, and position effect. To facilitate the detection of SVs in HL, the review presents a table of the key genes and genomic regions implicated in SVs in HL patients, together with the diagnostic approaches used to detect them. A second table compiles indications for the use of each SV diagnostic technique, to help experts choose the most appropriate one. Overall, the review underscores the significant role of SVs in HL. Further research is required to fully elucidate the spectrum of SVs in HL and to optimize the clinical use of SV detection methods in routine diagnostic procedures.
Affiliation(s)
- Maryam Naghinejad
  - Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
- Sepideh Parvizpour
  - Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Mahmoud Shekari Khaniani
  - Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
- Maghsood Mehri
  - Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
- Sima Mansoori Derakhshan
  - Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
- Akbar Amirfiroozy
  - Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran
2
Smith GJ, van Alen TA, van Kessel MA, Lücker S. Simple, reference-independent assessment to empirically guide correction and polishing of hybrid microbial community metagenomic assembly. PeerJ 2024; 12:e18132. [PMID: 39529629 PMCID: PMC11552494 DOI: 10.7717/peerj.18132] [Received: 09/18/2023] [Accepted: 08/29/2024] [Indexed: 11/16/2024]
Abstract
Hybrid metagenomic assembly of microbial communities, leveraging both long- and short-read sequencing technologies, is becoming an increasingly accessible approach, yet its widespread application faces several challenges. High-quality references may not be available for assembly accuracy comparisons common for benchmarking, and certain aspects of hybrid assembly may benefit from dataset-dependent, empiric guidance rather than the application of a uniform approach. In this study, several simple, reference-free characteristics, particularly coding gene content and read recruitment profiles, were hypothesized to be reliable indicators of assembly quality improvement during iterative error-fixing processes. These characteristics were compared to reference-dependent genome- and gene-centric analyses common for microbial community metagenomic studies. Two laboratory-scale bioreactors were sequenced with short- and long-read platforms, and assembled with commonly used software packages. Following long read assembly, long read correction and short read polishing were iterated up to ten times to resolve errors. These iterative processes were shown to have a substantial effect on gene- and genome-centric community compositions. Simple, reference-free assembly characteristics, specifically changes in gene fragmentation and short read recruitment, were robustly correlated with advanced analyses common in published comparative studies, and therefore are suitable proxies for hybrid metagenome assembly quality to simplify the identification of the optimal number of correction and polishing iterations. As hybrid metagenomic sequencing approaches will likely remain relevant due to the low added cost of short-read sequencing for differential coverage binning or the ability to access lower abundance community members, it is imperative that users are equipped to estimate assembly quality prior to downstream analyses.
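The idea of using a reference-free metric to pick the number of correction and polishing rounds can be illustrated with a minimal sketch. The stopping rule below (stop once the relative change in a metric falls under a tolerance) is an assumption made for illustration and is not taken from the study; the function name `pick_iteration`, the 1% tolerance, and the gene counts are all hypothetical.

```python
def pick_iteration(metric_by_iteration, tolerance=0.01):
    """Return the 1-based round after which the metric stops changing much.

    metric_by_iteration: a reference-free metric (for example, the number of
    predicted genes) recorded after each correction/polishing round.
    tolerance: relative change below which a further round is treated as
    uninformative (hypothetical default, not from the study).
    """
    for i in range(1, len(metric_by_iteration)):
        previous, current = metric_by_iteration[i - 1], metric_by_iteration[i]
        relative_change = abs(current - previous) / previous
        if relative_change < tolerance:
            # Round i + 1 changed little, so round i is the last useful one.
            return i
    # The metric never stabilised: use every round that was run.
    return len(metric_by_iteration)


# Made-up predicted gene counts after rounds 1..6.
gene_counts = [52000, 47000, 45500, 45300, 45290, 45288]
best_round = pick_iteration(gene_counts)
print(best_round)
```

In practice the same rule could be applied to short-read recruitment or any other metric that is cheap to recompute after each round.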
Affiliation(s)
- Garrett J. Smith
  - Department of Microbiology, The Ohio State University, Columbus, OH, United States of America
  - Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, Nijmegen, Netherlands
- Theo A. van Alen
  - Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, Nijmegen, Netherlands
- Maartje A.H.J. van Kessel
  - Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, Nijmegen, Netherlands
- Sebastian Lücker
  - Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Radboud University, Nijmegen, Netherlands
3
McFarlane GR, Polanco JVC, Bogema D. CRISPR-Cas guide RNA indel analysis using CRISPResso2 with Nanopore sequencing data. BMC Res Notes 2024; 17:205. [PMID: 39061110 PMCID: PMC11282726 DOI: 10.1186/s13104-024-06861-1] [Received: 06/05/2024] [Accepted: 07/10/2024] [Indexed: 07/28/2024]
Abstract
OBJECTIVE: Insertion and deletion (indel) analysis of CRISPR-Cas guide RNAs (gRNAs) is crucial in gene editing to assess gRNA efficiency and indel frequency. This study evaluates the utility of CRISPResso2 with Oxford Nanopore sequencing data (nCRISPResso2) for gRNA indel screening, compared to two common Sanger sequencing-based methods, TIDE and ICE. To achieve this, sheep and horse fibroblasts were transfected with Cas9 and a gRNA targeting the myostatin (MSTN) gene. DNA was subsequently extracted, and PCR products exceeding 600 bp were sequenced using both Sanger and Nanopore sequencing. Indel profiling was then conducted using TIDE, ICE, and nCRISPResso2. RESULTS: Comparison revealed close correspondence in indel formation among methods. For the sheep MSTN gRNA, indel percentages were 52%, 58%, and 64% for TIDE, ICE, and nCRISPResso2, respectively. Horse MSTN gRNA showed 81%, 87%, and 86% edited amplicons for TIDE, ICE, and nCRISPResso2. The frequency of each type of indel was also comparable among the three methods, with nCRISPResso2 and ICE aligning the closest. nCRISPResso2 offers a viable alternative for CRISPR-Cas gRNA indel screening, especially with large amplicons unsuitable for Illumina sequencing. CRISPResso2's compatibility with Nanopore data enables cost-effective and efficient indel profiling, yielding results comparable to common Sanger sequencing-based methods.
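To make the reported agreement between methods concrete, the short sketch below computes the pairwise gaps, in percentage points, between the overall indel percentages quoted in the abstract. Only those six percentages come from the source; the helper name `pairwise_gaps` is introduced here for illustration, and the comparison covers overall editing percentages only, not the per-indel-type frequencies the abstract also mentions.

```python
from itertools import combinations

# Overall indel percentages as reported in the abstract.
indel_pct = {
    "sheep": {"TIDE": 52, "ICE": 58, "nCRISPResso2": 64},
    "horse": {"TIDE": 81, "ICE": 87, "nCRISPResso2": 86},
}


def pairwise_gaps(by_method):
    """Absolute difference, in percentage points, for each pair of methods."""
    return {
        (a, b): abs(by_method[a] - by_method[b])
        for a, b in combinations(by_method, 2)
    }


for species, by_method in indel_pct.items():
    for pair, gap in pairwise_gaps(by_method).items():
        print(species, pair, gap)
```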
Affiliation(s)
- Gus Rowan McFarlane
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW, 2568, Australia
- Jenin Victor Cortez Polanco
  - Sydney School of Veterinary Science, Faculty of Science, The University of Sydney, Camden, NSW, Australia
  - Catalina Stud, North Richmond, NSW, Australia
- Daniel Bogema
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW, 2568, Australia
4
Singh G, Alser M, Denolf K, Firtina C, Khodamoradi A, Cavlak MB, Corporaal H, Mutlu O. RUBICON: a framework for designing efficient deep learning-based genomic basecallers. Genome Biol 2024; 25:49. [PMID: 38365730 PMCID: PMC10870431 DOI: 10.1186/s13059-024-03181-2] [Received: 04/24/2023] [Accepted: 02/02/2024] [Indexed: 02/18/2024]
Abstract
Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases using a computational step called basecalling. The performance of basecalling has critical implications for all later steps in genome analysis. Therefore, there is a need to reduce the computation and memory cost of basecalling while maintaining accuracy. We present RUBICON, a framework to develop efficient hardware-optimized basecallers. We demonstrate the effectiveness of RUBICON by developing RUBICALL, the first hardware-optimized mixed-precision basecaller that performs efficient basecalling, outperforming the state-of-the-art basecallers. We believe RUBICON offers a promising path to develop future hardware-optimized basecallers.
Affiliation(s)
- Gagandeep Singh
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
  - Research and Advanced Development, AMD, Longmont, USA
- Mohammed Alser
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Can Firtina
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Meryem Banu Cavlak
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Henk Corporaal
  - Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Onur Mutlu
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
5
Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023; 24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023]
Abstract
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We suggest HERO, as the first "hybrid-hybrid" approach, to make use of both de Bruijn graphs and overlap graphs for optimal catering to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by on average 65% (27–95%) and 20% (4–61%), respectively. Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Affiliation(s)
- Xiongbin Kang
  - College of Biology, Hunan University, Changsha, China
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Jialu Xu
  - College of Biology, Hunan University, Changsha, China
- Xiao Luo
  - College of Biology, Hunan University, Changsha, China
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
6
Stroupe S, Martone C, McCann B, Juras R, Kjöllerström HJ, Raudsepp T, Beard D, Davis BW, Derr JN. Chromosome-level reference genome for North American bison (Bison bison) and variant database aids in identifying albino mutation. G3 (Bethesda) 2023; 13:jkad156. [PMID: 37481261 PMCID: PMC10542314 DOI: 10.1093/g3journal/jkad156] [Received: 04/07/2023] [Revised: 07/05/2023] [Accepted: 07/07/2023] [Indexed: 07/24/2023]
Abstract
We developed a highly contiguous chromosome-level reference genome for North American bison to provide a platform to evaluate the conservation, ecological, evolutionary, and population genomics of this species. Generated from an F1 hybrid between a North American bison dam and a domestic cattle bull, the assembly exceeds other published bison genome assemblies in completeness and contiguity. To demonstrate the utility for genome-wide variant frequency estimation, we compiled a genomic variant database consisting of 3 true albino bison and 44 wild-type pelage color bison. Through the examination of genomic variants fixed in the albino cohort and absent in the controls, we identified a nonsynonymous single nucleotide polymorphism (SNP) mutation on chromosome 29 in exon 3 of the tyrosinase gene (c.1114C>T). A TaqMan SNP Genotyping Assay was developed to genotype this SNP in a total of 283 animals across 29 herds. This assay confirmed the absence of homozygous variants in all animals except 7 true albino bison included in this study. In addition, the only heterozygous animals identified were 2 wild-type pelage color dams of albino offspring. Therefore, we propose that this new high-quality bison genome assembly and incipient variant database provide a highly robust and informative resource for genomics investigations for this iconic North American species.
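The filtering step described above (variants fixed in the albino cohort and absent in the controls) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: genotypes are encoded as alternate-allele counts (0, 1, or 2), and the function name, variant identifiers, and sample names are all hypothetical toy values.

```python
def candidate_variants(genotypes, cases, controls):
    """Variants homozygous-alternate in every case and absent in every control.

    genotypes: {variant_id: {sample_id: alt_allele_count}}, counts 0, 1 or 2.
    A missing call fails both checks, so such variants are not reported.
    """
    hits = []
    for variant, calls in genotypes.items():
        fixed_in_cases = all(calls.get(sample) == 2 for sample in cases)
        absent_in_controls = all(calls.get(sample) == 0 for sample in controls)
        if fixed_in_cases and absent_in_controls:
            hits.append(variant)
    return hits


# Toy data: two albino samples and two wild-type controls (hypothetical).
toy_genotypes = {
    "variant_A": {"alb1": 2, "alb2": 2, "wt1": 0, "wt2": 0},  # passes
    "variant_B": {"alb1": 2, "alb2": 1, "wt1": 0, "wt2": 0},  # not fixed
    "variant_C": {"alb1": 2, "alb2": 2, "wt1": 1, "wt2": 0},  # seen in control
}
hits = candidate_variants(toy_genotypes, ["alb1", "alb2"], ["wt1", "wt2"])
print(hits)
```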
Affiliation(s)
- Sam Stroupe
  - Department of Veterinary Pathobiology, Texas A&M University School of Veterinary Medicine and Biomedical Science, College Station, TX 77843, USA
- Carly Martone
  - Department of Veterinary Pathobiology, Texas A&M University School of Veterinary Medicine and Biomedical Science, College Station, TX 77843, USA
- Blake McCann
  - National Park Service, Theodore Roosevelt National Park, Medora, ND 58645, USA
- Rytis Juras
  - Department of Veterinary Integrative Biosciences, Texas A&M University School of Veterinary Medicine and Biomedical Science, College Station, TX 77843, USA
- Helena Josefina Kjöllerström
  - Department of Veterinary Integrative Biosciences, Texas A&M University School of Veterinary Medicine and Biomedical Science, College Station, TX 77843, USA
- Terje Raudsepp
  - Department of Veterinary Integrative Biosciences, Texas A&M University School of Veterinary Medicine and Biomedical Science, College Station, TX 77843, USA
- Donald Beard
  - Texas Parks and Wildlife, Caprock Canyons State Park & Trailway, Quitaque, TX 79255, USA
- Brian W Davis
  - Department of Veterinary Integrative Biosciences, Texas A&M University School of Veterinary Medicine and Biomedical Science, College Station, TX 77843, USA
  - Department of Small Animal Clinical Sciences, Texas A&M University School of Veterinary Medicine and Biomedical Science, College Station, TX 77843, USA
- James N Derr
  - Department of Veterinary Pathobiology, Texas A&M University School of Veterinary Medicine and Biomedical Science, College Station, TX 77843, USA
7
Mestre-Tomás J, Liu T, Pardo-Palacios F, Conesa A. SQANTI-SIM: a simulator of controlled transcript novelty for lrRNA-seq benchmark. bioRxiv 2023:2023.08.23.554392. [PMID: 37662216 PMCID: PMC10473693 DOI: 10.1101/2023.08.23.554392] [Indexed: 09/05/2023]
Abstract
Long-read RNA-seq has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile utility that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field. We demonstrate the effectiveness of SQANTI-SIM by benchmarking five transcriptome reconstruction pipelines using the simulated data.
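The core idea of emulating unannotated transcripts by withholding known ones from the reference can be sketched in a few lines. This is a generic hold-out illustration, not SQANTI-SIM's implementation or interface: the tool selects transcripts by SQANTI3 structural category, whereas this sketch samples at random, and the function names and transcript identifiers are hypothetical.

```python
import random


def split_reference(transcript_ids, n_novel, seed=0):
    """Hide n_novel transcripts; return (reduced_reference, hidden_set)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    hidden = set(rng.sample(sorted(transcript_ids), n_novel))
    reduced = [t for t in transcript_ids if t not in hidden]
    return reduced, hidden


def novel_recall(reported, hidden):
    """Fraction of hidden ("novel") transcripts that a pipeline recovered."""
    return len(set(reported) & hidden) / len(hidden)


# Hypothetical transcript identifiers.
all_transcripts = ["tx1", "tx2", "tx3", "tx4", "tx5"]
reduced_reference, hidden = split_reference(all_transcripts, n_novel=2)
print(reduced_reference, sorted(hidden))
```

Reads would then be simulated from the full transcript set while the pipeline under test only sees `reduced_reference`, so the hidden transcripts serve as ground-truth novel isoforms.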
Affiliation(s)
- Jorge Mestre-Tomás
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Tianyuan Liu
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Francisco Pardo-Palacios
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Catedràtic Agustín Escardino Benlloch, Paterna, 46980, Spain
8
Pardo-Palacios FJ, Arzalluz-Luque A, Kondratova L, Salguero P, Mestre-Tomás J, Amorín R, Estevan-Morió E, Liu T, Nanni A, McIntyre L, Tseng E, Conesa A. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. bioRxiv 2023:2023.05.17.541248. [PMID: 37398077 PMCID: PMC10312485 DOI: 10.1101/2023.05.17.541248] [Indexed: 07/04/2023]
Abstract
The emergence of long-read RNA sequencing (lrRNA-seq) has provided an unprecedented opportunity to analyze transcriptomes at isoform resolution. However, the technology is not free from biases, and transcript models inferred from these data require quality control and curation. In this study, we introduce SQANTI3, a tool specifically designed to perform quality analysis on transcriptomes constructed using lrRNA-seq data. SQANTI3 provides an extensive naming framework to describe transcript model diversity in comparison to the reference transcriptome. Additionally, the tool incorporates a wide range of metrics to characterize various structural properties of transcript models, such as transcription start and end sites, splice junctions, and other structural features. These metrics can be utilized to filter out potential artifacts. Moreover, SQANTI3 includes a Rescue module that prevents the loss of known genes and transcripts exhibiting evidence of expression but displaying low-quality features. Lastly, SQANTI3 incorporates IsoAnnotLite, which enables functional annotation at the isoform level and facilitates functional iso-transcriptomics analyses. We demonstrate the versatility of SQANTI3 in analyzing different data types, isoform reconstruction pipelines, and sequencing platforms, and how it provides novel biological insights into isoform biology. The SQANTI3 software is available at https://github.com/ConesaLab/SQANTI3.
9
Yang C, Lo T, Nip KM, Hafezqorani S, Warren RL, Birol I. Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim. Gigascience 2023; 12:giad013. [PMID: 36939007 PMCID: PMC10025935 DOI: 10.1093/gigascience/giad013] [Received: 10/06/2022] [Revised: 01/19/2023] [Accepted: 02/17/2023] [Indexed: 03/21/2023]
Abstract
BACKGROUND: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment. RESULTS: Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task. CONCLUSIONS: The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.
Affiliation(s)
- Chen Yang
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Theodora Lo
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Ka Ming Nip
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Saber Hafezqorani
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- René L Warren
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
  - Department of Medical Genetics, University of British Columbia, Life Sciences Centre Room 1364 – 2350 Health Science Mall Vancouver, BC V6T 1Z3, Canada
10
Wu T, Deng G, Yin Q, Chen S, Zhang Y, Wang B, Xiang L, Liu X. Characterization and molecular evolution analysis of Periploca forrestii inferred from its complete chloroplast genome sequence. Genome 2023; 66:34-50. [PMID: 36516428 DOI: 10.1139/gen-2022-0050] [Indexed: 12/15/2022]
Abstract
Periploca forrestii, a medicinal plant of the family Apocynaceae, is known as an effective and widely used clinical prescription for the treatment of rheumatoid diseases. In this study, we de novo sequenced and assembled the complete chloroplast (cp) genome of P. forrestii based on combined Oxford Nanopore PromethION and Illumina data. The cp genome was 153 724 bp in length and had four subregions: an 84 433 bp large single-copy region and a 17 731 bp small single-copy region were separated by 25 780 bp inverted repeats (IRs). The cp genome included 132 genes with 18 duplicates in the IRs. A total of 45 repeat structures and 183 simple sequence repeats were detected. Codon usage showed a bias toward A/T-ending codons. A comparative study of Apocynaceae revealed that an IR expansion occurred in P. forrestii. The Ka/Ks values of eight species of Apocynaceae suggested that positive selection was exerted on the psaI and ycf2 genes, which might reflect specific adaptations to the particular growth environment of P. forrestii. Phylogenetic analysis indicated that Periplocoideae was a sister to Asclepiadoideae, forming a monophyletic group in the family Apocynaceae. This study provided an important P. forrestii genomic resource for future evolutionary studies and the phylogenetic reconstruction of the family Apocynaceae.
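The reported region sizes can be checked with a small piece of arithmetic. The sketch assumes the usual quadripartite chloroplast layout, in which the four subregions are one large single-copy region (LSC), one small single-copy region (SSC), and two copies of the inverted repeat (IR); the numbers are those given in the abstract, and the variable names are only illustrative.

```python
# Region sizes in base pairs, as reported in the abstract.
LSC = 84_433  # large single-copy region
SSC = 17_731  # small single-copy region
IR = 25_780   # one inverted repeat; the genome carries two copies

total = LSC + SSC + 2 * IR
print(total)  # 153724, matching the reported genome length
```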
Affiliation(s)
- Tianze Wu
  - Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, Chinese Academy of Medical Sciences, Beijing 100700, China
  - School of Chemistry, Chemical Engineering and Life Sciences, Wuhan University of Technology, Wuhan 430070, China
- Gang Deng
  - Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, Chinese Academy of Medical Sciences, Beijing 100700, China
  - School of Chemistry, Chemical Engineering and Life Sciences, Wuhan University of Technology, Wuhan 430070, China
- Qinggang Yin
  - Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, Chinese Academy of Medical Sciences, Beijing 100700, China
- Shilin Chen
  - Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, Chinese Academy of Medical Sciences, Beijing 100700, China
  - School of Chemistry, Chemical Engineering and Life Sciences, Wuhan University of Technology, Wuhan 430070, China
- Yongping Zhang
  - National Engineering Technology Research Center for Miao Medicine, College of Pharmaceutical Sciences, Guizhou University of Traditional Chinese Medicine, Guiyang 550025, Guizhou, China
- Bo Wang
  - National Engineering Technology Research Center for Miao Medicine, College of Pharmaceutical Sciences, Guizhou University of Traditional Chinese Medicine, Guiyang 550025, Guizhou, China
- Li Xiang
  - Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, Chinese Academy of Medical Sciences, Beijing 100700, China
- Xia Liu
  - School of Chemistry, Chemical Engineering and Life Sciences, Wuhan University of Technology, Wuhan 430070, China
11
Kress A, Poch O, Lecompte O, Thompson JD. Real or fake? Measuring the impact of protein annotation errors on estimates of domain gain and loss events. Front Bioinform 2023; 3:1178926. [PMID: 37151482 PMCID: PMC10158824 DOI: 10.3389/fbinf.2023.1178926] [Received: 03/03/2023] [Accepted: 04/05/2023] [Indexed: 05/09/2023]
Abstract
Protein annotation errors can have significant consequences in a wide range of fields, ranging from protein structure and function prediction to biomedical research, drug discovery, and biotechnology. By comparing the domains of different proteins, scientists can identify common domains, classify proteins based on their domain architecture, and highlight proteins that have evolved differently in one or more species or clades. However, genome-wide identification of different protein domain architectures involves a complex error-prone pipeline that includes genome sequencing, prediction of gene exon/intron structures, and inference of protein sequences and domain annotations. Here we developed an automated fact-checking approach to distinguish true domain loss/gain events from false events caused by errors that occur during the annotation process. Using genome-wide ortholog sets and taking advantage of the high-quality human and Saccharomyces cerevisiae genome annotations, we analyzed the domain gain and loss events in the predicted proteomes of 9 non-human primates (NHP) and 20 non-S. cerevisiae fungi (NSF) as annotated in the Uniprot and Interpro databases. Our approach allowed us to quantify the impact of errors on estimates of protein domain gains and losses, and we show that domain losses are over-estimated ten-fold and three-fold in the NHP and NSF proteins respectively. This is in line with previous studies of gene-level losses, where issues with genome sequencing or gene annotation led to genes being falsely inferred as absent. In addition, we show that inconsistent protein domain annotations are a major factor contributing to the false events. For the first time, to our knowledge, we show that domain gains are also over-estimated by three-fold and two-fold respectively in NHP and NSF proteins. Based on our more accurate estimates, we infer that true domain losses and gains in NHP with respect to humans are observed at similar rates, while domain gains in the more divergent NSF are observed twice as frequently as domain losses with respect to S. cerevisiae. This study highlights the need to critically examine the scientific validity of protein annotations, and represents a significant step toward scalable computational fact-checking methods that may one day mitigate the propagation of wrong information in protein databases.
12
Walter M, Puniamoorthy N. Discovering novel reproductive genes in a non-model fly using de novo GridION transcriptomics. Front Genet 2022; 13:1003771. [PMID: 36568389 PMCID: PMC9768217 DOI: 10.3389/fgene.2022.1003771] [Received: 07/26/2022] [Accepted: 11/16/2022] [Indexed: 12/12/2022]
Abstract
Gene discovery has important implications for investigating phenotypic trait evolution, adaptation, and speciation. Male reproductive tissues, such as accessory glands (AGs), are hotspots for recruitment of novel genes that diverge rapidly even among closely related species/populations. These genes synthesize seminal fluid proteins that often affect post-copulatory sexual selection: they can mediate male-male sperm competition and ejaculate-female interactions that modify female remating, and can even influence reproductive incompatibilities among diverging species/populations. Although de novo transcriptomics has facilitated gene discovery in non-model organisms, reproductive gene discovery is still challenging without a reference database because these genes are often novel and bear no homology to known proteins. Here, we use reference-free GridION long-read transcriptomics, from Oxford Nanopore Technologies (ONT), to discover novel AG genes and characterize their expression in the widespread dung fly, Sepsis punctum. Despite stark population differences in male reproductive traits (e.g., body size, testes size, and sperm length) as well as female remating, the male AG genes and their secretions of S. punctum are still unknown. We implement a de novo ONT transcriptome pipeline incorporating quality-filtering and rigorous error-correction procedures, and we evaluate gene sequence and gene expression results against high-quality Illumina short-read data. We discover highly expressed reproductive genes in AG transcriptomes of S. punctum consisting of 40 high-quality and high-confidence ONT genes that cross-verify against Illumina genes, among which 26 are novel and specific to S. punctum. Novel genes account for an average of 81% of total gene expression and may be functionally relevant in seminal fluid protein production. For instance, 80% of genes encoding secretory proteins account for 74% of total gene expression. In addition, median sequence similarities of ONT nucleotide and protein sequences match within-Illumina sequence similarities. Read-count based expression quantification in ONT is congruent with Illumina's Transcripts per Million (TPM), both in overall pattern and within functional categories. Rapid genomic innovation followed by recruitment of de novo genes for high expression in S. punctum AG tissue, a pattern observed in other insects, could be a likely mechanism of evolution of these genes. The study also demonstrates the feasibility of adapting ONT transcriptomics for gene discovery in non-model systems.
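For readers unfamiliar with the TPM unit mentioned above, the following is a minimal sketch of the standard Transcripts per Million calculation. It is not the authors' pipeline: the function name `tpm` and the example counts and lengths are hypothetical and chosen only for illustration.

```python
def tpm(counts, lengths_bp):
    """Standard TPM: length-normalize counts, then scale so values sum to 1e6.

    counts     -- raw read counts per transcript (hypothetical input)
    lengths_bp -- transcript lengths in base pairs, same order as counts
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("transcript lengths must be positive")

    # Reads per kilobase: removes the bias towards longer transcripts.
    rpk = [count / (length / 1000.0) for count, length in zip(counts, lengths_bp)]

    total = sum(rpk)
    if total == 0:
        # No reads at all: every transcript has zero expression.
        return [0.0 for _ in rpk]

    # Scale so that the values across all transcripts sum to one million.
    return [value / total * 1_000_000 for value in rpk]


# Illustrative values only: two transcripts with equal reads per kilobase.
example = tpm([10, 20], [1000, 2000])
```

Because TPM values always sum to one million within a sample, they describe relative rather than absolute expression, which is why comparisons with raw read counts are usually made on overall patterns.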
|
13
|
Tshiabuila D, Giandhari J, Pillay S, Ramphal U, Ramphal Y, Maharaj A, Anyaneji UJ, Naidoo Y, Tegally H, San EJ, Wilkinson E, Lessells RJ, de Oliveira T. Comparison of SARS-CoV-2 sequencing using the ONT GridION and the Illumina MiSeq. BMC Genomics 2022; 23:319. [PMID: 35459088 PMCID: PMC9026045 DOI: 10.1186/s12864-022-08541-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 01/17/2022] [Accepted: 04/08/2022] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND: Over 4 million SARS-CoV-2 genomes have been sequenced globally in the past 2 years. This has been crucial for elucidating transmission chains within communities and for developing new diagnostic methods, vaccines, and antivirals. Although several sequencing technologies have been employed, Illumina and Oxford Nanopore remain the two most commonly used platforms. The sequence quality between these two platforms warrants a comparison of the genomes produced by the two technologies. Here, we compared the SARS-CoV-2 consensus genomes obtained from the Oxford Nanopore Technologies GridION and the Illumina MiSeq for 28 sequencing runs. RESULTS: Our results show that the MiSeq had a significantly higher number of consensus genomes classified by Nextclade as good and mediocre compared to the GridION. The MiSeq also had significantly higher genome coverage and mutation counts than the GridION. CONCLUSION: Due to the low genome coverage, high number of indels, and sensitivity to SARS-CoV-2 viral load noted with the GridION when compared to the MiSeq, we can conclude that the MiSeq is more favourable for SARS-CoV-2 genomic surveillance, as successful genomic surveillance is dependent on high-quality, near-whole consensus genomes.
Affiliation(s)
- Derek Tshiabuila
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Jennifer Giandhari
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Sureshnee Pillay
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Upasana Ramphal
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Centre for the AIDS Programme of Research in South Africa (CAPRISA), Durban, South Africa
- Yajna Ramphal
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Arisha Maharaj
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Ugochukwu Jacob Anyaneji
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Yeshnee Naidoo
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Houriiyah Tegally
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Emmanuel James San
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Eduan Wilkinson
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Richard J Lessells
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Tulio de Oliveira
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban 4001, KwaZulu-Natal, South Africa
- Centre for Epidemic Response and Innovation (CERI), Stellenbosch University, Stellenbosch, South Africa
- Centre for the AIDS Programme of Research in South Africa (CAPRISA), Durban, South Africa
- Department of Global Health, University of Washington, Seattle, WA, USA
|
14
|
Ding Q, Li R, Ren X, Chan LY, Ho VWS, Xie D, Ye P, Zhao Z. Genomic architecture of 5S rDNA cluster and its variations within and between species. BMC Genomics 2022; 23:238. [PMID: 35346033 PMCID: PMC8961926 DOI: 10.1186/s12864-022-08476-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/20/2021] [Accepted: 03/16/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Ribosomal DNAs (rDNAs) are arranged in purely tandem repeats, preventing them from being reliably assembled onto chromosomes during generation of genome assembly. The uncertainty of rDNA genomic structure presents a significant barrier for studying their function and evolution. RESULTS: Here we generate ultra-long Oxford Nanopore Technologies (ONT) and short NGS reads to delineate the architecture and variation of the 5S rDNA cluster in different strains of C. elegans and C. briggsae. We classify the individual rDNA repeating units into 25 types based on the unique sequence variations in each unit of C. elegans (N2). We next perform assembly of the cluster by taking advantage of the long reads that carry these units, which led to an assembly of the 5S rDNA cluster consisting of up to 167 consecutive 5S rDNA units in the N2 strain. The ordering and copy number of various rDNA units are consistent with the separation time between strains. Surprisingly, we observed a drastically lower level of variation in the unit composition of the 5S rDNA cluster in the C. elegans CB4856 and C. briggsae AF16 strains than in the C. elegans N2 strain, suggesting that N2, a widely used reference strain, is likely to be defective in maintaining 5S rDNA cluster stability compared with other wild isolates of C. elegans or C. briggsae. CONCLUSIONS: The results demonstrate that Nanopore DNA sequencing reads are capable of generating assemblies of highly repetitive sequences, and that rDNA units are highly dynamic both within and between population(s) of the same species in terms of sequence and copy number. The detailed structure and variation of the 5S rDNA units within the rDNA cluster pave the way for functional and evolutionary studies.
Affiliation(s)
- Qiutao Ding
- Department of Biology, Hong Kong Baptist University, Hong Kong SAR, China
- Runsheng Li
- Department of Biology, Hong Kong Baptist University, Hong Kong SAR, China
- Department of Infectious Diseases and Public Health, City University of Hong Kong, Hong Kong SAR, China
- Xiaoliang Ren
- Department of Biology, Hong Kong Baptist University, Hong Kong SAR, China
- Lu-Yan Chan
- Department of Biology, Hong Kong Baptist University, Hong Kong SAR, China
- Vincy W S Ho
- Department of Biology, Hong Kong Baptist University, Hong Kong SAR, China
- Dongying Xie
- Department of Biology, Hong Kong Baptist University, Hong Kong SAR, China
- Pohao Ye
- Department of Biology, Hong Kong Baptist University, Hong Kong SAR, China
- Zhongying Zhao
- Department of Biology, Hong Kong Baptist University, Hong Kong SAR, China
- State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong SAR, China
|
15
|
Logan R, Fleischmann Z, Annis S, Wehe AW, Tilly JL, Woods DC, Khrapko K. 3GOLD: optimized Levenshtein distance for clustering third-generation sequencing data. BMC Bioinformatics 2022; 23:95. [PMID: 35307007 PMCID: PMC8934446 DOI: 10.1186/s12859-022-04637-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/14/2021] [Accepted: 03/10/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Third-generation sequencing offers some advantages over its next-generation sequencing predecessors, but with the caveat of a much higher error rate. Clustering related sequences is an essential task in modern biology. To accurately cluster sequences rich in errors, error type and frequency need to be accounted for. Levenshtein distance is a well-established mathematical algorithm for measuring the edit distance between words and can specifically weight insertions, deletions and substitutions. However, there are drawbacks to using Levenshtein distance in a biological context, and hence it has rarely been used for this purpose. We present novel modifications to the Levenshtein distance algorithm to optimize it for clustering error-rich biological sequencing data. RESULTS: We successfully introduced a bidirectional frameshift allowance with end-user determined accommodation caps combined with weighted error discrimination. Furthermore, our modifications dramatically improved the computational speed of Levenshtein distance. For simulated ONT MinION and PacBio Sequel datasets, the average clustering sensitivity for 3GOLD was 41.45% (S.D. 10.39) higher than Sequence-Levenshtein distance, 52.14% (S.D. 9.43) higher than Levenshtein distance, 55.93% (S.D. 8.67) higher than Starcode, 42.68% (S.D. 8.09) higher than CD-HIT-EST and 61.49% (S.D. 7.81) higher than DNACLUST. For biological ONT MinION data, 3GOLD clustering sensitivity was 27.99% higher than Sequence-Levenshtein distance, 52.76% higher than Levenshtein distance, 56.39% higher than Starcode, 48% higher than CD-HIT-EST and 70.4% higher than DNACLUST. CONCLUSION: Our modifications to Levenshtein distance have improved its speed and accuracy compared to the classic Levenshtein distance, Sequence-Levenshtein distance and other commonly used clustering approaches on simulated and biological third-generation sequenced datasets. Our clustering approach is appropriate for datasets of unknown cluster centroids, such as those generated with unique molecular identifiers, as well as known centroids, such as barcoded datasets. A strength of our approach is high accuracy in resolving small clusters and mitigating the number of singletons.
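As background for the abstract above, here is a minimal sketch of the classic Levenshtein distance with separate weights for insertions, deletions and substitutions. It is not the 3GOLD algorithm: it has no frameshift allowance, accommodation caps or speed optimizations, and the function name and default weights are assumptions made for illustration.

```python
def weighted_levenshtein(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Minimum total cost of edits that turn sequence a into sequence b.

    ins_cost -- cost of inserting one character of b
    del_cost -- cost of deleting one character of a
    sub_cost -- cost of replacing one character of a with a different one
    (all weights are illustrative defaults, not values from the paper)
    """
    # prev[j] holds the cost of turning the empty prefix of a into b[:j].
    prev = [j * ins_cost for j in range(len(b) + 1)]

    for i, char_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string needs i deletions.
        curr = [i * del_cost]
        for j, char_b in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if char_a == char_b else sub_cost)
            delete = prev[j] + del_cost
            insert = curr[j - 1] + ins_cost
            curr.append(min(substitute, delete, insert))
        prev = curr

    return prev[-1]


# With unit weights this is the ordinary edit distance.
unit_distance = weighted_levenshtein("kitten", "sitting")

# Down-weighting deletions makes a single missing base cheaper, one way to
# reflect an error profile in which deletions are common.
cheap_deletion = weighted_levenshtein("ACGT", "AGT", del_cost=0.5)
```

Keeping only two rows of the dynamic-programming table limits memory use to the length of the second sequence, although the running time remains proportional to the product of the two lengths.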
Affiliation(s)
- Robert Logan
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA; Department of Biology, Eastern Nazarene College, 23 E Elm Ave, Quincy, MA, 02170, USA
- Zoe Fleischmann
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA
- Sofia Annis
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA
- Amy Wangsness Wehe
- Health and Natural Sciences Division, Mathematics Department, Fitchburg State University, Fitchburg, MA, 01420-2697, USA
- Jonathan L Tilly
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA
- Dori C Woods
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA
- Konstantin Khrapko
- College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA, 02115, USA
|
16
|
Liu Y, Kearney J, Mahmoud M, Kille B, Sedlazeck FJ, Treangen TJ. Rescuing low frequency variants within intra-host viral populations directly from Oxford Nanopore sequencing data. Nat Commun 2022; 13:1321. [PMID: 35288552 PMCID: PMC8921239 DOI: 10.1038/s41467-022-28852-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 08/26/2021] [Accepted: 02/10/2022] [Indexed: 12/28/2022] Open
Abstract
Infectious disease monitoring on Oxford Nanopore Technologies (ONT) platforms offers rapid turnaround times and low cost. Tracking low frequency intra-host variants provides important insights into within-host viral population dynamics and transmission. However, given the higher error rate of ONT, accurate identification of intra-host variants with low allele frequencies remains an open challenge with no viable computational solutions available. In response to this need, we present Variabel, a novel approach and the first method designed for rescuing low frequency intra-host variants from ONT data alone. We evaluate Variabel on both synthetic data (SARS-CoV-2) and patient-derived datasets (Ebola virus, norovirus, SARS-CoV-2); our results show that Variabel can accurately identify low frequency variants below 0.5 allele frequency, outperforming existing state-of-the-art ONT variant callers for this task. Variabel is open-source and available for download at: www.gitlab.com/treangenlab/variabel. Tracking low frequency intra-host variants has helped in understanding within-host viral population dynamics and transmission. Precise tracking, however, depends partially on the error rate of the sequencing platforms used. Here, Liu et al. present Variabel, a method to rescue low frequency intra-host variants from Oxford Nanopore Technologies (ONT) platforms, and validate their approach on Ebola virus, norovirus, and SARS-CoV-2 datasets.
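To make the notion of allele frequency in the abstract above concrete, the sketch below counts bases observed at a single genome position and reports non-reference alleles under a frequency threshold. It is not Variabel's method: plain counting cannot separate true variants from sequencing errors, which is exactly the difficulty the abstract describes. The function names, the `max_af` parameter and the example pileup are hypothetical.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def allele_frequencies(pileup_bases):
    """Fraction of reads supporting each base at one position.

    pileup_bases -- iterable of single-character base calls (hypothetical input);
                    anything other than A, C, G or T is ignored.
    """
    counts = Counter(
        base.upper() for base in pileup_bases if base.upper() in VALID_BASES
    )
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {base: count / depth for base, count in counts.items()}


def low_frequency_alleles(pileup_bases, ref_base, max_af=0.5):
    """Non-reference alleles whose frequency is below max_af."""
    frequencies = allele_frequencies(pileup_bases)
    return {
        base: freq
        for base, freq in frequencies.items()
        if base != ref_base.upper() and freq < max_af
    }


# Illustrative pileup: 95 reads support the reference A, 5 reads support G.
example_pileup = "A" * 95 + "G" * 5
candidates = low_frequency_alleles(example_pileup, ref_base="A")
```

In this toy pileup the G allele has a frequency of 5 in 100 reads; with a per-base error rate of a few percent, a count this small could arise from errors alone, so a real caller needs additional evidence before reporting it.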
|
17
|
Ergin S, Kherad N, Alagoz M. RNA sequencing and its applications in cancer and rare diseases. Mol Biol Rep 2022; 49:2325-2333. [PMID: 34988891 PMCID: PMC8731134 DOI: 10.1007/s11033-021-06963-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 09/11/2021] [Accepted: 11/16/2021] [Indexed: 12/19/2022]
Abstract
With the invention of RNA sequencing over a decade ago, the diagnosis and identification of gene-related diseases entered a new phase that enabled more accurate analysis of diseases that are difficult to approach and analyze. RNA sequencing has enabled in-depth study of transcriptomes in different species and provided a better understanding of rare diseases and of the taxonomic classification of various eukaryotic organisms. The development of single-cell, short-read, long-read and direct RNA sequencing using both blood and biopsy specimens, together with recent advances in computational analysis programs, has become indispensable to medical professionals' ability to identify the origin and cause of genetic disorders. Altogether, such advantages have advanced treatment design, since RNA sequencing can detect genes that confer resistance to existing therapies and help medical professionals take a further step in improving treatment methods towards higher effectiveness and fewer side effects. Therefore, it is essential for all researchers and scientists to have deeper insight into all available methods of RNA sequencing when taking a step in therapy design.
Affiliation(s)
- Selvi Ergin
- Department of Molecular Biology and Genetics, Biruni University, Istanbul, Turkey
- Nasim Kherad
- Department of Molecular Biology and Genetics, Biruni University, Istanbul, Turkey
- Meryem Alagoz
- Department of Molecular Biology and Genetics, Biruni University, Istanbul, Turkey
|
18
|
Charnaud S, Munro JE, Semenec L, Mazhari R, Brewster J, Bourke C, Ruybal-Pesántez S, James R, Lautu-Gumal D, Karunajeewa H, Mueller I, Bahlo M. PacBio long-read amplicon sequencing enables scalable high-resolution population allele typing of the complex CYP2D6 locus. Commun Biol 2022; 5:168. [PMID: 35217695 PMCID: PMC8881578 DOI: 10.1038/s42003-022-03102-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/14/2021] [Accepted: 02/01/2022] [Indexed: 01/31/2023] Open
Abstract
The CYP2D6 enzyme is estimated to metabolize 25% of commonly used pharmaceuticals and is of intense pharmacogenetic interest due to the polymorphic nature of the CYP2D6 gene. Accurate allele typing of CYP2D6 has proved challenging due to frequent copy number variants (CNVs) and paralogous pseudogenes. SNP arrays, qPCR and short-read sequencing have been employed to interrogate CYP2D6; however, these technologies are unable to capture longer-range information. Long-read sequencing using the PacBio Single Molecule Real Time (SMRT) sequencing platform has yielded promising results for CYP2D6 allele typing. However, previous studies have been limited in scale and have employed nascent data processing pipelines. We present a robust data processing pipeline, "PLASTER", for accurate allele typing of SMRT-sequenced amplicons. We demonstrate the pipeline by typing CYP2D6 alleles in a large cohort of 377 Solomon Islanders. This pharmacogenetic method will improve drug safety and efficacy through screening prior to drug administration.
Affiliation(s)
- Sarah Charnaud
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Jacob E. Munro
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Lucie Semenec
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia; ARC Centre of Excellence in Synthetic Biology, Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
- Ramin Mazhari
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Jessica Brewster
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Caitlin Bourke
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Shazia Ruybal-Pesántez
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia; Burnet Institute, Melbourne, VIC, Australia
- Robert James
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Dulcie Lautu-Gumal
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Harin Karunajeewa
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Ivo Mueller
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
- Melanie Bahlo
- Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
|
19
|
Chen Z, He X. Application of third-generation sequencing in cancer research. MEDICAL REVIEW (BERLIN, GERMANY) 2021; 1:150-171. [PMID: 37724303 PMCID: PMC10388785 DOI: 10.1515/mr-2021-0013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/14/2021] [Accepted: 08/09/2021] [Indexed: 09/20/2023]
Abstract
In the past several years, nanopore sequencing technology from Oxford Nanopore Technologies (ONT) and single-molecule real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) have become available to researchers and are currently being tested for cancer research. These methods offer many advantages over the most widely used high-throughput short-read sequencing approaches and allow the comprehensive analysis of transcriptomes by identifying full-length splice isoforms and several other posttranscriptional events. In addition, these platforms enable structural variation characterization at a previously unparalleled resolution and direct detection of epigenetic marks in native DNA and RNA. Here, we present a comprehensive summary of important applications of these technologies in cancer research, including the identification of complex structural variants, alternatively spliced isoforms, fusion transcript events, and exogenous RNA. Furthermore, we discuss the impact of the newly developed nanopore direct RNA sequencing (RNA-Seq) approach in advancing epitranscriptome research in cancer. Although unique challenges remain for these new single-molecule long-read methods, they will unravel many aspects of cancer genome complexity in unprecedented ways and present an encouraging outlook for continued application in an increasing number of different cancer research settings.
Affiliation(s)
- Zhiao Chen
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Xianghuo He
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China
|
20
|
Cheng Y, Grueber C, Hogg CJ, Belov K. Improved high-throughput MHC typing for non-model species using long-read sequencing. Mol Ecol Resour 2021; 22:862-876. [PMID: 34551192 PMCID: PMC9293008 DOI: 10.1111/1755-0998.13511] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/31/2021] [Revised: 08/26/2021] [Accepted: 09/06/2021] [Indexed: 11/29/2022]
Abstract
The major histocompatibility complex (MHC) plays a critical role in the vertebrate immune system. Accurate MHC typing is critical to understanding not only host fitness and disease susceptibility, but also the mechanisms underlying host‐pathogen co‐evolution. However, due to the high degree of gene duplication and diversification of MHC genes, it is often technically challenging to accurately characterise MHC genetic diversity in non‐model species. Here we conducted a systematic review to identify common issues associated with current widely used MHC typing approaches. Then to overcome these challenges, we developed a long‐read based MHC typing method along with a new analysis pipeline. Our approach enables the sequencing of fully phased MHC alleles spanning all key functional domains and the separation of highly similar alleles as well as the removal of technical artefacts such as PCR heteroduplexes and chimeras. Using this approach, we performed population‐scale MHC typing in the Tasmanian devil (Sarcophilus harrisii), revealing previously undiscovered MHC functional diversity in this endangered species. Our new method provides a better solution for addressing research questions that require high MHC typing accuracy. Since the method is not limited by species or the number of genes analysed, it will be applicable for studying not only the MHC but also other complex gene families.
Affiliation(s)
- Yuanyuan Cheng
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
- Catherine Grueber
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
- Carolyn J Hogg
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia; San Diego Zoo Wildlife Alliance, San Diego, California, USA
- Katherine Belov
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
|
21
|
He Y, Yu H, Zhao H, Zhu H, Zhang Q, Wang A, Shen Y, Xu X, Li J. Transcriptomic analysis to elucidate the effects of high stocking density on grass carp (Ctenopharyngodon idella). BMC Genomics 2021; 22:620. [PMID: 34399686 PMCID: PMC8369720 DOI: 10.1186/s12864-021-07924-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/31/2021] [Accepted: 08/06/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND: Grass carp (Ctenopharyngodon idella) is one of the most widely cultivated fishes in China. High stocking density can reportedly affect fish growth and immunity. Herein we performed PacBio long-read single-molecule real-time (SMRT) sequencing and Illumina RNA sequencing to evaluate the effects of high stocking density on the grass carp transcriptome. RESULTS: SMRT sequencing led to the identification of 33,773 genes (14,946 known and 18,827 new genes). From the structure analysis, 8,009 genes were detected with alternative splicing events, 10,219 genes showed alternative polyadenylation sites, and 15,521 long noncoding RNAs were identified. Further, 1,235, 962, and 213 differentially expressed genes (DEGs) were identified in the intestine, muscle, and brain tissues, respectively. We performed functional enrichment analyses of the DEGs, and they were found to be significantly enriched in nutrient metabolism and immune function. The expression levels of several genes encoding apolipoproteins and the activities of enzymes involved in carbohydrate enzymolysis were found to be upregulated in the high stocking density group, indicating that lipid metabolism and carbohydrate decomposition were accelerated. In addition, four isoforms of grass carp major histocompatibility complex class II antigen alpha and beta chains showed at least a 4-fold decrease in the aforementioned three tissues. CONCLUSIONS: The results suggest that fish farmed at high stocking densities face issues associated with metabolism and the immune system. To conclude, our results emphasize the importance of maintaining reasonable density in grass carp aquaculture. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-021-07924-4.
Affiliation(s)
- Yan He
- Key Laboratory of Freshwater Aquatic Genetic Resources Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education, Shanghai Ocean University, Shanghai, China; Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai, China
- Hongyan Yu
- Key Laboratory of Freshwater Aquatic Genetic Resources Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education, Shanghai Ocean University, Shanghai, China
- Honggang Zhao
- Department of Natural Resources, Cornell University, 14853, Ithaca, New York, USA
- Hua Zhu
- Beijing Key Laboratory of Fishery Biotechnology, Beijing Fisheries Research Institute, 100068, Beijing, China
- Qingjing Zhang
- Beijing Key Laboratory of Fishery Biotechnology, Beijing Fisheries Research Institute, 100068, Beijing, China
- Anqi Wang
- Key Laboratory of Freshwater Aquatic Genetic Resources Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Shanghai, China
- Yubang Shen
- National Demonstration Center for Experimental Fisheries Science Education, Shanghai Ocean University, Shanghai, China
- Xiaoyan Xu
- Key Laboratory of Freshwater Aquatic Genetic Resources Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education, Shanghai Ocean University, Shanghai, China; Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai, China
- Jiale Li
- Key Laboratory of Freshwater Aquatic Genetic Resources Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education, Shanghai Ocean University, Shanghai, China; Shanghai Engineering Research Center of Aquaculture, Shanghai Ocean University, Shanghai, China
|
22
|
Allan FK, Jayaraman S, Paxton E, Sindoya E, Kibona T, Fyumagwa R, Mramba F, Torr SJ, Hemmink JD, Toye P, Lembo T, Handel I, Auty HK, Morrison WI, Morrison LJ. Antigenic Diversity in Theileria parva Populations From Sympatric Cattle and African Buffalo Analyzed Using Long Read Sequencing. Front Genet 2021; 12:684127. [PMID: 34335691 PMCID: PMC8320539 DOI: 10.3389/fgene.2021.684127] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/22/2021] [Accepted: 05/24/2021] [Indexed: 11/17/2022] Open
Abstract
East Coast fever (ECF) in cattle is caused by the Apicomplexan protozoan parasite Theileria parva, transmitted by the three-host tick Rhipicephalus appendiculatus. The African buffalo (Syncerus caffer) is the natural host for T. parva but does not suffer disease, whereas ECF is often fatal in cattle. The genetic relationship between T. parva populations circulating in cattle and buffalo is poorly understood, and has not been studied in sympatric buffalo and cattle. This study aimed to determine the genetic diversity of T. parva populations in cattle and buffalo, in an area where livestock co-exist with buffalo adjacent to the Serengeti National Park, Tanzania. Three T. parva antigens (Tp1, Tp4, and Tp16), known to be recognized by CD8+ and CD4+ T cells in immunized cattle, were used to characterize genetic diversity of T. parva in cattle (n = 126) and buffalo samples (n = 22). Long read (PacBio) sequencing was used to generate full or near-full length allelic sequences. Patterns of diversity were similar across all three antigens, with allelic diversity being significantly greater in buffalo-derived parasites compared to cattle-derived (e.g., for Tp1 median cattle allele count was 9, and 81.5 for buffalo), with very few alleles shared between species (8 of 651 alleles were shared for Tp1). Most alleles were unique to buffalo with a smaller proportion unique to cattle (412 buffalo unique vs. 231 cattle-unique for Tp1). There were indications of population substructuring, with one allelic cluster of Tp1 representing alleles found in both cattle and buffalo (including the TpM reference genome allele), and another containing predominantly only alleles deriving from buffalo. These data illustrate the complex interplay between T. parva populations in buffalo and cattle, revealing the significant genetic diversity in the buffalo T. parva population, the limited sharing of parasite genotypes between the host species, and highlight that a subpopulation of T. parva is maintained by transmission within cattle. The data indicate that fuller understanding of buffalo T. parva population dynamics is needed, as only a comprehensive appreciation of the population genetics of T. parva populations will enable assessment of buffalo-derived infection risk in cattle, and how this may impact upon control measures such as vaccination.
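The Tp1 allele counts quoted in the abstract (8 shared, 412 buffalo-unique, 231 cattle-unique, 651 in total) are the sizes of the regions of a two-set comparison. The following is a minimal sketch of that bookkeeping, not the authors' pipeline; the function name `allele_sharing` and the synthetic allele labels are assumptions made only for illustration.

```python
def allele_sharing(cattle: set[str], buffalo: set[str]) -> dict[str, int]:
    """Summarise how many distinct alleles are shared or host-specific."""
    return {
        "total": len(cattle | buffalo),           # all distinct alleles seen
        "shared": len(cattle & buffalo),          # found in both hosts
        "cattle_unique": len(cattle - buffalo),   # found only in cattle
        "buffalo_unique": len(buffalo - cattle),  # found only in buffalo
    }


# Synthetic labels (hypothetical), sized to match the Tp1 counts in the abstract.
shared = {f"s{i}" for i in range(8)}
cattle = shared | {f"c{i}" for i in range(231)}
buffalo = shared | {f"b{i}" for i in range(412)}

summary = allele_sharing(cattle, buffalo)
print(summary)
```

With these sizes the three disjoint regions sum to the reported total of 651 alleles, which is a quick consistency check on the figures in the abstract.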
Affiliation(s)
- Fiona K. Allan
- Royal (Dick) School of Veterinary Studies, Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom
- Siddharth Jayaraman
- Royal (Dick) School of Veterinary Studies, Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom
- Edith Paxton
- Royal (Dick) School of Veterinary Studies, Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom
- Emmanuel Sindoya
- Ministry of Livestock and Fisheries, Serengeti District Livestock Office, Mugumu, Tanzania
- Tito Kibona
- Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania
- Furaha Mramba
- Vector and Vector-Borne Diseases Research Institute, Tanga, Tanzania
- Stephen J. Torr
- Liverpool School of Tropical Medicine, Liverpool, United Kingdom
- Johanneke D. Hemmink
- Royal (Dick) School of Veterinary Studies, Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom
- International Livestock Research Institute, Nairobi, Kenya
- Philip Toye
- International Livestock Research Institute, Nairobi, Kenya
- Tiziana Lembo
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
- Ian Handel
- Royal (Dick) School of Veterinary Studies, Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom
- Harriet K. Auty
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
- W. Ivan Morrison
- Royal (Dick) School of Veterinary Studies, Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom
- Liam J. Morrison
- Royal (Dick) School of Veterinary Studies, Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom
|
23
|
de la Fuente L, Arzalluz-Luque Á, Tardáguila M, Del Risco H, Martí C, Tarazona S, Salguero P, Scott R, Lerma A, Alastrue-Agudo A, Bonilla P, Newman JRB, Kosugi S, McIntyre LM, Moreno-Manzano V, Conesa A. tappAS: a comprehensive computational framework for the analysis of the functional impact of differential splicing. Genome Biol 2020; 21:119. [PMID: 32423416 PMCID: PMC7236505 DOI: 10.1186/s13059-020-02028-w] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 11/20/2019] [Accepted: 04/23/2020] [Indexed: 12/26/2022] Open
Abstract
Recent advances in long-read sequencing solve inaccuracies in alternative transcript identification of full-length transcripts in short-read RNA-Seq data, which encourages the development of methods for isoform-centered functional analysis. Here, we present tappAS, the first framework to enable a comprehensive Functional Iso-Transcriptomics (FIT) analysis, which is effective at revealing the functional impact of context-specific post-transcriptional regulation. tappAS uses isoform-resolved annotation of coding and non-coding functional domains, motifs, and sites, in combination with novel analysis methods to interrogate different aspects of the functional readout of transcript variants and isoform regulation. tappAS software and documentation are available at https://app.tappas.org.
Affiliation(s)
- Lorena de la Fuente
- Genomics of Gene Expression Laboratory, Prince Felipe Research Center, Valencia, Spain
- Present Address: Bioinformatics Unit, IIS Fundación Jiménez Díaz, Madrid, Spain
- Ángeles Arzalluz-Luque
- Department of Statistics and Operational Research, Polytechnical University of Valencia, Valencia, Spain
- Manuel Tardáguila
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
- Present Address: Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Héctor Del Risco
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
- Cristina Martí
- Genomics of Gene Expression Laboratory, Prince Felipe Research Center, Valencia, Spain
- Sonia Tarazona
- Department of Statistics and Operational Research, Polytechnical University of Valencia, Valencia, Spain
- Pedro Salguero
- Genomics of Gene Expression Laboratory, Prince Felipe Research Center, Valencia, Spain
- Raymond Scott
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
- Alberto Lerma
- Genomics of Gene Expression Laboratory, Prince Felipe Research Center, Valencia, Spain
- Ana Alastrue-Agudo
- Present Address: Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Pablo Bonilla
- Present Address: Human Genetics Department, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Jeremy R B Newman
- Genetics Institute, University of Florida, Gainesville, FL, USA
- Department of Pathology, University of Florida, Gainesville, FL, USA
- Shunichi Kosugi
- Genetics Institute, University of Florida, Gainesville, FL, USA
- Laboratory for Statistical and Translational Genetics, Center for Integrative Medical Sciences, RIKEN, Wako, Japan
- Lauren M McIntyre
- Genetics Institute, University of Florida, Gainesville, FL, USA
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA
- Ana Conesa
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA
- Genetics Institute, University of Florida, Gainesville, FL, USA
|
24
|
Comparative analyses of error handling strategies for next-generation sequencing in precision medicine. Sci Rep 2020; 10:5750. [PMID: 32238883 PMCID: PMC7113248 DOI: 10.1038/s41598-020-62675-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/27/2019] [Accepted: 03/18/2020] [Indexed: 11/21/2022] Open
Abstract
Next-generation sequencing (NGS) offers the opportunity to sequence millions to billions of DNA sequences in a short period, leading to novel applications in personalized medicine, such as cancer diagnostics or antiviral therapy. Nevertheless, sequencing technologies have different error rates, which arise during the sequencing process. If the NGS data are used for diagnostics, sequences with errors are typically neglected or a worst-case scenario is assumed. In the current study, we focused on the impact of ambiguous bases on therapy recommendations for Human Immunodeficiency Virus 1 (HIV-1) patients. Concretely, we analyzed the treatment recommendation with entry blockers based on prediction models for co-receptor tropism. We compared three different error handling strategies that have been used in the literature, namely (i) neglection, (ii) worst-case assumption, and (iii) deconvolution with a majority vote. We could show that for two or more ambiguous positions per sequence a reliable prediction is generally no longer possible. Moreover, the position of the ambiguity also plays a crucial role. Thus, we analyzed the error probability distributions of existing sequencing technologies, e.g., Illumina MiSeq or PacBio, with respect to the aforementioned error handling strategies and found that neglection outperforms the other strategies in the case where no systematic errors are present. In other cases, the deconvolution strategy with the majority vote should be preferred.
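The three error handling strategies named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the classifier `toy_predict` is a hypothetical stand-in for a co-receptor tropism model, `True` is assumed to be the unfavourable call, and ties in the majority vote are resolved as `False` by assumption. Only the IUPAC nucleotide ambiguity codes are standard.

```python
from itertools import product
from typing import Callable, Optional

# Standard IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

Predictor = Callable[[str], bool]


def expand(seq: str) -> list[str]:
    """Deconvolution: list every unambiguous sequence compatible with seq."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


def neglect(seq: str, predict: Predictor) -> Optional[bool]:
    """Strategy (i): discard sequences containing any ambiguous base."""
    if any(len(IUPAC[b]) > 1 for b in seq):
        return None
    return predict(seq)


def worst_case(seq: str, predict: Predictor) -> bool:
    """Strategy (ii): report the unfavourable call if any resolution gives it."""
    return any(predict(s) for s in expand(seq))


def majority_vote(seq: str, predict: Predictor) -> bool:
    """Strategy (iii): predict every resolution and take the strict majority."""
    votes = [predict(s) for s in expand(seq)]
    return 2 * sum(votes) > len(votes)


def toy_predict(seq: str) -> bool:
    """Hypothetical classifier, for illustration only."""
    return seq.count("A") >= 3


read = "AANC"  # one fully ambiguous position, so four possible resolutions
print(neglect(read, toy_predict), worst_case(read, toy_predict), majority_vote(read, toy_predict))
```

The number of resolutions grows multiplicatively with each ambiguous position, so the deconvolution step in this sketch is only practical for sequences with few ambiguities.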
|
25
|
Robinson EK, Covarrubias S, Carpenter S. The how and why of lncRNA function: An innate immune perspective. Biochim Biophys Acta Gene Regul Mech 2020; 1863:194419. [PMID: 31487549 PMCID: PMC7185634 DOI: 10.1016/j.bbagrm.2019.194419] [Citation(s) in RCA: 209] [Impact Index Per Article: 41.8] [Received: 05/31/2019] [Accepted: 08/21/2019] [Indexed: 02/06/2023]
Abstract
Next-generation sequencing has provided a more complete picture of the composition of the human transcriptome indicating that much of the "blueprint" is a vastness of poorly understood non-protein-coding transcripts. This includes a newly identified class of genes called long noncoding RNAs (lncRNAs). The lack of sequence conservation for lncRNAs across species meant that their biological importance was initially met with some skepticism. LncRNAs mediate their functions through interactions with proteins, RNA, DNA, or a combination of these. Their functions can often be dictated by their localization, sequence, and/or secondary structure. Here we provide a review of the approaches typically adopted to study the complexity of these genes with an emphasis on recent discoveries within the innate immune field. Finally, we discuss the challenges, as well as the emergence of new technologies that will continue to move this field forward and provide greater insight into the biological importance of this class of genes. This article is part of a Special Issue entitled: ncRNA in control of gene expression edited by Kotb Abdelmohsen.
Affiliation(s)
- Elektra K Robinson
- Department of Molecular, Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA, United States of America
- Sergio Covarrubias
- Department of Molecular, Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA, United States of America
- Susan Carpenter
- Department of Molecular, Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA, United States of America
|
26
|
Nanopore long-read RNAseq reveals widespread transcriptional variation among the surface receptors of individual B cells. Nat Commun 2017; 8:16027. [PMID: 28722025 PMCID: PMC5524981 DOI: 10.1038/ncomms16027] [Citation(s) in RCA: 256] [Impact Index Per Article: 32.0] [Received: 04/24/2017] [Accepted: 05/23/2017] [Indexed: 12/20/2022] Open
Abstract
Understanding gene regulation and function requires a genome-wide method capable of capturing both gene expression levels and isoform diversity at the single-cell level. Short-read RNAseq is limited in its ability to resolve complex isoforms because it fails to sequence full-length cDNA copies of RNA molecules. Here, we investigate whether RNAseq using the long-read single-molecule Oxford Nanopore MinION sequencer is able to identify and quantify complex isoforms without sacrificing accurate gene expression quantification. After benchmarking our approach, we analyse individual murine B1a cells using a custom multiplexing strategy. We identify thousands of unannotated transcription start and end sites, as well as hundreds of alternative splicing events in these B1a cells. We also identify hundreds of genes expressed across B1a cells that display multiple complex isoforms, including several B cell-specific surface receptors. Our results show that we can identify and quantify complex isoforms at the single cell level. Short-read RNA-seq is limited in its ability to resolve complex transcript isoforms since it cannot sequence full-length cDNA. Here the authors use Oxford Nanopore MinION and their Mandalorion analysis pipeline to measure complex isoforms in B1a cells.
|
27
|
Abstract
Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Especially long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports have demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements. Methods: We made use of such components on ONT's miniature 'MinION' device and sequenced native genomic DNA obtained from the near haploid cancer cell line HAP1. Analysis of our data was performed utilising recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings. Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and were able to achieve a mean consensus identity of 99.8% for sequenced mtDNA reads. Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for potential practical long-read applications.
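The read-length statistics quoted in the Results (mean read length and N50) can be computed from a list of read lengths as sketched below. This is an illustrative sketch using the common definition of N50 (the length L such that reads of length L or longer contain at least half of all sequenced bases), not the tooling used in the study; the function name `n50` and the toy read lengths are assumptions.

```python
def n50(lengths: list[int]) -> int:
    """Return the read length at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 is undefined for an empty list of reads")


# Toy read lengths (hypothetical), in bases.
reads = [2, 3, 4, 5, 6]
mean_length = sum(reads) / len(reads)
print(n50(reads), mean_length)

# Sanity check on the abstract's rounded figures: total bases / read count.
approx_mean = 2.3e9 / 240_000
print(round(approx_mean))
```

Dividing the reported ~2.3 billion bases by ~240,000 reads gives roughly 9.6 kb, which agrees with the stated mean read length. An N50 well above the mean, as reported here, indicates a length distribution skewed toward a subset of long reads.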
Affiliation(s)
- Jean-Michel Carter
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Shobbir Hussain
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|
29
|
Abstract
Background: The ability to obtain long read lengths during DNA sequencing has several potentially important practical applications. Especially long read lengths have been reported using the Nanopore sequencing method, currently commercially available from Oxford Nanopore Technologies (ONT). However, early reports have demonstrated only limited levels of combined throughput and sequence accuracy. Recently, ONT released a new CsgG pore sequencing system as well as a 250 b/s translocation chemistry with potential for improvements.
Methods: We made use of such components on ONT's miniature ‘MinION’ device and sequenced native genomic DNA obtained from the near haploid cancer cell line HAP1. Analysis of our data was performed utilising recently described computational tools tailored for nanopore/long-read sequencing outputs, and here we present our key findings.
Results: From a single sequencing run, we obtained ~240,000 high-quality mapped reads, comprising a total of ~2.3 billion bases. A mean read length of 9.6 kb and an N50 of ~17 kb were achieved, while sequences mapped to reference with a mean identity of 85%. Notably, we obtained ~68X coverage of the mitochondrial genome and were able to achieve a mean consensus identity of 99.8% for sequenced mtDNA reads.
Conclusions: With improved sequencing chemistries already released and higher-throughput instruments in the pipeline, this early study suggests that ONT CsgG-based sequencing may be a useful option for potential practical long-read applications with relevance to complex genomes.
Affiliation(s)
- Jean-Michel Carter
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Shobbir Hussain
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|