1
|
Liu Q, Yang S, Tan Y, Cui L. High-throughput sequencing technology facilitates the discovery of novel biomarkers for antiphospholipid syndrome. Front Immunol 2023; 14:1128245. [PMID: 37275905 PMCID: PMC10235516 DOI: 10.3389/fimmu.2023.1128245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 05/09/2023] [Indexed: 06/07/2023]
Abstract
Antiphospholipid syndrome (APS) is characterized by arterial and venous thrombosis and/or pregnancy morbidity, accompanied by persistent antiphospholipid antibody (aPL) positivity. However, because the pathogenesis of APS is complex and aPL profiles differ widely between patients, APS diagnosis, prognosis, and risk assessment may not be resolved at the antibody level alone. New technologies and multiple dimensions of evidence are needed to explore novel APS biomarkers. Next-generation sequencing (NGS) is already well established in diseases with a high incidence of somatic mutations, such as genetic diseases and tumors. We therefore survey the progress of NGS-based research and applications in APS across the genome, transcriptome, epigenome, and other levels. This review describes NGS-related research in APS and provides a reference for a deeper understanding of APS-related screening markers and disease pathogenesis.
Affiliation(s)
- Qi Liu
- Department of Clinical Laboratory, Peking University Third Hospital, Beijing, China
- Core Unit of National Clinical Research Center for Laboratory Medicine, Peking University Third Hospital, Beijing, China
- Institute of Medical Technology, Peking University Health Science Center, Beijing, China
| | - Shuo Yang
- Department of Clinical Laboratory, Peking University Third Hospital, Beijing, China
- Core Unit of National Clinical Research Center for Laboratory Medicine, Peking University Third Hospital, Beijing, China
| | - Yuan Tan
- Department of Clinical Laboratory, Peking University Third Hospital, Beijing, China
- Core Unit of National Clinical Research Center for Laboratory Medicine, Peking University Third Hospital, Beijing, China
- Institute of Medical Technology, Peking University Health Science Center, Beijing, China
| | - Liyan Cui
- Department of Clinical Laboratory, Peking University Third Hospital, Beijing, China
- Core Unit of National Clinical Research Center for Laboratory Medicine, Peking University Third Hospital, Beijing, China
|
2
|
Salaria N, Neeraj, Furhan J, Kumar R. Gut Microbiome: Perspectives and Challenges in Human Health. ROLE OF MICROBES IN SUSTAINABLE DEVELOPMENT 2023:65-87. [DOI: 10.1007/978-981-99-3126-2_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2025]
|
3
|
Khan QH. Identification of Conserved and Novel MicroRNAs with their Targets in Garden Pea (Pisum sativum L.) Leaves by High-Throughput Sequencing. Bioinform Biol Insights 2023; 17:11779322231162777. [PMID: 37020501 PMCID: PMC10068972 DOI: 10.1177/11779322231162777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 02/18/2023] [Indexed: 04/03/2023]
Abstract
MicroRNAs (miRNAs) are single-stranded, endogenous, non-coding RNAs of 20–24 nucleotides that play a significant role in post-transcriptional gene regulation. Various conserved and novel miRNAs have been characterized, especially in plant species with well-characterized genomes; however, information on miRNAs in economically important plants such as pea (Pisum sativum L.) is limited. In this study, I identified conserved and novel miRNAs, along with their targets, in garden pea leaf samples by analyzing next-generation sequencing (NGS) data. The raw NGS data were processed and 1.38 million high-quality non-redundant reads were retained for analysis; this large number of reads indicates a large and diverse small RNA population in pea leaves. Analysis of the deep sequencing data identified 255 conserved and 11 novel miRNAs in the garden pea leaf sample. The targets of the conserved and novel miRNAs were predicted with the psRNATarget tool. Functional annotation of the miRNA targets was then performed using Blast2GO software, and the target gene products were predicted. The miRNA target gene products, along with their GO_ID (Gene Ontology Identifier), were categorized into biological processes, cellular components, and molecular functions. The information obtained from this study provides genomic resources that will help in understanding miRNA-mediated post-transcriptional gene regulation in garden peas.
Affiliation(s)
- Qurshid Hasan Khan
- Department of Plant Sciences, University of Hyderabad, Gachibowli, Hyderabad 500046, Telangana, India
|
4
|
Gupta AK, Kumar M. Benchmarking and Assessment of Eight De Novo Genome Assemblers on Viral Next-Generation Sequencing Data, Including the SARS-CoV-2. OMICS: A JOURNAL OF INTEGRATIVE BIOLOGY 2022; 26:372-381. [PMID: 35759429 DOI: 10.1089/omi.2022.0042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
Viral genomics has become crucial in clinical diagnostics and ecology, not to mention to stem the COVID-19 pandemic. Whole-genome sequencing (WGS) is pivotal in gaining an improved understanding of viral evolution, genomic epidemiology, infectious outbreaks, pathobiology, clinical management, and vaccine development. Genome assembly is one of the crucial steps in WGS data analyses. A series of different assemblers has been developed with the advent of high-throughput next-generation sequencing (NGS). Various studies have reported the evaluation of these assembly tools on distinct datasets; however, these lack data from viral origin. In this study, we performed a comparative evaluation and benchmarking of eight de novo assemblers: SOAPdenovo, Velvet, assembly by short sequences (ABySS), iterative De Bruijn graph assembler (IDBA), SPAdes, Edena, iterative virus assembler, and VICUNA on the viral NGS data from distinct Illumina (GAIIx, Hiseq, Miseq, and Nextseq) platforms. WGS data of diverse viruses, that is, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), dengue virus 3, human immunodeficiency virus 1, hepatitis B virus, human herpesvirus 8, human papillomavirus 16, rhinovirus A, and West Nile virus, were utilized to assess these assemblers. Performance metrics such as genome fraction recovery, assembly lengths, NG50, N50, contig length, contig numbers, mismatches, and misassemblies were analyzed. Overall, three assemblers, that is, SPAdes, IDBA, and ABySS, performed consistently well, including for genome assembly of SARS-CoV-2. These assembly methods should be considered and recommended for future studies of viruses. The study also suggests that implementing two or more assembly approaches should be considered in viral NGS studies, especially in clinical settings. 
Taken together, the benchmarking of eight de novo genome assemblers reported in this study can inform future public health and ecology research concerning the viruses, the COVID-19 pandemic, and viral outbreaks.
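Two of the contiguity metrics named in this abstract, N50 and NG50, can be illustrated with a short sketch. This is a minimal Python illustration using the common definitions (N50 relative to the total assembly length, NG50 relative to a known or estimated genome length); it is not the evaluation code used in the study, and the function names are hypothetical.

```python
def _nx(contig_lengths, reference_total, fraction=0.5):
    """Length of the contig at which the cumulative sum of contigs,
    sorted longest first, reaches `fraction` of `reference_total`."""
    threshold = reference_total * fraction
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    # The assembly never reaches the threshold (possible for NG50
    # when the assembly is much shorter than the genome).
    return 0


def n50(contig_lengths):
    """N50: the threshold is half of the total assembly length."""
    return _nx(contig_lengths, sum(contig_lengths))


def ng50(contig_lengths, genome_length):
    """NG50: the threshold is half of the (assumed known) genome length."""
    return _nx(contig_lengths, genome_length)
```

Because NG50 is anchored to the genome length rather than to the assembly itself, it is comparable across assemblers that recover different fractions of the same viral genome, whereas N50 can look favorable for a short, incomplete assembly.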
Affiliation(s)
- Amit Kumar Gupta
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
| | - Manoj Kumar
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
|
5
|
Gardner PP, Paterson JM, McGimpsey S, Ashari-Ghomi F, Umu SU, Pawlik A, Gavryushkin A, Black MA. Sustained software development, not number of citations or journal choice, is indicative of accurate bioinformatic software. Genome Biol 2022; 23:56. [PMID: 35172880 PMCID: PMC8851831 DOI: 10.1186/s13059-022-02625-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/05/2021] [Accepted: 02/06/2022] [Indexed: 11/29/2022]
Abstract
Background: Computational biology provides software tools for testing and making inferences about biological data. In the face of increasing volumes of data, heuristic methods that trade software speed for accuracy may be employed. We have studied these trade-offs using the results of a large number of independent software benchmarks, and evaluated whether external factors, including speed, author reputation, journal impact, recency and developer efforts, are indicative of accurate software. Results: We find that software speed, author reputation, journal impact, number of citations and age are unreliable predictors of software accuracy. This is unfortunate because these are frequently cited reasons for selecting software tools. However, GitHub-derived statistics and high version numbers show that accurate bioinformatic software tools are generally the product of many improvements over time. We also find an excess of slow and inaccurate bioinformatic software tools, and this is consistent across many sub-disciplines. There are few tools that are middle-of-road in terms of accuracy and speed trade-offs. Conclusions: Our findings indicate that accurate bioinformatic software is primarily the product of long-term commitments to software development. In addition, we hypothesise that bioinformatics software suffers from publication bias. Software that is intermediate in terms of both speed and accuracy may be difficult to publish, possibly due to author, editor and reviewer practises. This leaves an unfortunate hole in the literature, as ideal tools may fall into this gap. High accuracy tools are not always useful if they are slow, while high speed is not useful if the results are also inaccurate. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02625-x.
Affiliation(s)
- Paul P Gardner
- Department of Biochemistry, University of Otago, Dunedin, New Zealand; Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
| | - James M Paterson
- Department of Civil and Natural Resources Engineering, University of Canterbury, Christchurch, New Zealand
| | | | - Fatemeh Ashari-Ghomi
- Research Group for Genomic Epidemiology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Sinan U Umu
- Department of Research, Cancer Registry of Norway, Oslo, Norway
| | | | - Alex Gavryushkin
- Department of Computer Science, University of Otago, Dunedin, New Zealand; School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Michael A Black
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
|
6
|
Oliva A, Tobler R, Cooper A, Llamas B, Souilmi Y. Systematic benchmark of ancient DNA read mapping. Brief Bioinform 2021; 22:6217726. [PMID: 33834210 DOI: 10.1093/bib/bbab076] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/28/2020] [Revised: 01/05/2021] [Accepted: 02/16/2021] [Indexed: 11/12/2022]
Abstract
The current standard practice for assembling individual genomes involves mapping millions of short DNA sequences (also known as DNA 'reads') against a pre-constructed reference genome. Mapping vast amounts of short reads in a timely manner is a computationally challenging task that inevitably produces artefacts, including biases against alleles not found in the reference genome. This reference bias and other mapping artefacts are expected to be exacerbated in ancient DNA (aDNA) studies, which rely on the analysis of low quantities of damaged and very short DNA fragments (~30-80 bp). Nevertheless, the current gold-standard mapping strategies for aDNA studies have effectively remained unchanged for nearly a decade, during which time new software has emerged. In this study, we used simulated aDNA reads from three different human populations to benchmark the performance of 30 distinct mapping strategies implemented across four read-mapping programs (BWA-aln, BWA-mem, NovoAlign and Bowtie2) and quantified the impact of reference bias in downstream population genetic analyses. We show that specific NovoAlign, BWA-aln and BWA-mem parameterizations achieve high mapping precision with low levels of reference bias, particularly after filtering out reads with low mapping qualities. However, unbiased NovoAlign results required the use of an IUPAC reference genome. While relevant only to aDNA projects where reference population data are available, the benefit of using an IUPAC reference demonstrates the value of incorporating population genetic information into the aDNA mapping process, echoing recent results based on graph genome representations.
Affiliation(s)
- Adrien Oliva
- Australian Centre for Ancient DNA at the University of Adelaide, Australia
| | - Raymond Tobler
- Australian Centre for Ancient DNA at the University of Adelaide, Australia
| | - Alan Cooper
- Australian Research Council Laureate Fellow specializing in ancient DNA, Australia
| | - Bastien Llamas
- Australian Centre for Ancient DNA at the University of Adelaide, Australia
| | - Yassine Souilmi
- Australian Centre for Ancient DNA at the University of Adelaide, Australia
|
7
|
Nodehi HM, Tabatabaiefar MA, Sehhati M. Selection of Optimal Bioinformatic Tools and Proper Reference for Reducing the Alignment Error in Targeted Sequencing Data. JOURNAL OF MEDICAL SIGNALS & SENSORS 2021; 11:37-44. [PMID: 34026589 PMCID: PMC8043119 DOI: 10.4103/jmss.jmss_7_20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/20/2020] [Revised: 01/28/2020] [Accepted: 02/12/2020] [Indexed: 11/04/2022]
Abstract
Background: Careful design in the primary steps of a next-generation sequencing study is critical for obtaining successful results in downstream analysis. Methods: In this study, a framework is proposed to evaluate and improve sequence mapping in targeted regions of the reference genome. Simulated short reads were produced from the coding regions of the human genome and mapped to a Customized Target-Based Reference (CTBR) with recently introduced alignment tools. The short reads produced by different sequencing technologies were aligned to the standard genome and to the CTBR, with and without well-defined mutation types, and the numbers of unmapped and misaligned reads and the runtime were measured for comparison. Results: Mapping accuracy was significantly better for reads generated from the Illumina HiSeq 2500 and aligned with Stampy against the CTBR than for the other evaluated pipelines. Using the CTBR for alignment significantly decreased the mapping error compared with more expanded or more limited references. When intentional mutations were introduced into the reads, Stampy showed the minimum error of 1.67% using the CTBR. By contrast, the lowest errors obtained, also with Stampy, using the whole genome and a single chromosome as references were 3.78% and 20%, respectively. The maximum and minimum misalignment errors were observed on chromosomes Y and 20, respectively. Conclusion: Using the proposed framework in a clinical targeted sequencing study may therefore help predict the error and improve the performance of variant calling in the genomic regions targeted by the study.
Affiliation(s)
- Hannane Mohammadi Nodehi
- Department of Bioelectric and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Mohammad Amin Tabatabaiefar
- Department of Medical Genetics, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran; Department of Bioinformatics, Medical Image and Signal Processing Research Center, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Mohammadreza Sehhati
- Department of Bioelectric and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
|
8
|
Variant Calling Using Whole Genome Resequencing and Sequence Capture for Population and Evolutionary Genomic Inferences in Norway Spruce (Picea Abies). COMPENDIUM OF PLANT GENOMES 2020. [DOI: 10.1007/978-3-030-21001-4_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
|
9
|
Miura N, Ishihara Y, Miura Y, Kimoto M, Miura K. miR-520d-5p can reduce the mutations in hepatoma cancer cells and iPSCs-derivatives. BMC Cancer 2019; 19:587. [PMID: 31202279 PMCID: PMC6570841 DOI: 10.1186/s12885-019-5786-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/25/2019] [Accepted: 05/31/2019] [Indexed: 01/13/2023]
Abstract
Background: Human microRNAs (miRNAs) have diverse functions in biology, and play a role in nearly every biological process. Here we report that miR-520d-5p (520d-5p) causes undifferentiated cancer cells to adopt benign or normal status in vivo in immunodeficient mice via demethylation and P53 upregulation. Further, we found that 520d-5p causes normal cells to extend their cellular lifetime and to adopt a mesenchymal stem cell-like status with CD105 positivity. We hypothesized that ectopic 520d-5p expression reduced mutations in undifferentiated hepatoma (HLF) cells through synergistic modulation of methylation-related enzymatic expression. Methods: To examine whether there were any changes in mutation status in cells treated with 520d-5p, we performed next-generation sequencing (NGS) in HLF cells and human iPSC-derivative cells in pre-mesenchymal stem cell status. We analyzed the data using both genome-wide and individual gene function approaches. Results: 520d-5p induced a shift towards a wild type or non-malignant phenotype, which was regulated by nucleotide mutations in both HLF cells and iPSCs. Further, 520d-5p reduced mutation levels in both the whole genome and genomic fragment assemblies. Conclusions: Cancer cell genomic mutations cannot be repaired in most contexts. However, these findings suggest that applied development of 520d-5p would allow new approaches to cancer research and improve the quality of iPSCs used in regenerative medicine. Electronic supplementary material: The online version of this article (10.1186/s12885-019-5786-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Norimasa Miura
- PEZY-Pharma, Inc., 2-13-14 Hatagasaki, Yonago, Tottori, 683-8503, Japan; i-Medical Clinic, 3-4-18 Mejiro, Toshima-ku, Tokyo, 171-0031, Japan
| | - Yoshitaka Ishihara
- Division Pharmacotherapeutics, Faculty of Medicine, Tottori University, 86 Nishicho, Yonago, Tottori, 683-8503, Japan
| | - Yugo Miura
- Department of Orthopaedic Surgery, Soka Municipal Hospital, 2-21-1 Soka, Soka, Saitama, 340-8560, Japan
| | - Mai Kimoto
- Hokkaido System Science Co., Ltd., 2-1, Shinkawa Nishi 2-1, Kitaku, Sapporo, 001-0932, Japan
| | - Keigo Miura
- PEZY-Pharma, Inc., 2-13-14 Hatagasaki, Yonago, Tottori, 683-8503, Japan.
|
10
|
Chandak S, Tatwawadi K, Weissman T. Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis. Bioinformatics 2018; 34:558-567. [PMID: 29444237 DOI: 10.1093/bioinformatics/btx639] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 06/12/2017] [Accepted: 10/06/2017] [Indexed: 12/30/2022]
Abstract
Motivation: New Generation Sequencing (NGS) technologies for genome sequencing produce large amounts of short genomic reads per experiment, which are highly redundant and compressible. However, general-purpose compressors are unable to exploit this redundancy due to the special structure present in the data. Results: We present a new algorithm for compressing reads both with and without preserving the read order. In both cases, it achieves 1.4×-2× compression gain over state-of-the-art read compression tools for datasets containing as many as 3 billion Illumina reads. Our tool is based on the idea of approximately reordering the reads according to their position in the genome using hashed substring indices. We also present a systematic analysis of the read compression problem and compute bounds on fundamental limits of read compression. This analysis sheds light on the dynamics of the proposed algorithm (and read compression algorithms in general) and helps understand its performance in practice. The algorithm compresses only the read sequence, works with unaligned FASTQ files, and does not require a reference. Contact: schandak@stanford.edu. Supplementary information: Supplementary data are available at Bioinformatics online. The proposed algorithm is available for download at https://github.com/shubhamchandak94/HARC.
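The idea of reordering reads by genome position using hashed substring indices can be illustrated with a toy sketch. The Python below is a heavily simplified illustration and not the HARC implementation: it assumes error-free reads, exact overlaps, a single strand, and a plain dictionary keyed on each read's first `k` bases; the function names and the parameters `k` and `max_shift` are hypothetical.

```python
from collections import defaultdict


def reorder_reads(reads, k=4, max_shift=3):
    """Greedily chain reads so that each read is followed by an unused
    read overlapping it at a small shift. Returns a list of indices."""
    # Hash index: first k bases of each read -> indices of reads
    # starting with that substring.
    index = defaultdict(list)
    for i, read in enumerate(reads):
        index[read[:k]].append(i)

    used = [False] * len(reads)
    order = []
    for start in range(len(reads)):
        if used[start]:
            continue
        current = start
        used[current] = True
        order.append(current)
        # Extend the chain until no overlapping unused read is found.
        while True:
            nxt = _find_next(reads, index, used, current, k, max_shift)
            if nxt is None:
                break
            used[nxt] = True
            order.append(nxt)
            current = nxt
    return order


def _find_next(reads, index, used, current, k, max_shift):
    """Look up the k-mer at each shift of the current read and verify
    that a candidate's prefix matches the current read's suffix."""
    cur = reads[current]
    for shift in range(max_shift + 1):
        key = cur[shift:shift + k]
        if len(key) < k:
            break
        for cand in index.get(key, ()):
            if used[cand]:
                continue
            overlap = len(cur) - shift
            if reads[cand][:overlap] == cur[shift:]:
                return cand
    return None
```

Once neighbouring reads in the output order overlap, each read can in principle be stored as a small shift plus the few new bases it contributes, which is where the compression gain would come from; that encoding step is not shown here.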
Affiliation(s)
- Shubham Chandak
- Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
| | - Kedar Tatwawadi
- Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
| | - Tsachy Weissman
- Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
|
11
|
Wee Y, Bhyan SB, Liu Y, Lu J, Li X, Zhao M. The bioinformatics tools for the genome assembly and analysis based on third-generation sequencing. Brief Funct Genomics 2018; 18:1-12. [DOI: 10.1093/bfgp/ely037] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 08/17/2018] [Revised: 10/03/2018] [Accepted: 10/19/2018] [Indexed: 02/06/2023]
Affiliation(s)
- YongKiat Wee
- School of Science and Engineering, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Queensland, Australia
| | - Salma Begum Bhyan
- School of Science and Engineering, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Queensland, Australia
| | - Yining Liu
- The School of Public Health, Institute for Chemical Carcinogenesis,Guangzhou Medical University, Dongfengxi Road, Guangzhou, China
| | - Jiachun Lu
- The School of Public Health, Institute for Chemical Carcinogenesis,Guangzhou Medical University, Dongfengxi Road, Guangzhou, China
- The School of Public Health, The First Affiliated Hospital, Guangzhou Medical University, Guangzhou, China
| | - Xiaoyan Li
- Beijing Anzhen Hospital, Capital Medical University, Beijing Institute of Heart, Lung & Blood Vessel Disease, Beijing, China
| | - Min Zhao
- School of Science and Engineering, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Queensland, Australia
|
12
|
Lee H, Lee KW, Lee T, Park D, Chung J, Lee C, Park WY, Son DS. Performance evaluation method for read mapping tool in clinical panel sequencing. Genes Genomics 2017; 40:189-197. [PMID: 29568413 PMCID: PMC5846869 DOI: 10.1007/s13258-017-0621-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 08/31/2017] [Accepted: 10/11/2017] [Indexed: 01/28/2023]
Abstract
Alongside the rapid advancement of next-generation sequencing (NGS) technology, clinical panel sequencing is increasingly used in clinical studies and tests. However, the tools used in NGS data analysis have not been comparatively evaluated for panel sequencing. This study aimed to evaluate the tools used in alignment, the first step of the bioinformatics analysis, by comparing widely used tools with recently introduced ones. Using accumulated panel sequencing data, detected variant lists were cataloged and inserted into simulated reads produced from the reference genome (hg19). The numbers of unmapped and misaligned reads, the mapping quality distribution, and the runtime were measured as criteria for comparison. The most widely used tools, Bowtie2 and BWA-MEM, performed well, with AUCs of 0.9984 and 0.9970, respectively. Kart, which had a superior runtime and fewer misaligned reads, also achieved a high AUC (0.9723). This method of selecting and optimizing tools for panel sequencing can be applied in fields that require error minimization, such as clinical applications and liquid biopsy studies.
Affiliation(s)
- Hojun Lee
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea
| | - Ki-Wook Lee
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea; Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06351 South Korea
| | - Taeseob Lee
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea
| | - Donghyun Park
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea
| | - Jongsuk Chung
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea; Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon, 16419 South Korea
| | - Chung Lee
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea; Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, 06351 South Korea
| | - Woong-Yang Park
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea; Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon, 16419 South Korea; Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, 06351 South Korea
| | - Dae-Soon Son
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea
|
13
|
Abstract
Methylation of the 5-cytosine (m5C) is a common but not well-understood RNA modification, which can be detected by sequencing of bisulfite-treated transcripts (RNA-BSseq). In this Chapter, we discuss computational RNA-BSseq data analysis methods for transcriptome-wide identification and quantification of m5C.
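Quantification of m5C from bisulfite data is commonly expressed as a per-site non-conversion rate: bisulfite converts unmethylated cytosines so that they are read as T, while methylated cytosines remain C. The Python sketch below assumes that per-site C and T read counts are already available; the function name and the `min_coverage` cutoff are hypothetical, and the chapter's actual methods (including any statistical testing) are not reproduced here.

```python
def m5c_level(c_count, t_count, min_coverage=10):
    """Estimate the methylation level at one cytosine position as
    C / (C + T). Returns None when coverage is below the (assumed)
    minimum, since the ratio is unreliable at low depth."""
    coverage = c_count + t_count
    if coverage < min_coverage:
        return None
    return c_count / coverage
```

In practice a raw ratio like this is sensitive to incomplete bisulfite conversion, so candidate sites are usually filtered further before being reported.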
Affiliation(s)
- Dietmar Rieder
- Division of Bioinformatics, Biocenter, Medical University of Innsbruck, Innrain 80/IV, Innsbruck, 6020, Austria.
| | - Francesca Finotello
- Division of Bioinformatics, Biocenter, Medical University of Innsbruck, Innrain 80/IV, Innsbruck, 6020, Austria
|
14
|
Hahn MM, de Voer RM, Hoogerbrugge N, Ligtenberg MJL, Kuiper RP, van Kessel AG. The genetic heterogeneity of colorectal cancer predisposition - guidelines for gene discovery. Cell Oncol (Dordr) 2016; 39:491-510. [PMID: 27279102 PMCID: PMC5121185 DOI: 10.1007/s13402-016-0284-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Accepted: 05/27/2016] [Indexed: 12/22/2022]
Abstract
Background: Colorectal cancer (CRC) is a cumulative term applied to a clinically and genetically heterogeneous group of neoplasms that occur in the bowel. Based on twin studies, up to 45% of CRC cases may involve a heritable component. Yet high-penetrant germline mutations (e.g. mutations in APC and DNA mismatch repair genes) that result in familial aggregation and/or an early onset of the disease are found in only 5-10% of these cases. Genome-wide association studies have revealed that another ~5% of CRC cases may be explained by a cumulative effect of low-penetrant risk factors. Recent attempts to identify novel genetic factors using whole exome and whole genome sequencing have proven difficult, since the remaining, yet to be discovered, high-penetrant CRC-predisposing genes appear to be rare. In addition, most of the moderately penetrant candidate genes identified so far have not been confirmed in independent cohorts. Based on literature examples, we here discuss how careful patient and cohort selection, candidate gene and variant selection, and corroborative evidence may be employed to facilitate the discovery of novel CRC-predisposing genes. Conclusions: The picture emerges that the genetic predisposition to CRC is heterogeneous, involving complex interplays between common and rare (inter)genic variants with different penetrances. It is anticipated, however, that the use of large clinically well-defined patient and control datasets, together with improved functional and technical possibilities, will yield enough power to unravel this complex interplay and to generate accurate individualized estimates of the risk of developing CRC.
Affiliation(s)
- M M Hahn
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
| | - R M de Voer
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
| | - N Hoogerbrugge
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
| | - M J L Ligtenberg
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
| | - R P Kuiper
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands.
| | - A Geurts van Kessel
- Department of Human Genetics, Radboud Institute of Molecular Life Sciences, Radboud University Medical Center, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
|
15
|
Vasli N, Harris E, Karamchandani J, Bareke E, Majewski J, Romero NB, Stojkovic T, Barresi R, Tasfaout H, Charlton R, Malfatti E, Bohm J, Marini-Bettolo C, Choquet K, Dicaire MJ, Shao YH, Topf A, O'Ferrall E, Eymard B, Straub V, Blanco G, Lochmüller H, Brais B, Laporte J, Tétreault M. Recessive mutations in the kinase ZAK cause a congenital myopathy with fibre type disproportion. Brain 2016; 140:37-48. [PMID: 27816943 DOI: 10.1093/brain/aww257] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 04/27/2016] [Revised: 08/03/2016] [Accepted: 08/31/2016] [Indexed: 01/09/2023]
Abstract
Congenital myopathies define a heterogeneous group of neuromuscular diseases with neonatal or childhood hypotonia and muscle weakness. The genetic cause is still unknown in many patients, precluding genetic counselling and better understanding of the physiopathology. To identify novel genetic causes of congenital myopathies, exome sequencing was performed in three consanguineous families. We identified two homozygous frameshift mutations and a homozygous nonsense mutation in the mitogen-activated protein triple kinase ZAK. In total, six affected patients carry these mutations. Reverse transcription polymerase chain reaction and transcriptome analyses suggested nonsense mRNA decay as a main impact of mutations. The patients demonstrated a generalized slowly progressive muscle weakness accompanied by decreased vital capacities. A combination of proximal contractures with distal joint hyperlaxity is a distinct feature in one family. The low endurance and compound muscle action potential amplitude were strongly ameliorated on treatment with anticholinesterase inhibitor in another patient. Common histopathological features encompassed fibre size variation, predominance of type 1 fibre and centralized nuclei. A peculiar subsarcolemmal accumulation of mitochondria pointing towards the centre of the fibre was a novel histological hallmark in one family. These findings will improve the molecular diagnosis of congenital myopathies and implicate the mitogen-activated protein kinase (MAPK) signalling as a novel pathway altered in these rare myopathies.
Affiliation(s)
- Nasim Vasli
- 1 Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), 1, rue Laurent Fries, BP 10142, 67404 Illkirch, France.,2 INSERM U974, 67404 Illkirch, France.,3 CNRS, UMR7104, 67404 Illkirch, France.,4 Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, 67404 Illkirch, France
| | - Elizabeth Harris
- 5 John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, NE1 3BZ, UK
| | - Jason Karamchandani
- 6 Department of Pathology, McGill University Health Centre, Montreal Neurological Institute Hospital, Montreal, QC H3A 2B4, Canada
| | - Eric Bareke
- 7 Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada.,8 McGill University and Genome Quebec Innovation Center, Montreal, QC H3A 1A4, Canada
| | - Jacek Majewski
- 7 Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada.,8 McGill University and Genome Quebec Innovation Center, Montreal, QC H3A 1A4, Canada
| | - Norma B Romero
- 9 Université Sorbonne, UPMC Univ Paris 06, INSERM UMRS974, CNRS FRE3617, Center for Research in Myology, GH Pitié-Salpêtrière, 47 Boulevard de l'hôpital, 75013 Paris, France.,10 Centre de référence de Pathologie Neuromusculaire Paris-Est, Institut de Myologie, GHU Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Tanya Stojkovic
- 10 Centre de référence de Pathologie Neuromusculaire Paris-Est, Institut de Myologie, GHU Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Rita Barresi
- 5 John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, NE1 3BZ, UK
| | - Hichem Tasfaout
- 1 Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), 1, rue Laurent Fries, BP 10142, 67404 Illkirch, France.,2 INSERM U974, 67404 Illkirch, France.,3 CNRS, UMR7104, 67404 Illkirch, France.,4 Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, 67404 Illkirch, France
| | - Richard Charlton
- 5 John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, NE1 3BZ, UK
| | - Edoardo Malfatti
- 9 Université Sorbonne, UPMC Univ Paris 06, INSERM UMRS974, CNRS FRE3617, Center for Research in Myology, GH Pitié-Salpêtrière, 47 Boulevard de l'hôpital, 75013 Paris, France.,10 Centre de référence de Pathologie Neuromusculaire Paris-Est, Institut de Myologie, GHU Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Johann Bohm
- 1 Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), 1, rue Laurent Fries, BP 10142, 67404 Illkirch, France.,2 INSERM U974, 67404 Illkirch, France.,3 CNRS, UMR7104, 67404 Illkirch, France.,4 Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, 67404 Illkirch, France
| | - Chiara Marini-Bettolo
- 5 John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, NE1 3BZ, UK
| | - Karine Choquet
- 7 Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada.,11 Rare Neurological Diseases Group, Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
| | - Marie-Josée Dicaire
- 11 Rare Neurological Diseases Group, Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
| | - Yi-Hong Shao
- 11 Rare Neurological Diseases Group, Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
| | - Ana Topf
- 5 John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, NE1 3BZ, UK
| | - Erin O'Ferrall
- 6 Department of Pathology, McGill University Health Centre, Montreal Neurological Institute Hospital, Montreal, QC H3A 2B4, Canada
| | - Bruno Eymard
- 10 Centre de référence de Pathologie Neuromusculaire Paris-Est, Institut de Myologie, GHU Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, Paris, France
| | - Volker Straub
- 5 John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, NE1 3BZ, UK
| | - Gonzalo Blanco
- 12 Department of Biology, University of York, Wentworth Way, York YO10 5DD, UK
| | - Hanns Lochmüller
- 5 John Walton Muscular Dystrophy Research Centre, MRC Centre for Neuromuscular Diseases, Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, NE1 3BZ, UK
| | - Bernard Brais
- 7 Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada.,11 Rare Neurological Diseases Group, Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
| | - Jocelyn Laporte
- 1 Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), 1, rue Laurent Fries, BP 10142, 67404 Illkirch, France.,2 INSERM U974, 67404 Illkirch, France.,3 CNRS, UMR7104, 67404 Illkirch, France.,4 Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, 67404 Illkirch, France
| | - Martine Tétreault
- 7 Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada.,8 McGill University and Genome Quebec Innovation Center, Montreal, QC H3A 1A4, Canada
| |
|
16
|
Posada-Cespedes S, Seifert D, Beerenwinkel N. Recent advances in inferring viral diversity from high-throughput sequencing data. Virus Res 2016; 239:17-32. [PMID: 27693290 DOI: 10.1016/j.virusres.2016.09.016] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 06/24/2016] [Revised: 09/23/2016] [Accepted: 09/24/2016] [Indexed: 02/05/2023]
Abstract
Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus population. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments.
Affiliation(s)
- Susana Posada-Cespedes
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
| | - David Seifert
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland.
| |
|
17
|
van der Weide RH, Simonis M, Hermsen R, Toonen P, Cuppen E, de Ligt J. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats. PLoS One 2016; 11:e0160036. [PMID: 27501045 PMCID: PMC4976967 DOI: 10.1371/journal.pone.0160036] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/26/2016] [Accepted: 07/12/2016] [Indexed: 01/17/2023] Open
Abstract
Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts.
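As a rough illustration of the starting point for such an analysis, the sketch below pulls unmapped reads out of SAM-formatted alignment records by testing the FLAG bit 0x4 ("segment unmapped" in the SAM specification). It is not the authors' pipeline: the function name `unmapped_reads` and the toy records are hypothetical, and the quality selection and enrichment steps the abstract mentions are not reproduced.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines):
    """Yield (read name, sequence) for every unmapped SAM record."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 holds the bitwise FLAG
        if flag & UNMAPPED:
            yield fields[0], fields[9]  # QNAME and SEQ columns


# Toy records (hypothetical): one mapped read and two unmapped reads.
sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCCA\tIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII",
]

if __name__ == "__main__":
    for name, seq in unmapped_reads(sam):
        print(name, seq)
```

In practice the same filter is usually applied with a dedicated alignment toolkit; the plain-text version here only shows which records count as "unmapped".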
Affiliation(s)
- Robin H. van der Weide
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
- Division of Gene Regulation, The Netherlands Cancer Institute, Amsterdam, The Netherlands
| | - Marieke Simonis
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
| | - Roel Hermsen
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
| | - Pim Toonen
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
| | - Edwin Cuppen
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
| | - Joep de Ligt
- Hubrecht Institute, Royal Netherlands Academy of Arts and Sciences (KNAW), University Medical Centre Utrecht, Utrecht, The Netherlands
| |
|
18
|
Buschmann D, Haberberger A, Kirchner B, Spornraft M, Riedmaier I, Schelling G, Pfaffl MW. Toward reliable biomarker signatures in the age of liquid biopsies - how to standardize the small RNA-Seq workflow. Nucleic Acids Res 2016; 44:5995-6018. [PMID: 27317696 PMCID: PMC5291277 DOI: 10.1093/nar/gkw545] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 04/12/2016] [Accepted: 06/03/2016] [Indexed: 12/21/2022] Open
Abstract
Small RNA-Seq has emerged as a powerful tool in transcriptomics, gene expression profiling and biomarker discovery. Sequencing cell-free nucleic acids, particularly microRNA (miRNA), from liquid biopsies additionally provides exciting possibilities for molecular diagnostics, and might help establish disease-specific biomarker signatures. The complexity of the small RNA-Seq workflow, however, bears challenges and biases that researchers need to be aware of in order to generate high-quality data. Rigorous standardization and extensive validation are required to guarantee reliability, reproducibility and comparability of research findings. Hypotheses based on flawed experimental conditions can be inconsistent and even misleading. Comparable to the well-established MIQE guidelines for qPCR experiments, this work aims at establishing guidelines for experimental design and pre-analytical sample processing, standardization of library preparation and sequencing reactions, as well as facilitating data analysis. We highlight bottlenecks in small RNA-Seq experiments, point out the importance of stringent quality control and validation, and provide a primer for differential expression analysis and biomarker discovery. Following our recommendations will encourage better sequencing practice, increase experimental transparency and lead to more reproducible small RNA-Seq results. This will ultimately enhance the validity of biomarker signatures, and allow reliable and robust clinical predictions.
Affiliation(s)
- Dominik Buschmann
- Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany; Institute of Human Genetics, University Hospital, Ludwig-Maximilians-University Munich, Goethestraße 29, 80336 München, Germany
| | - Anna Haberberger
- Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Benedikt Kirchner
- Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Melanie Spornraft
- Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Irmgard Riedmaier
- Eurofins Medigenomix Forensik GmbH, Anzinger Straße 7a, 85560 Ebersberg, Germany; Department of Anesthesiology, University Hospital, Ludwig-Maximilians-University Munich, Marchioninistraße 15, 81377 München, Germany
| | - Gustav Schelling
- Department of Physiology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
| | - Michael W Pfaffl
- Department of Animal Physiology and Immunology, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Weihenstephaner Berg 3, 85354 Freising, Germany
| |
|
19
|
Xue D, Chen H, Chen F, He Y, Zhao C, Zhu D, Zeng L, Li W. Analysis of the rumen bacteria and methanogenic archaea of yak (Bos grunniens) steers grazing on the Qinghai-Tibetan Plateau. Livest Sci 2016. [DOI: 10.1016/j.livsci.2016.04.009] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 11/27/2022]
|
20
|
Kowar T, Zakrzewski F, Macas J, Kobližková A, Viehoever P, Weisshaar B, Schmidt T. Repeat Composition of CenH3-chromatin and H3K9me2-marked heterochromatin in Sugar Beet (Beta vulgaris). BMC Plant Biol 2016; 16:120. [PMID: 27230558 PMCID: PMC4881148 DOI: 10.1186/s12870-016-0805-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 12/07/2015] [Accepted: 05/17/2016] [Indexed: 05/18/2023]
Abstract
BACKGROUND Sugar beet (Beta vulgaris) is an important crop of temperate climate zones, which provides nearly 30 % of the world's annual sugar needs. From the total genome size of 758 Mb, only 567 Mb were incorporated in the recently published genome sequence, due to the fact that regions with high repetitive DNA contents (e.g. satellite DNAs) are only partially included. Therefore, to fill these gaps and to gain information about the repeat composition of centromeres and heterochromatic regions, we performed chromatin immunoprecipitation followed by sequencing (ChIP-Seq) using antibodies against the centromere-specific histone H3 variant of sugar beet (CenH3) and the heterochromatic mark of dimethylated lysine 9 of histone H3 (H3K9me2). RESULTS ChIP-Seq analysis revealed that active centromeres containing CenH3 consist of the satellite pBV and the Ty3-gypsy retrotransposon Beetle7, while heterochromatin marked by H3K9me2 exhibits heterogeneity in repeat composition. H3K9me2 was mainly associated with the satellite family pEV, the Ty1-copia retrotransposon family Cotzilla and the DNA transposon superfamily of the En/Spm type. In members of the section Beta within the genus Beta, immunostaining using the CenH3 antibody was successful, indicating that orthologous CenH3 proteins are present in closely related species within this section. CONCLUSIONS The identification of repetitive genome portions by ChIP-Seq experiments complemented the sugar beet reference sequence by providing insights into the repeat composition of poorly characterized CenH3-chromatin and H3K9me2-heterochromatin. Therefore, our work provides the basis for future research and application concerning the sugar beet centromere and repeat-rich heterochromatic regions characterized by the presence of H3K9me2.
Affiliation(s)
- Teresa Kowar
- Department of Plant Cell and Molecular Biology, TU Dresden, Dresden, D-01062, Germany
| | - Falk Zakrzewski
- Department of Plant Cell and Molecular Biology, TU Dresden, Dresden, D-01062, Germany
| | - Jiří Macas
- Biology Centre ASCR, Institute of Plant Molecular Biology, Branišovská 31, České Budějovice, CZ-37005, Czech Republic
| | - Andrea Kobližková
- Biology Centre ASCR, Institute of Plant Molecular Biology, Branišovská 31, České Budějovice, CZ-37005, Czech Republic
| | - Prisca Viehoever
- CeBiTec & Faculty of Biology, Bielefeld University, Universitätsstr. 25, Bielefeld, D-33615, Germany
| | - Bernd Weisshaar
- CeBiTec & Faculty of Biology, Bielefeld University, Universitätsstr. 25, Bielefeld, D-33615, Germany.
| | - Thomas Schmidt
- Department of Plant Cell and Molecular Biology, TU Dresden, Dresden, D-01062, Germany
| |
|
21
|
|
22
|
Li J, Batcha AMN, Grüning B, Mansmann UR. An NGS Workflow Blueprint for DNA Sequencing Data and Its Application in Individualized Molecular Oncology. Cancer Inform 2016; 14:87-107. [PMID: 27081306 PMCID: PMC4827795 DOI: 10.4137/cin.s30793] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/22/2015] [Revised: 03/02/2016] [Accepted: 03/17/2016] [Indexed: 12/23/2022] Open
Abstract
Next-generation sequencing (NGS) technologies that have advanced rapidly in the past few years possess the potential to classify diseases, decipher the molecular code of related cell processes, identify targets for decision-making on targeted therapy or prevention strategies, and predict clinical treatment response. Thus, NGS is on its way to revolutionize oncology. With the help of NGS, we can draw a finer map for the genetic basis of diseases and can improve our understanding of diagnostic and prognostic applications and therapeutic methods. Despite these advantages and its potential, NGS is facing several critical challenges, including reduction of sequencing cost, enhancement of sequencing quality, improvement of technical simplicity and reliability, and development of semiautomated and integrated analysis workflow. In order to address these challenges, we conducted a literature research and summarized a four-stage NGS workflow for providing a systematic review on NGS-based analysis, explaining the strength and weakness of diverse NGS-based software tools, and elucidating its potential connection to individualized medicine. By presenting this four-stage NGS workflow, we try to provide a minimal structural layout required for NGS data storage and reproducibility.
Affiliation(s)
- Jian Li
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Aarif Mohamed Nazeer Batcha
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Björn Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany; Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
| | - Ulrich R Mansmann
- Institute for Medical Informatics, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany
| |
|
23
|
Zukurov JP, do Nascimento-Brito S, Volpini AC, Oliveira GC, Janini LMR, Antoneli F. Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage. Algorithms Mol Biol 2016; 11:2. [PMID: 26973707 PMCID: PMC4788855 DOI: 10.1186/s13015-016-0064-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/25/2015] [Accepted: 02/25/2016] [Indexed: 12/16/2022] Open
Abstract
Background In this paper we propose a method and discuss its computational implementation as an integrated tool for the analysis of viral genetic diversity on data generated by high-throughput sequencing. The main motivation for this work is to better understand the genetic diversity of viruses with high rates of nucleotide substitution, such as HIV-1 and influenza. Most methods for viral diversity estimation proposed so far are intended to benefit from the longer reads produced by some next-generation sequencing platforms in order to estimate a population of haplotypes which represent the diversity of the original population. The method proposed here is custom-made to take advantage of the very low error rate and extremely deep coverage per site, which are the main features of some neglected technologies that have not received much attention due to the short length of their reads, which precludes haplotype estimation. This approach allowed us to avoid some hard problems related to haplotype reconstruction (need for long reads, preliminary error filtering and assembly). Results We propose to measure genetic diversity of a viral population through a family of multinomial probability distributions indexed by the sites of the virus genome, each one representing the distribution of nucleic bases per site. Moreover, the implementation of the method focuses on two main optimization strategies: a read mapping/alignment procedure that aims at the recovery of the maximum possible number of short reads, and the inference of the multinomial parameters in a Bayesian framework with smoothed Dirichlet estimation. The Bayesian approach provides conditional probability distributions for the multinomial parameters allowing one to take into account the prior information of the control experiment and providing a natural way to separate signal from noise, since it automatically furnishes Bayesian confidence intervals and thus avoids the drawbacks of preliminary error filtering.
Conclusions The methods described in this paper have been implemented as an integrated tool called Tanden (Tool for Analysis of Diversity in Viral Populations) and successfully tested on samples obtained from HIV-1 strain NL4-3 (group M, subtype B) cultivations on primary human cell cultures in many distinct viral propagation conditions. Tanden is written in C# (Microsoft), runs on the Windows operating system, and can be downloaded from: http://tanden.url.ph/.
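The per-site multinomial model with Dirichlet smoothing described in this abstract can be sketched in a few lines. This is a minimal illustration and not Tanden's implementation (which the abstract says is written in C#): the symmetric prior `alpha`, the function names, and the toy base counts are all assumptions made here, and the informative prior from a control experiment and the Bayesian intervals mentioned in the abstract are not reproduced.

```python
import math

BASES = "ACGT"


def posterior_mean(counts, alpha=1.0):
    """Posterior mean base frequencies at one site.

    With a symmetric Dirichlet(alpha) prior and multinomial base counts,
    the posterior is Dirichlet(count + alpha), whose mean is
    (count + alpha) / (total + 4 * alpha).
    """
    total = sum(counts[b] for b in BASES) + alpha * len(BASES)
    return {b: (counts[b] + alpha) / total for b in BASES}


def shannon_entropy(freqs):
    """Per-site diversity in bits (0 = one base only, 2 = all four equal)."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Toy deep-coverage pileup counts for two sites (hypothetical numbers).
site_counts = [
    {"A": 9990, "C": 4, "G": 3, "T": 3},   # nearly monomorphic site
    {"A": 6000, "C": 0, "G": 4000, "T": 0},  # clearly polymorphic site
]

if __name__ == "__main__":
    for index, counts in enumerate(site_counts):
        freqs = posterior_mean(counts)
        print(index, round(shannon_entropy(freqs), 4))
```

The smoothing keeps every base frequency strictly positive even when a base is never observed, which is why the entropy computation never hits a zero probability.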
|
24
|
Bar I, Cummins S, Elizur A. Transcriptome analysis reveals differentially expressed genes associated with germ cell and gonad development in the Southern bluefin tuna (Thunnus maccoyii). BMC Genomics 2016; 17:217. [PMID: 26965070 PMCID: PMC4785667 DOI: 10.1186/s12864-016-2397-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 11/06/2015] [Accepted: 01/14/2016] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Controlling and managing the breeding of bluefin tuna (Thunnus spp.) in captivity is an imperative step towards obtaining a sustainable supply of these fish in aquaculture production systems. Germ cell transplantation (GCT) is an innovative technology for the production of inter-species surrogates, by transplanting undifferentiated germ cells derived from a donor species into larvae of a host species. The transplanted surrogates will then grow and mature to produce donor-derived seed, thus providing a simpler alternative to maintaining large-bodied broodstock such as the bluefin tuna. Implementation of GCT for new species requires the development of molecular tools to follow the fate of the transplanted germ cells. These tools are based on key reproductive and germ cell-specific genes. RNA-Sequencing (RNA-Seq) provides a rapid, cost-effective method for high throughput gene identification in non-model species. This study utilized RNA-Seq to identify key genes expressed in the gonads of Southern bluefin tuna (Thunnus maccoyii, SBT) and their specific expression patterns in male and female gonad cells. RESULTS Key genes involved in the reproductive molecular pathway and specifically, germ cell development in gonads, were identified using analysis of RNA-Seq transcriptomes of male and female SBT gonad cells. Expression profiles of transcripts from ovary and testis cells were compared, as well as testis germ cell-enriched fraction prepared with Percoll gradient, as used in GCT studies. Ovary cells demonstrated over-expression of genes related to stem cell maintenance, while in testis cells, transcripts encoding for reproduction-associated receptors, sex steroids and hormone synthesis and signaling genes were over-expressed. Within the testis cells, the Percoll-enriched fraction showed over-expression of genes that are related to post-meiosis germ cell populations. 
CONCLUSIONS Gonad development and germ cell related genes were identified from SBT gonads and their expression patterns in ovary and testis cells were determined. These expression patterns correlate with the reproductive developmental stage of the sampled fish. The majority of the genes described in this study were sequenced for the first time in T. maccoyii. The wealth of SBT gonadal and germ cell-related gene sequences made publicly available by this study provides an extensive resource for further GCT and reproductive molecular biology studies of this commercially valuable fish.
Affiliation(s)
- Ido Bar
- Genecology Research Centre, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, 4558 Maroochydore DC, Queensland, Australia
| | - Scott Cummins
- Genecology Research Centre, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, 4558 Maroochydore DC, Queensland, Australia
| | - Abigail Elizur
- Genecology Research Centre, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, 4558 Maroochydore DC, Queensland, Australia
| |
|
25
|
Uribe-Convers S, Settles ML, Tank DC. A Phylogenomic Approach Based on PCR Target Enrichment and High Throughput Sequencing: Resolving the Diversity within the South American Species of Bartsia L. (Orobanchaceae). PLoS One 2016; 11:e0148203. [PMID: 26828929 PMCID: PMC4734709 DOI: 10.1371/journal.pone.0148203] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 06/19/2015] [Accepted: 01/14/2016] [Indexed: 11/30/2022] Open
Abstract
Advances in high-throughput sequencing (HTS) have allowed researchers to obtain large amounts of biological sequence information at speeds and costs unimaginable only a decade ago. Phylogenetics, and the study of evolution in general, is quickly migrating towards using HTS to generate larger and more complex molecular datasets. In this paper, we present a method that utilizes microfluidic PCR and HTS to generate large amounts of sequence data suitable for phylogenetic analyses. The approach uses the Fluidigm Access Array System (Fluidigm, San Francisco, CA, USA) and two sets of PCR primers to simultaneously amplify 48 target regions across 48 samples, incorporating sample-specific barcodes and HTS adapters (2,304 unique amplicons per Access Array). The final product is a pooled set of amplicons ready to be sequenced, and thus, there is no need to construct separate, costly genomic libraries for each sample. Further, we present a bioinformatics pipeline to process the raw HTS reads to either generate consensus sequences (with or without ambiguities) for every locus in every sample or—more importantly—recover the separate alleles from heterozygous target regions in each sample. This is important because it adds allelic information that is well suited for coalescent-based phylogenetic analyses that are becoming very common in conservation and evolutionary biology. To test our approach and bioinformatics pipeline, we sequenced 576 samples across 96 target regions belonging to the South American clade of the genus Bartsia L. in the plant family Orobanchaceae. After sequencing cleanup and alignment, the experiment resulted in ~25,300bp across 486 samples for a set of 48 primer pairs targeting the plastome, and ~13,500bp for 363 samples for a set of primers targeting regions in the nuclear genome. Finally, we constructed a combined concatenated matrix from all 96 primer combinations, resulting in a combined aligned length of ~40,500bp for 349 samples.
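The amplicon arithmetic in this abstract can be checked with a short calculation. The per-array capacities (48 targets, 48 samples) come from the abstract; the number of arrays needed for the full experiment is derived here under the assumption that every array is fully loaded and every sample is run against both primer sets, and it is not a figure the authors report. The function names are hypothetical.

```python
import math

TARGETS_PER_ARRAY = 48  # primer pairs per Access Array (from the abstract)
SAMPLES_PER_ARRAY = 48  # samples per Access Array (from the abstract)


def amplicons_per_array(targets=TARGETS_PER_ARRAY, samples=SAMPLES_PER_ARRAY):
    """Unique barcoded amplicons produced by one fully loaded array."""
    return targets * samples


def arrays_needed(n_targets, n_samples):
    """Arrays required if each array holds one block of targets x samples."""
    target_blocks = math.ceil(n_targets / TARGETS_PER_ARRAY)
    sample_blocks = math.ceil(n_samples / SAMPLES_PER_ARRAY)
    return target_blocks * sample_blocks


if __name__ == "__main__":
    print(amplicons_per_array())    # 48 * 48 amplicons per array
    print(arrays_needed(96, 576))   # 2 target blocks * 12 sample blocks
```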
Affiliation(s)
- Simon Uribe-Convers
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, United States of America
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, Idaho, United States of America
- Stillinger Herbarium, University of Idaho, Moscow, Idaho, United States of America
| | - Matthew L. Settles
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, United States of America
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, Idaho, United States of America
| | - David C. Tank
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, United States of America
- Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, Idaho, United States of America
- Stillinger Herbarium, University of Idaho, Moscow, Idaho, United States of America
| |
|
26
|
Han Y, Gao S, Muegge K, Zhang W, Zhou B. Advanced Applications of RNA Sequencing and Challenges. Bioinform Biol Insights 2015; 9:29-46. [PMID: 26609224 PMCID: PMC4648566 DOI: 10.4137/bbi.s28991] [Citation(s) in RCA: 129] [Impact Index Per Article: 12.9] [Received: 07/16/2015] [Revised: 09/30/2015] [Accepted: 10/02/2015] [Indexed: 12/18/2022] Open
Abstract
Next-generation sequencing technologies have revolutionarily advanced sequence-based research with the advantages of high-throughput, high-sensitivity, and high-speed. RNA-seq is now being used widely for uncovering multiple facets of transcriptome to facilitate the biological applications. However, the large-scale data analyses associated with RNA-seq harbors challenges. In this study, we present a detailed overview of the applications of this technology and the challenges that need to be addressed, including data preprocessing, differential gene expression analysis, alternative splicing analysis, variants detection and allele-specific expression, pathway analysis, co-expression network analysis, and applications combining various experimental procedures beyond the achievements that have been made. Specifically, we discuss essential principles of computational methods that are required to meet the key challenges of the RNA-seq data analyses, development of various bioinformatics tools, challenges associated with the RNA-seq applications, and examples that represent the advances made so far in the characterization of the transcriptome.
Affiliation(s)
- Yixing Han
  - Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, USA
- Shouguo Gao
  - Bioinformatics and Systems Biology Core, National Heart Lung Blood Institute, National Institutes of Health, Rockville Pike, Bethesda, MD, USA
- Kathrin Muegge
  - Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, USA
  - Leidos Biomedical Research, Inc., Basic Science Program, Frederick National Laboratory, Frederick, MD, USA
- Wei Zhang
  - Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Bing Zhou
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
|
27
|
Monat C, Tranchant-Dubreuil C, Kougbeadjo A, Farcy C, Ortega-Abboud E, Amanzougarene S, Ravel S, Agbessi M, Orjuela-Bouniol J, Summo M, Sabot F. TOGGLE: toolbox for generic NGS analyses. BMC Bioinformatics 2015; 16:374. [PMID: 26552596 PMCID: PMC4640241 DOI: 10.1186/s12859-015-0795-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 05/14/2015] [Accepted: 10/24/2015] [Indexed: 01/04/2023]
Abstract
BACKGROUND The explosion of NGS (Next Generation Sequencing) sequence data requires a huge effort in bioinformatics methods and analyses. The creation of dedicated, robust and reliable pipelines able to handle dozens of samples from raw FASTQ data to relevant biological data is a time-consuming task in all projects relying on NGS. To address this, we created a generic and modular toolbox for developing such pipelines. RESULTS TOGGLE (TOolbox for Generic nGs anaLysEs) is a suite of tools for designing pipelines that manage large sets of NGS software and utilities. Moreover, TOGGLE offers an easy way to manipulate the various options of the different software packages throughout a pipeline by using a single basic configuration file, which can be changed for each assay without having to change the code itself. We also describe one implementation of TOGGLE in a complete analysis pipeline designed for SNP discovery in large sets of genomic data, ready to use in different environments (from a single machine to HPC clusters). CONCLUSION TOGGLE speeds up the creation of robust pipelines with reliable log tracking and data flow, for a large range of analyses. Moreover, it enables biologists to concentrate on the biological relevance of results, and to change the experimental conditions easily. The whole code and test data are available at https://github.com/SouthGreenPlatform/TOGGLE.
Affiliation(s)
- Cécile Monat
  - UMR DIADE IRD/UM, 911 Avenue Agropolis, Montpellier Cedex 5, F-34934, France.
- Ayité Kougbeadjo
  - UMR AGAP CIRAD/INRA/SupAgro, TA A-108/03 - Avenue Agropolis, Montpellier Cedex 5, F-34398, France.
- Cédric Farcy
  - UMR AGAP CIRAD/INRA/SupAgro, TA A-108/03 - Avenue Agropolis, Montpellier Cedex 5, F-34398, France.
- Enrique Ortega-Abboud
  - UMR AGAP CIRAD/INRA/SupAgro, TA A-108/03 - Avenue Agropolis, Montpellier Cedex 5, F-34398, France.
- Sébastien Ravel
  - UMR-BGPI CIRAD TA A-54/K, Campus International de Baillarguet, Montpellier Cedex 5, F-34398, France.
- Mawussé Agbessi
  - UMR DIADE IRD/UM, 911 Avenue Agropolis, Montpellier Cedex 5, F-34934, France.
- Maryline Summo
  - UMR AGAP CIRAD/INRA/SupAgro, TA A-108/03 - Avenue Agropolis, Montpellier Cedex 5, F-34398, France.
- François Sabot
  - UMR DIADE IRD/UM, 911 Avenue Agropolis, Montpellier Cedex 5, F-34934, France.
|
28
|
Kornobis E, Cabellos L, Aguilar F, Frías-López C, Rozas J, Marco J, Zardoya R. TRUFA: A User-Friendly Web Server for de novo RNA-seq Analysis Using Cluster Computing. Evol Bioinform Online 2015; 11:97-104. [PMID: 26056424 PMCID: PMC4444131 DOI: 10.4137/ebo.s23873] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 01/09/2015] [Revised: 03/09/2015] [Accepted: 03/16/2015] [Indexed: 01/08/2023]
Abstract
Application of next-generation sequencing (NGS) methods for transcriptome analysis (RNA-seq) has become increasingly accessible in recent years and is of great interest to many biological disciplines including, e.g., evolutionary biology, ecology, biomedicine, and computational biology. Although virtually any research group can now obtain RNA-seq data, only a few have the bioinformatics knowledge and computation facilities required for transcriptome analysis. Here, we present TRUFA (TRanscriptome User-Friendly Analysis), an open informatics platform offering a web-based interface that generates the outputs commonly used in de novo RNA-seq analysis and comparative transcriptomics. TRUFA provides a comprehensive service that allows users to dynamically perform raw read cleaning, transcript assembly, annotation, and expression quantification. Due to the computationally intensive nature of such analyses, TRUFA is highly parallelized and benefits from accessing high-performance computing resources. The complete TRUFA pipeline was validated using four previously published transcriptomic data sets. TRUFA’s results for the example datasets were globally similar to those of the original studies, and were notably better for the green tea dataset. The platform permits analyzing RNA-seq data in a fast, robust, and user-friendly manner. Accounts on TRUFA are provided freely upon request at https://trufa.ifca.es.
Affiliation(s)
- Etienne Kornobis
  - Departamento de biodiversidad y biología evolutiva, Museo Nacional de Ciencias Naturales MNCN (CSIC), Madrid, Spain
- Luis Cabellos
  - Instituto de Física de Cantabria, IFCA (CSIC-UC), Edificio Juan Jordá, Santander, Spain
- Fernando Aguilar
  - Instituto de Física de Cantabria, IFCA (CSIC-UC), Edificio Juan Jordá, Santander, Spain
- Cristina Frías-López
  - Departament de Genètica and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Julio Rozas
  - Departament de Genètica and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Jesús Marco
  - Instituto de Física de Cantabria, IFCA (CSIC-UC), Edificio Juan Jordá, Santander, Spain
- Rafael Zardoya
  - Departamento de biodiversidad y biología evolutiva, Museo Nacional de Ciencias Naturales MNCN (CSIC), Madrid, Spain
|
29
|
Evaluation of composition and individual variability of rumen microbiota in yaks by 16S rRNA high-throughput sequencing technology. Anaerobe 2015; 34:74-9. [PMID: 25911445 DOI: 10.1016/j.anaerobe.2015.04.010] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 08/14/2014] [Revised: 04/13/2015] [Accepted: 04/19/2015] [Indexed: 12/26/2022]
Abstract
The Yak (Bos grunniens) is a unique species of ruminant animals that is important to agriculture of the Tibetan plateau, and has a complex intestinal microbial community. The objective of the present study was to characterize the composition and individual variability of microbiota in the rumen of yaks using 16S rRNA gene high-throughput sequencing technique. Rumen samples used in the present study were obtained from grazing adult male yaks (n = 6) in a commercial farm in Ganzi Autonomous Prefecture of Sichuan Province, China. Universal prokaryote primers were used to target the V4-V5 hypervariable region of 16S rRNA gene. A total of 7200 operational taxonomic units (OTUs) were obtained after sequence filtering and chimera removal. Within these OTUs, 0.56% belonged to Archaea (40 OTUs), 7.19% to unassigned species (518 OTUs), and the remaining OTUs (6642) in all samples were of bacterial origin. When examining the community structure of bacteria, we identified 23 phyla within 159 families after taxonomic summarization. Bacteroidetes and Firmicutes were the predominant phyla accounting for 39.68% (SD = 0.05) and 45.90% (SD = 0.06), respectively. Moreover, 3764 OTUs were identified as shared OTUs (i.e. represented in all yaks) and belonged to 35 genera, exhibiting highly variable abundance across individual samples. Phylogenetic placement of these genera across individual samples was examined. In addition, we evaluated the distance among the 6 rumen samples by adding taxon phylogeny using UniFrac, representing 24.1% of average distance. In summary, the current study reveals a shared rumen microbiome and phylogenetic lineage and presents novel information on composition and individual variability of the bacterial community in the rumen of yaks.
|
30
|
Manconi A, Manca E, Moscatelli M, Gnocchi M, Orro A, Armano G, Milanesi L. G-CNV: A GPU-Based Tool for Preparing Data to Detect CNVs with Read-Depth Methods. Front Bioeng Biotechnol 2015; 3:28. [PMID: 25806367 PMCID: PMC4354384 DOI: 10.3389/fbioe.2015.00028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/10/2014] [Accepted: 02/19/2015] [Indexed: 11/23/2022]
Abstract
Copy number variations (CNVs) are the most prevalent types of structural variations (SVs) in the human genome and are involved in a wide range of common human diseases. Different computational methods have been devised to detect this type of SVs and to study how they are implicated in human diseases. Recently, computational methods based on high-throughput sequencing (HTS) are increasingly used. The majority of these methods focus on mapping short-read sequences generated from a donor against a reference genome to detect signatures distinctive of CNVs. In particular, read-depth based methods detect CNVs by analyzing genomic regions with significantly different read-depth from the other ones. The pipeline analysis of these methods consists of four main stages: (i) data preparation, (ii) data normalization, (iii) CNV regions identification, and (iv) copy number estimation. However, available tools do not support most of the operations required at the first two stages of this pipeline. Typically, they start the analysis by building the read-depth signal from pre-processed alignments. Therefore, third-party tools must be used to perform most of the preliminary operations required to build the read-depth signal. These data-intensive operations can be efficiently parallelized on graphics processing units (GPUs). In this article, we present G-CNV, a GPU-based tool devised to perform the common operations required at the first two stages of the analysis pipeline. G-CNV is able to filter low-quality read sequences, to mask low-quality nucleotides, to remove adapter sequences, to remove duplicated read sequences, to map the short-reads, to resolve multiple mapping ambiguities, to build the read-depth signal, and to normalize it. G-CNV can be efficiently used as a third-party tool able to prepare data for the subsequent read-depth signal generation and analysis. Moreover, it can also be integrated in CNV detection tools to generate read-depth signals.
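As a rough illustration of the last two operations named in this abstract (building the read-depth signal and normalizing it), the following Python sketch counts read start positions in fixed-size windows and divides by the median window depth. It is a minimal CPU-only sketch under stated assumptions, not G-CNV's implementation: the function names, the window-based binning, and median normalization are all hypothetical choices made for illustration, since the abstract does not describe the tool's actual normalization scheme.

```python
from statistics import median


def read_depth_signal(read_starts, genome_length, window_size):
    """Count reads whose start position falls in each fixed-size window.

    Hypothetical helper: real tools typically work from alignment files;
    here the reads are reduced to a list of 0-based start positions.
    """
    n_windows = (genome_length + window_size - 1) // window_size
    depth = [0] * n_windows
    for start in read_starts:
        if 0 <= start < genome_length:
            depth[start // window_size] += 1
    return depth


def normalize_by_median(depth):
    """Scale the signal so that the median window has depth 1.0.

    Median normalization is an assumption made for this sketch.
    """
    med = median(depth)
    if med == 0:
        raise ValueError("median depth is zero; cannot normalize")
    return [d / med for d in depth]


# Toy data: ten read start positions on a 30 bp reference, 10 bp windows.
starts = [0, 3, 5, 12, 14, 15, 17, 18, 25, 27]
signal = read_depth_signal(starts, genome_length=30, window_size=10)
normalized = normalize_by_median(signal)
```

In a read-depth analysis, windows whose normalized depth departs markedly from 1.0 would be the candidates passed on to the CNV identification stage.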
Affiliation(s)
- Andrea Manconi
  - Institute for Biomedical Technologies, National Research Council, Milan, Italy
- Emanuele Manca
  - Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
- Marco Moscatelli
  - Institute for Biomedical Technologies, National Research Council, Milan, Italy
- Matteo Gnocchi
  - Institute for Biomedical Technologies, National Research Council, Milan, Italy
- Alessandro Orro
  - Institute for Biomedical Technologies, National Research Council, Milan, Italy
- Giuliano Armano
  - Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari, Italy
- Luciano Milanesi
  - Institute for Biomedical Technologies, National Research Council, Milan, Italy
|
31
|
Giannoulatou E, Park SH, Humphreys DT, Ho JWK. Verification and validation of bioinformatics software without a gold standard: a case study of BWA and Bowtie. BMC Bioinformatics 2014; 15 Suppl 16:S15. [PMID: 25521810 PMCID: PMC4290646 DOI: 10.1186/1471-2105-15-s16-s15] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022]
Abstract
Background Bioinformatics software quality assurance is essential in genomic medicine. Systematic verification and validation of bioinformatics software is difficult because it is often not possible to obtain a realistic "gold standard" for systematic evaluation. Here we apply a technique that originates from the software testing literature, namely Metamorphic Testing (MT), to systematically test three widely used short-read sequence alignment programs. Results MT alleviates the problems associated with the lack of a gold standard by checking that the results from multiple executions of a program satisfy a set of expected or desirable properties that can be derived from the software specification or user expectations. We tested BWA, Bowtie and Bowtie2 using simulated data and one HapMap dataset. It is interesting to observe that multiple executions of the same aligner using a slightly modified input FASTQ sequence file, such as after randomly re-ordering the reads, may affect alignment results. Furthermore, we found that the list of variant calls can be affected unless strict quality control is applied during variant calling. Conclusion Thorough testing of bioinformatics software is important in delivering clinical genomic medicine. This paper demonstrates a different framework to test a program that involves checking its properties, thus greatly expanding the number and repertoire of test cases we can apply in practice.
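The read re-ordering check mentioned in this abstract can be sketched as a metamorphic relation: shuffling the input reads should not change where each read aligns. The Python below is a minimal sketch, assuming a toy exact-match aligner (`toy_align`, a hypothetical stand-in, not BWA or Bowtie) so that the example is self-contained; the relation-checking function takes the aligner as a parameter and makes no claim about how the paper's test harness was built.

```python
import random


def toy_align(reads, reference):
    """Hypothetical stand-in for an aligner.

    Maps each read ID to the leftmost exact-match position of its
    sequence in the reference, or -1 if the sequence does not occur.
    """
    return {read_id: reference.find(seq) for read_id, seq in reads}


def permutation_relation_holds(aligner, reads, reference, seed=0):
    """Metamorphic relation: re-ordering the input reads must leave
    the per-read alignment results unchanged."""
    baseline = aligner(reads, reference)
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return aligner(shuffled, reference) == baseline


reference = "ACGTACGTTAGC"
reads = [("r1", "ACGT"), ("r2", "TTAG"), ("r3", "GGGG")]
alignments = toy_align(reads, reference)
holds = permutation_relation_holds(toy_align, reads, reference)
```

No gold-standard alignment is needed here: the check only compares two runs of the same program, which is the point of metamorphic testing. A deterministic toy aligner trivially satisfies the relation; the abstract reports that real aligners may not.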
|
32
|
Kumar P, Al-Shafai M, Al Muftah WA, Chalhoub N, Elsaid MF, Aleem AA, Suhre K. Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance. BMC Res Notes 2014; 7:747. [PMID: 25339461 PMCID: PMC4216909 DOI: 10.1186/1756-0500-7-747] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 05/06/2014] [Accepted: 10/03/2014] [Indexed: 12/30/2022]
Abstract
BACKGROUND With diminishing costs of next generation sequencing (NGS), whole genome analysis becomes a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers in general not only provide raw genomic reads, but further deliver SNP calls to their clients. However, the question for the user arises whether to use the SNP data as is, or process the raw sequencing data further through more sophisticated SNP calling pipelines with more advanced algorithms. RESULTS Here we report a detailed comparison of SNPs called using the popular GATK multiple-sample calling protocol to SNPs delivered as part of a 40x whole genome sequencing project by Illumina Inc of 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases) and compare them to variants provided by the Illumina CASAVA pipeline. GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional variants from GATK are robust for Mendelian consistencies but weak in terms of statistical parameters such as TsTv ratio. However, these additional variants do not make a difference in detecting the causative variants in the studied phenotype. CONCLUSION Both pipelines, GATK multi-sample calling and Illumina CASAVA single sample calling, have highly similar performance in SNP calling at the level of putatively causative variants.
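The TsTv ratio cited in this abstract as a quality statistic is the number of transitions (purine to purine, or pyrimidine to pyrimidine) divided by the number of transversions in a call set. The Python sketch below shows one way it could be computed from (reference, alternate) allele pairs; the function names and the toy SNP list are illustrative assumptions and are not taken from GATK or CASAVA.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A substitution is a transition if both bases are purines
    or both are pyrimidines."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = sum(
        1 for ref, alt in snps if ref != alt and not is_transition(ref, alt)
    )
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Toy call set: four transitions and two transversions.
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ts_tv_ratio(snps)
```

A call set enriched in false positives tends to drift toward a lower ratio, which is why the statistic is used as a sanity check on additional variants.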
Affiliation(s)
- Karsten Suhre
  - Weill Cornell Medical College in Qatar, Education City, Doha, Qatar.
|
33
|
Ekblom R, Wolf JBW. A field guide to whole-genome sequencing, assembly and annotation. Evol Appl 2014; 7:1026-42. [PMID: 25553065 PMCID: PMC4231593 DOI: 10.1111/eva.12178] [Citation(s) in RCA: 194] [Impact Index Per Article: 17.6] [Received: 02/04/2014] [Accepted: 05/20/2014] [Indexed: 12/12/2022]
Abstract
Genome sequencing projects were long confined to biomedical model organisms and required the concerted effort of large consortia. Rapid progress in high-throughput sequencing technology and the simultaneous development of bioinformatic tools have democratized the field. It is now within reach for individual research groups in the eco-evolutionary and conservation community to generate de novo draft genome sequences for any organism of choice. Because of the cost and considerable effort involved in such an endeavour, the important first step is to thoroughly consider whether a genome sequence is necessary for addressing the biological question at hand. Once this decision is taken, a genome project requires careful planning with respect to the organism involved and the intended quality of the genome draft. Here, we briefly review the state of the art within this field and provide a step-by-step introduction to the workflow involved in genome sequencing, assembly and annotation with particular reference to large and complex genomes. This tutorial is targeted at scientists with a background in conservation genetics, but more generally, provides useful practical guidance for researchers engaging in whole-genome sequencing projects.
Affiliation(s)
- Robert Ekblom
  - Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
- Jochen B W Wolf
  - Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
|
34
|
Hwang KB, Lee IH, Park JH, Hambuch T, Choe Y, Kim M, Lee K, Song T, Neu MB, Gupta N, Kohane IS, Green RC, Kong SW. Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods. Hum Mutat 2014; 35:936-44. [PMID: 24829188 DOI: 10.1002/humu.22587] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 10/15/2013] [Accepted: 04/29/2014] [Indexed: 12/29/2022]
Abstract
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates.
Affiliation(s)
- Kyu-Baek Hwang
  - Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Boston Children's Hospital, Boston, Massachusetts; School of Computer Science and Engineering, Soongsil University, Seoul, 156-743, South Korea
|
35
|
Manconi A, Orro A, Manca E, Armano G, Milanesi L. GPU-BSM: a GPU-based tool to map bisulfite-treated reads. PLoS One 2014; 9:e97277. [PMID: 24842718 PMCID: PMC4026317 DOI: 10.1371/journal.pone.0097277] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/22/2013] [Accepted: 04/17/2014] [Indexed: 11/18/2022]
Abstract
Cytosine DNA methylation is an epigenetic mark implicated in several biological processes. Bisulfite treatment of DNA is acknowledged as the gold standard technique to study methylation. This technique introduces changes in the genomic DNA by converting cytosines to uracils while 5-methylcytosines remain nonreactive. During PCR amplification 5-methylcytosines are amplified as cytosine, whereas uracils and thymines as thymine. To detect the methylation levels, reads treated with the bisulfite must be aligned against a reference genome. Mapping these reads to a reference genome represents a significant computational challenge mainly due to the increased search space and the loss of information introduced by the treatment. To deal with this computational challenge we devised GPU-BSM, a tool based on modern Graphics Processing Units. Graphics Processing Units are hardware accelerators that are increasingly being used successfully to accelerate general-purpose scientific applications. GPU-BSM is a tool able to map bisulfite-treated reads from whole genome bisulfite sequencing and reduced representation bisulfite sequencing, and to estimate methylation levels, with the goal of detecting methylation. Due to the massive parallelization obtained by exploiting graphics cards, GPU-BSM aligns bisulfite-treated reads faster than other cutting-edge solutions, while outperforming most of them in terms of unique mapped reads.
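The conversion chemistry described in this abstract can be mimicked in a few lines of Python: unmethylated cytosines end up read as thymine after bisulfite treatment and PCR, while 5-methylcytosines are still read as cytosine. The sketch below simulates that on a single strand and then estimates a methylation level at one reference cytosine as C / (C + T) over the bases observed there. Both helper names are hypothetical, the single-strand simplification is an assumption, and nothing here reflects how GPU-BSM itself maps reads or estimates methylation.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment plus PCR on one strand.

    Unmethylated C is read as T; C at a methylated (0-based) position
    is protected and stays C. Other bases are unchanged.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence)
    )


def methylation_level(observed_bases):
    """Fraction of C among the C and T bases observed at a single
    reference cytosine across the reads covering it."""
    c_count = observed_bases.count("C")
    t_count = observed_bases.count("T")
    if c_count + t_count == 0:
        raise ValueError("no informative bases at this position")
    return c_count / (c_count + t_count)


# Position 1 is methylated; the cytosines at positions 3 and 4 are not.
converted = bisulfite_convert("ACGCCT", methylated_positions=[1])

# Four reads cover one reference cytosine: three report C, one reports T.
level = methylation_level("CCCT")
```

The example also shows why mapping is hard: after conversion, a T in a read may correspond to either a T or an unmethylated C in the reference, which enlarges the search space the abstract refers to.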
Affiliation(s)
- Andrea Manconi
  - Institute for Biomedical Technologies, National Research Council, Segrate (Mi), Italy
- Alessandro Orro
  - Institute for Biomedical Technologies, National Research Council, Segrate (Mi), Italy
- Emanuele Manca
  - Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari (Ca), Italy
- Giuliano Armano
  - Department of Electrical and Electronic Engineering, University of Cagliari, Cagliari (Ca), Italy
- Luciano Milanesi
  - Institute for Biomedical Technologies, National Research Council, Segrate (Mi), Italy
|
36
|
Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data. BMC Genomics 2014; 15:264. [PMID: 24708189 PMCID: PMC4051166 DOI: 10.1186/1471-2164-15-264] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.7] [Received: 11/13/2013] [Accepted: 04/01/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. RESULTS In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. CONCLUSIONS A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. 
The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform.
|
37
|
Amaral AJ, Brito FF, Chobanyan T, Yoshikawa S, Yokokura T, Van Vactor D, Gama-Carvalho M. Quality assessment and control of tissue specific RNA-seq libraries of Drosophila transgenic RNAi models. Front Genet 2014; 5:43. [PMID: 24634674 PMCID: PMC3942661 DOI: 10.3389/fgene.2014.00043] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/02/2013] [Accepted: 02/07/2014] [Indexed: 11/29/2022]
Abstract
RNA-sequencing (RNA-seq) is rapidly emerging as the technology of choice for whole-transcriptome studies. However, RNA-seq is not a bias free technique. It requires large amounts of RNA and library preparation can introduce multiple artifacts, compounded by problems from later stages in the process. Nevertheless, RNA-seq is increasingly used in multiple studies, including the characterization of tissue-specific transcriptomes from invertebrate models of human disease. The generation of samples in this context is complex, involving the establishment of mutant strains and the delicate contamination prone process of dissecting the target tissue. Moreover, in order to achieve the required amount of RNA, multiple samples need to be pooled. Such datasets pose extra challenges due to the large variability that may occur between similar pools, mostly due to the presence of cells from surrounding tissues. Therefore, in addition to standard quality control of RNA-seq data, analytical procedures for control of “biological quality” are critical for successful comparison of gene expression profiles. In this study, the transcriptome of the central nervous system (CNS) of a Drosophila transgenic strain with neuronal-specific RNAi of an ubiquitous gene was profiled using RNA-seq. After observing the existence of an unusual variance in our dataset, we showed that the expression profile of a small panel of marker genes, including GAL4 under control of a tissue specific driver, can identify libraries with low levels of contamination from neighboring tissues, enabling the selection of a robust dataset for differential expression analysis. We further analyzed the potential of profiling a complex tissue to identify cell-type specific changes in response to target gene down-regulation. 
Finally, we showed that trimming 5′ ends of reads decreases nucleotide frequency biases, increasing the coverage of protein coding genes with a potential positive impact in the incurrence of systematic technical errors.
Affiliation(s)
- Andreia J Amaral
  - Universidade de Lisboa, Faculdade de Ciências, BioFIG-Centre for Biodiversity, Functional and Integrative Genomics, Lisbon, Portugal
  - Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
- Francisco F Brito
  - Universidade de Lisboa, Faculdade de Ciências, BioFIG-Centre for Biodiversity, Functional and Integrative Genomics, Lisbon, Portugal
- Tamar Chobanyan
  - Formation and Regulation of Neuronal Connectivity Research Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Seiko Yoshikawa
  - Formation and Regulation of Neuronal Connectivity Research Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Takakazu Yokokura
  - Formation and Regulation of Neuronal Connectivity Research Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- David Van Vactor
  - Formation and Regulation of Neuronal Connectivity Research Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
  - Department of Cell Biology, Harvard Medical School, Boston, USA
- Margarida Gama-Carvalho
  - Universidade de Lisboa, Faculdade de Ciências, BioFIG-Centre for Biodiversity, Functional and Integrative Genomics, Lisbon, Portugal
|
38
|
Manconi A, Orro A, Manca E, Armano G, Milanesi L. A tool for mapping Single Nucleotide Polymorphisms using Graphics Processing Units. BMC Bioinformatics 2014; 15 Suppl 1:S10. [PMID: 24564714 PMCID: PMC4015528 DOI: 10.1186/1471-2105-15-s1-s10] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 02/23/2023]
Abstract
Background Single Nucleotide Polymorphism (SNP) genotyping analysis is very susceptible to errors in SNP chromosomal positions. SNP mapping data are provided along with SNP arrays without the information needed to assess their accuracy in advance. Moreover, these mapping data are related to a given build of a genome and need to be updated when a new build is available. As a consequence, researchers often plan to remap SNPs with the aim of obtaining more up-to-date SNP chromosomal positions. In this work, we present G-SNPM, a GPU (Graphics Processing Unit) based tool to map SNPs on a genome. Methods G-SNPM is a tool that maps a short sequence representative of a SNP against a reference DNA sequence in order to find the physical position of the SNP in that sequence. In G-SNPM each SNP is mapped on its related chromosome by means of an automatic three-stage pipeline. In the first stage, G-SNPM uses the GPU-based short-read mapping tool SOAP3-dp to align in parallel the sequences representative of each SNP against the related reference chromosome. In the second stage, G-SNPM uses another short-read mapping tool to remap the sequences left unaligned or ambiguously aligned by SOAP3-dp (in this stage SHRiMP2 is used, which exploits specialized vector computing hardware to speed up the Smith-Waterman dynamic programming algorithm). In the last stage, G-SNPM analyzes the alignments obtained by SOAP3-dp and SHRiMP2 to identify the absolute position of each SNP. Results and conclusions To assess G-SNPM, we used it to remap the SNPs of some commercial chips. Experimental results showed that G-SNPM was able to remap almost all SNPs without ambiguity. Based on modern GPUs, G-SNPM provides fast mappings without worsening the accuracy of the results. G-SNPM can be used to deal with specialized Genome Wide Association Studies (GWAS), as well as in annotation tasks that require updating the SNP mapping probes.
|
39
|
Konczal M, Koteja P, Stuglik MT, Radwan J, Babik W. Accuracy of allele frequency estimation using pooled RNA-Seq. Mol Ecol Resour 2013; 14:381-92. [PMID: 24119300 DOI: 10.1111/1755-0998.12186] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 06/21/2013] [Revised: 09/30/2013] [Accepted: 10/06/2013] [Indexed: 11/28/2022]
Abstract
For nonmodel organisms, genome-wide information that describes functionally relevant variation may be obtained by RNA-Seq following de novo transcriptome assembly. While sequencing has become relatively inexpensive, the preparation of a large number of sequencing libraries remains prohibitively expensive for population genetic analyses of nonmodel species. Pooling samples may then be an attractive alternative. To test whether pooled RNA-Seq accurately predicts true allele frequencies, we analysed the liver transcriptomes of 10 bank voles. Each sample was sequenced both as an individually barcoded library and as a part of a pool. Equal amounts of total RNA from each vole were pooled prior to mRNA selection and library construction. Reads were mapped onto the de novo assembled reference transcriptome. High-quality genotypes for individual voles, determined for 23,682 SNPs, provided information on 'true' allele frequencies; allele frequencies estimated from the pool were then compared with these values. 'True' frequencies and those estimated from the pool were highly correlated. Mean relative estimation error was 21% and did not depend on expression level. However, we also observed a minor effect of interindividual variation in gene expression and allele-specific gene expression on allele frequency estimation accuracy. Moreover, we observed a strong negative relationship between minor allele frequency and relative estimation error. Our results indicate that pooled RNA-Seq exhibits accuracy comparable with pooled genome resequencing, but variation in expression level between individuals should be assessed and accounted for. This should help to take into account the difference in accuracy between conservatively expressed transcripts and those that are variable in expression level.
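The comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes that the 'true' frequency is the alternate-allele count over individual diploid genotypes, that the pooled estimate is the fraction of reads carrying the alternate allele, and that relative estimation error is |estimate - truth| / truth. The abstract does not state these definitions, and the function names and all numbers below are hypothetical.

```python
from typing import Sequence


def true_allele_frequency(genotypes: Sequence[int]) -> float:
    """Alternate-allele frequency from diploid genotypes coded as 0, 1 or 2 alternate copies."""
    return sum(genotypes) / (2 * len(genotypes))


def pooled_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Alternate-allele frequency estimated as the fraction of pooled reads carrying the allele."""
    return alt_reads / total_reads


def relative_error(estimate: float, truth: float) -> float:
    """Absolute estimation error scaled by the true frequency (assumed definition)."""
    return abs(estimate - truth) / truth


# Hypothetical genotypes for 10 individuals at one SNP: 6 alternate copies out of 20.
genotypes = [0, 1, 1, 2, 0, 0, 1, 0, 0, 1]
truth = true_allele_frequency(genotypes)

# Hypothetical pooled read counts at the same SNP.
estimate = pooled_allele_frequency(alt_reads=36, total_reads=100)
err = relative_error(estimate, truth)

print(f"true={truth:.2f} pooled={estimate:.2f} relative_error={err:.2f}")
```

Under this definition a fixed absolute error weighs more heavily when the true frequency is small, which is in line with the negative relationship between minor allele frequency and relative error reported in the abstract.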
Affiliation(s)
- M Konczal
- Institute of Environmental Sciences, Jagiellonian University, Gronostajowa 7, 30-387, Kraków, Poland
|
40
|
Roy S, Durso MB, Wald A, Nikiforov YE, Nikiforova MN. SeqReporter: automating next-generation sequencing result interpretation and reporting workflow in a clinical laboratory. J Mol Diagn 2013; 16:11-22. [PMID: 24220144 DOI: 10.1016/j.jmoldx.2013.08.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 06/06/2013] [Revised: 08/20/2013] [Accepted: 08/27/2013] [Indexed: 01/15/2023] Open
Abstract
A wide repertoire of bioinformatics applications exists for next-generation sequencing data analysis; however, certain requirements of the clinical molecular laboratory limit their use: i) comprehensive report generation, ii) compatibility with existing laboratory information systems and computer operating system, iii) knowledgebase development, iv) quality management, and v) data security. SeqReporter is a web-based application developed using ASP.NET framework version 4.0. The client-side was designed using HTML5, CSS3, and JavaScript. The server-side processing (VB.NET) relied on interaction with a customized SQL Server 2008 R2 database. Overall, 104 cases (1062 variant calls) were analyzed by SeqReporter. Each variant call was classified into one of five report levels: i) known clinical significance, ii) uncertain clinical significance, iii) pending pathologists' review, iv) synonymous and deep intronic, and v) platform and panel-specific sequence errors. SeqReporter correctly annotated and classified 99.9% (859 of 860) of sequence variants, including 68.7% synonymous single-nucleotide variants, 28.3% nonsynonymous single-nucleotide variants, 1.7% insertions, and 1.3% deletions. One variant of potential clinical significance was re-classified after pathologist review. Laboratory information system-compatible clinical reports were generated automatically. SeqReporter also facilitated quality management activities. SeqReporter is an example of a customized and well-designed informatics solution to optimize and automate the downstream analysis of clinical next-generation sequencing data. We propose it as a model that may guide the development of a comprehensive clinical informatics solution.
Affiliation(s)
- Somak Roy
- Division of Molecular and Genomic Pathology, Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania.
- Mary Beth Durso
- Division of Molecular and Genomic Pathology, Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
- Abigail Wald
- Division of Molecular and Genomic Pathology, Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
- Yuri E Nikiforov
- Division of Molecular and Genomic Pathology, Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
- Marina N Nikiforova
- Division of Molecular and Genomic Pathology, Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania.
|
41
|
Prosperi MCF, Yin L, Nolan DJ, Lowe AD, Goodenow MM, Salemi M. Empirical validation of viral quasispecies assembly algorithms: state-of-the-art and challenges. Sci Rep 2013; 3:2837. [PMID: 24089188 PMCID: PMC3789152 DOI: 10.1038/srep02837] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 04/03/2013] [Accepted: 09/13/2013] [Indexed: 11/22/2022] Open
Abstract
Next generation sequencing (NGS) is superseding Sanger technology for analysing intra-host viral populations, in terms of genome length and resolution. We introduce two new empirical validation data sets and test the available viral population assembly software. Two intra-host viral population 'quasispecies' samples (type-1 human immunodeficiency and hepatitis C virus) were Sanger-sequenced, and plasmid clone mixtures at controlled proportions were shotgun-sequenced using Roche's 454 sequencing platform. The performance of different assemblers was compared in terms of phylogenetic clustering and recombination with the Sanger clones. Phylogenetic clustering showed that all assemblers captured a proportion of the most divergent lineages, but none were able to provide a high precision/recall tradeoff. Estimated variant frequencies mildly correlated with the original. Given the limitations of currently available algorithms identified by our empirical validation, the development and exploitation of additional data sets is needed, in order to establish an efficient framework for viral population reconstruction using NGS.
Affiliation(s)
- Mattia C. F. Prosperi
- University of Manchester, Faculty of Medical and Human Sciences, Northwest Institute of Bio-Health Informatics, Centre for Health Informatics, Institute of Population Health, Manchester, UK
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Li Yin
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Florida Center for AIDS Research, Gainesville, Florida, USA
- David J. Nolan
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Amanda D. Lowe
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Florida Center for AIDS Research, Gainesville, Florida, USA
- Maureen M. Goodenow
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Florida Center for AIDS Research, Gainesville, Florida, USA
- Marco Salemi
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Florida Center for AIDS Research, Gainesville, Florida, USA
- Emerging Pathogens Institute, Gainesville, Florida, USA
|
42
|
Liu X, Han S, Wang Z, Gelernter J, Yang BZ. Variant callers for next-generation sequencing data: a comparison study. PLoS One 2013; 8:e75619. [PMID: 24086590 PMCID: PMC3785481 DOI: 10.1371/journal.pone.0075619] [Citation(s) in RCA: 112] [Impact Index Per Article: 9.3] [Received: 12/20/2012] [Accepted: 08/15/2013] [Indexed: 11/19/2022] Open
Abstract
Next generation sequencing (NGS) has been leading the genetic study of human disease into an era of unprecedented productivity. Many bioinformatics pipelines have been developed to call variants from NGS data. The performance of these pipelines depends crucially on the variant caller used and on the calling strategies implemented. We studied the performance of four prevailing callers, SAMtools, GATK, glftools and Atlas2, using single-sample and multiple-sample variant-calling strategies. Using the same aligner, BWA, we built four single-sample and three multiple-sample calling pipelines and applied the pipelines to whole exome sequencing data taken from 20 individuals. We obtained genotypes generated by Illumina Infinium HumanExome v1.1 Beadchip for validation analysis and then used Sanger sequencing as a “gold-standard” method to resolve discrepancies for selected regions of high discordance. Finally, we compared the sensitivity of three of the single-sample calling pipelines using known simulated whole genome sequence data as a gold standard. Overall, for single-sample calling, the called variants were highly consistent across callers and the pairwise overlapping rate was about 0.9. Compared with other callers, GATK had the highest rediscovery rate (0.9969) and specificity (0.99996), and the Ti/Tv ratio out of GATK was closest to the expected value of 3.02. Multiple-sample calling increased the sensitivity. Results from the simulated data suggested that GATK outperformed SAMtools and glfSingle in sensitivity, especially for low coverage data. Further, for the selected discrepant regions evaluated by Sanger sequencing, variant genotypes called by exome sequencing versus the exome array were more accurate, although the average variant sensitivity and overall genotype consistency rate were as high as 95.87% and 99.82%, respectively. In conclusion, GATK showed several advantages over other variant callers for general purpose NGS analyses. The GATK pipelines we developed perform very well.
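Two of the quality metrics named in this abstract, the Ti/Tv ratio and the pairwise overlap between call sets, can be sketched in a few lines of Python. This is a sketch under assumptions: variants are reduced to (chromosome, position, ref, alt) tuples, and the overlap is computed as shared calls over the union of calls, which may differ from the definition used in the study. All variant data below are hypothetical.

```python
from typing import Iterable, Set, Tuple

# The four transition substitutions (purine<->purine, pyrimidine<->pyrimidine).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions over (ref, alt) single-base pairs."""
    snvs = list(snvs)
    ti = sum(1 for ref, alt in snvs if (ref, alt) in TRANSITIONS)
    tv = len(snvs) - ti  # every non-transition SNV is a transversion
    return ti / tv if tv else float("inf")


def pairwise_overlap(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Shared calls divided by the union of calls (assumed definition)."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 0.0


# Hypothetical substitutions: four transitions and two transversions.
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
ratio = ti_tv_ratio(snvs)

# Hypothetical call sets from two callers on the same sample.
caller_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "T")}
caller_b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "A", "C")}
overlap = pairwise_overlap(caller_a, caller_b)

print(f"Ti/Tv={ratio:.2f} overlap={overlap:.2f}")
```

A Ti/Tv value far from the expectation for the sequenced target is commonly read as a sign of excess false-positive calls, which is why the abstract reports it alongside rediscovery rate and specificity.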
Affiliation(s)
- Xiangtao Liu
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, Connecticut, United States of America
- VA CT Health Care Center, West Haven, Connecticut, United States of America
- Shizhong Han
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, Connecticut, United States of America
- VA CT Health Care Center, West Haven, Connecticut, United States of America
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Joel Gelernter
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, Connecticut, United States of America
- VA CT Health Care Center, West Haven, Connecticut, United States of America
- Departments of Genetics and Neurobiology, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Bao-Zhu Yang
- Department of Psychiatry, Division of Human Genetics, Yale University School of Medicine, New Haven, Connecticut, United States of America
- VA CT Health Care Center, West Haven, Connecticut, United States of America
|
43
|
Wang Z, Liu X, Yang BZ, Gelernter J. The role and challenges of exome sequencing in studies of human diseases. Front Genet 2013; 4:160. [PMID: 24032039 PMCID: PMC3752524 DOI: 10.3389/fgene.2013.00160] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 03/31/2013] [Accepted: 08/04/2013] [Indexed: 01/19/2023] Open
Abstract
Recent advances in next-generation sequencing technologies have transformed the genetic study of human diseases; this is an era of unprecedented productivity. Exome sequencing, the targeted sequencing of the protein-coding portion of the human genome, has been shown to be a powerful and cost-effective method for detection of disease variants underlying Mendelian disorders. Increasing effort has been made to identify rare variants associated with complex traits in sequencing studies. Here we provide an overview of the application fields for exome sequencing in human diseases. We describe a general framework of computation and bioinformatics for handling sequencing data. We then demonstrate data quality and agreement between exome sequencing and exome microarray (chip) genotypes using data collected on the same set of subjects in a genetic study of panic disorder. Our results show that, in sequencing data, the data quality was generally higher for variants within the exonic target regions than for those outside the target regions, due to the target enrichment. We also compared genotype concordance for variant calls obtained by exome sequencing vs. exome genotyping microarrays. The overall consistency rate was >99.83% and the heterozygous consistency rate was >97.55%. The two platforms share a large amount of agreement over low frequency variants in the exonic regions, while exome sequencing provides much more information on variants not included on exome genotyping microarrays. The results demonstrate that exome sequencing data are of high quality and can be used to investigate the role of rare coding variants in human diseases.
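The genotype concordance comparison in this abstract can be sketched as follows. The abstract does not define its consistency rates, so this example assumes that the overall rate is the fraction of shared sites with identical genotypes on both platforms, and that the heterozygous rate is the fraction of array-heterozygous sites that sequencing also calls heterozygous. Site identifiers, genotype codes and the function name are hypothetical.

```python
import math
from typing import Dict, Tuple


def consistency_rates(seq_calls: Dict[str, int],
                      array_calls: Dict[str, int]) -> Tuple[float, float]:
    """Return (overall, heterozygous) consistency over sites genotyped on both platforms.

    Genotypes are coded as the number of alternate alleles: 0, 1 (heterozygous) or 2.
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        return math.nan, math.nan

    # Overall rate: identical genotype on both platforms.
    matches = sum(1 for site in shared if seq_calls[site] == array_calls[site])
    overall = matches / len(shared)

    # Heterozygous rate: restricted to sites the array calls heterozygous.
    het_sites = [site for site in shared if array_calls[site] == 1]
    if het_sites:
        het_matches = sum(1 for site in het_sites if seq_calls[site] == 1)
        het = het_matches / len(het_sites)
    else:
        het = math.nan
    return overall, het


# Hypothetical genotypes; rs5 and rs6 are each present on one platform only.
seq_calls = {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 2, "rs6": 1}
array_calls = {"rs1": 0, "rs2": 1, "rs3": 2, "rs4": 1, "rs5": 0}

overall, het = consistency_rates(seq_calls, array_calls)
print(f"overall={overall:.2f} heterozygous={het:.2f}")
```

Reporting the heterozygous rate separately is useful because most sites in a sample are homozygous reference, so the overall rate can stay high even when heterozygous calls disagree.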
Affiliation(s)
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, Yale University New Haven, CT, USA
|
44
|
Ulahannan D, Kovac MB, Mulholland PJ, Cazier JB, Tomlinson I. Technical and implementation issues in using next-generation sequencing of cancers in clinical practice. Br J Cancer 2013; 109:827-35. [PMID: 23887607 PMCID: PMC3749581 DOI: 10.1038/bjc.2013.416] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Received: 10/23/2012] [Revised: 04/23/2013] [Accepted: 06/27/2013] [Indexed: 12/13/2022] Open
Abstract
Next-generation sequencing (NGS) of cancer genomes promises to revolutionise oncology, with the ability to design and use targeted drugs, to predict outcome and response, and to classify tumours. It is continually becoming cheaper, faster and more reliable, with the capability to identify rare yet clinically important somatic mutations. Technical challenges include sequencing samples of low quality and/or quantity, reliable identification of structural and copy number variation, and assessment of intratumour heterogeneity. Once these problems are overcome, the use of the data to guide clinical decision making is not straightforward, and there is a risk of premature use of molecular changes to guide patient management in the absence of supporting evidence. Paradoxically, NGS may simply move the bottleneck of personalised medicine from data acquisition to the identification of reliable biomarkers. Standardised cancer NGS data collection on an international scale would be a significant step towards optimising patient care.
Affiliation(s)
- D Ulahannan
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.
|
45
|
Bromberg Y. Building a genome analysis pipeline to predict disease risk and prevent disease. J Mol Biol 2013; 425:3993-4005. [PMID: 23928561 DOI: 10.1016/j.jmb.2013.07.038] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 06/03/2013] [Revised: 07/26/2013] [Accepted: 07/28/2013] [Indexed: 12/24/2022]
Abstract
Reduced costs and increased speed and accuracy of sequencing can bring the genome-based evaluation of individual disease risk to the bedside. While past efforts have identified a number of actionable mutations, the bulk of genetic risk remains hidden in sequence data. The biggest challenge facing genomic medicine today is the development of new techniques to predict the specifics of a given human phenome (set of all expressed phenotypes) encoded by each individual variome (full set of genome variants) in the context of the given environment. Numerous tools exist for the computational identification of the functional effects of a single variant. However, the pipelines taking advantage of full genomic, exomic, transcriptomic (and other) sequences have only recently become a reality. This review looks at the building of methodologies for predicting "variome"-defined disease risk. It also discusses some of the challenges for incorporating such a pipeline into everyday medical practice.
Affiliation(s)
- Y Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08873, USA.
|
46
|
Gardner SN, Jaing CJ. Bioinformatics for microbial genotyping of equine encephalitis viruses, orthopoxviruses, and hantaviruses. J Virol Methods 2013; 193:112-20. [PMID: 23714768 DOI: 10.1016/j.jviromet.2013.04.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/17/2012] [Revised: 04/18/2013] [Accepted: 04/29/2013] [Indexed: 10/26/2022]
Abstract
Microbial genotyping is essential for forensic discrimination of pathogen strains, tracing epidemics, and understanding evolutionary processes. Phylogenetic analyses were performed and genotyping assays designed for five viral species complexes or genera: Western, Eastern, and Venezuelan equine encephalitis viruses, hantavirus segments L, M, and S, and orthopoxviruses. For each group, sequence alignments and phylogenetic trees were built. PCR signatures composed of primer pairs or TaqMan™ triplets were designed and mapped to nodes of the trees for sub-type or strain specific PCR-based identification. In addition, single nucleotide polymorphisms (SNPs) were identified and mapped to trees, and SNP microarray probes were designed to enable highly multiplexed genotyping of an unsequenced sample by hybridization. SNP-based trees corresponded well with MSA trees. Near-perfect isolate resolution was possible for all viruses analyzed computationally using either SNPs or PCR signatures. More tree nodes were represented by SNP loci than by PCR signatures, as PCR signatures often represented subsets of strains not corresponding to a branch. However, while PCR genotyping is possible, the number of PCR signatures needed to characterize an unknown can be very large. SNP microarrays are a suitable alternative, as arrays enable highly multiplexed, high resolution genotyping of an unknown in a single hybridization assay.
Affiliation(s)
- Shea N Gardner
- Global Security, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States.
|
47
|
Wolf JBW. Principles of transcriptome analysis and gene expression quantification: an RNA-seq tutorial. Mol Ecol Resour 2013; 13:559-72. [DOI: 10.1111/1755-0998.12109] [Citation(s) in RCA: 139] [Impact Index Per Article: 11.6] [Received: 01/24/2013] [Revised: 03/18/2013] [Accepted: 03/21/2013] [Indexed: 01/03/2023]
Affiliation(s)
- Jochen B. W. Wolf
- Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
- Science of Life Laboratory, Uppsala, Sweden
|
48
|
Kilpivaara O, Aaltonen LA. Diagnostic cancer genome sequencing and the contribution of germline variants. Science 2013; 339:1559-62. [PMID: 23539595 DOI: 10.1126/science.1233899] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Indexed: 12/11/2022]
Abstract
Whole-genome sequencing (WGS) is revolutionizing medical research and has the potential to serve as a powerful and cost-effective diagnostic tool in the management of cancer. We review the progress to date in the use of WGS to reveal how germline variants and mutations may be associated with cancer. We use colorectal cancer as an example of how the current level of knowledge can be translated into predictions of predisposition. We also address challenges in the clinical implementation of the variants in germline DNA identified through cancer genome sequencing. We call for the international development of standards to facilitate the clinical use of germline information arising from diagnostic cancer genome sequencing.
Affiliation(s)
- O Kilpivaara
- Department of Medical Genetics, Biomedicum Helsinki, University of Helsinki, Helsinki, Finland
|
49
|
Gullapalli RR, Lyons-Weiler M, Petrosko P, Dhir R, Becich MJ, LaFramboise WA. Clinical integration of next-generation sequencing technology. Clin Lab Med 2013; 32:585-99. [PMID: 23078661 DOI: 10.1016/j.cll.2012.07.005] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 12/23/2022]
Abstract
Recent advances in next-generation sequencing (NGS) methods and technology have substantially reduced costs and operational complexity leading to production of benchtop sequencers and commercial software solutions for implementation in small research and clinical laboratories. This article addresses requirements and limitations to successful implementation of these systems, including (1) calibration and validation of the instrumentation, experimental paradigm, and primary readout, (2) secure data transfer, storage, and secondary processing, (3) implementation of software tools for targeted analysis, and (4) training of research and clinical personnel to evaluate data fidelity and interpret the molecular significance of the genomic output.
Affiliation(s)
- R R Gullapalli
- Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
|
50
|
Frese KS, Katus HA, Meder B. Next-generation sequencing: from understanding biology to personalized medicine. BIOLOGY 2013; 2:378-98. [PMID: 24832667 PMCID: PMC4009863 DOI: 10.3390/biology2010378] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 01/21/2013] [Revised: 01/21/2013] [Accepted: 02/04/2013] [Indexed: 12/11/2022]
Abstract
Within just a few years, the new methods for high-throughput next-generation sequencing have generated completely novel insights into the heritability and pathophysiology of human disease. In this review, we wish to highlight the benefits of the current state-of-the-art sequencing technologies for genetic and epigenetic research. We illustrate how these technologies help to constantly improve our understanding of genetic mechanisms in biological systems and summarize the progress made so far. This can be exemplified by the case of heritable heart muscle diseases, so-called cardiomyopathies. Here, next-generation sequencing is able to identify novel disease genes, and first clinical applications demonstrate the successful translation of this technology into personalized patient care.
Affiliation(s)
- Karen S Frese
- Department of Internal Medicine III, University of Heidelberg, Heidelberg 69120, Germany.
- Hugo A Katus
- Department of Internal Medicine III, University of Heidelberg, Heidelberg 69120, Germany.
- Benjamin Meder
- Department of Internal Medicine III, University of Heidelberg, Heidelberg 69120, Germany.
|