1
Mencius J, Chen W, Zheng Y, An T, Yu Y, Sun K, Feng H, Feng Z. Restoring flowcell type and basecaller configuration from FASTQ files of nanopore sequencing data. Nat Commun 2025; 16:4102. [PMID: 40316544 PMCID: PMC12048652 DOI: 10.1038/s41467-025-59378-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Accepted: 04/22/2025] [Indexed: 05/04/2025] Open
Abstract
As nanopore sequencing has been widely adopted, data accumulation has surged, resulting in over 700,000 public datasets. While these data hold immense potential for advancing genomic research, their utility is compromised by the absence of flowcell type and basecaller configuration in about 85% of the data and associated publications. These parameters are essential for many analysis algorithms, and their misapplication can lead to significant drops in performance. To address this issue, we present LongBow, designed to infer flowcell type and basecaller configuration directly from the base quality value patterns of FASTQ files. LongBow has been tested on 66 in-house basecalled FAST5/POD5 datasets and 1989 public FASTQ datasets, achieving accuracies of 95.33% and 91.45%, respectively. We demonstrate its utility by reanalyzing nanopore sequencing data from the COVID-19 Genomics UK (COG-UK) project. The results show that LongBow is essential for reproducing reported genomic variants and, through a LongBow-based analysis pipeline, we discovered substantially more functionally important variants while improving accuracy in lineage assignment. Overall, LongBow is poised to play a critical role in maximizing the utility of public nanopore sequencing data, while significantly enhancing the reproducibility of related research.
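The abstract above describes inference from the base quality value patterns of FASTQ files. As a minimal sketch of what that input signal looks like, and not of LongBow's actual method, the following Python tallies Phred quality values from a FASTQ stream. The function name `quality_histogram`, the Phred+33 offset default, and the toy reads are assumptions made for illustration only.

```python
from collections import Counter
from io import StringIO


def quality_histogram(handle, offset=33):
    """Count Phred quality values over all reads in a 4-line-per-record FASTQ stream.

    Hypothetical helper for illustration; assumes Phred+33 encoding and
    unwrapped (single-line) sequence and quality strings.
    """
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 3:  # the fourth line of each record is the quality string
            counts.update(ord(ch) - offset for ch in line.rstrip("\r\n"))
    return counts


# Toy input: two made-up reads, not real sequencing data.
example = StringIO("@read1\nACGT\n+\nII5+\n@read2\nGG\n+\n5I\n")
hist = quality_histogram(example)
for q in sorted(hist):
    print(q, hist[q])
```

A histogram of this kind is only a raw summary; how such patterns map to a flowcell type or basecaller configuration is specific to the tool and is not reproduced here.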
Affiliation(s)
- Jun Mencius
  - Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Wenjun Chen
  - Department of Clinical Genetics, Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Youqi Zheng
  - Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Tingyi An
  - Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Yongguo Yu
  - Department of Clinical Genetics, Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Kun Sun
  - Department of Clinical Genetics, Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Department of Pediatric Cardiology, Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Huijuan Feng
  - Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Zhixing Feng
  - Department of Clinical Genetics, Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
2
Lemas JM, Patterson EL, Cutti L, Morran S, Johnson NA, Montgomery J, Abdollahi F, Nelson DR, Llaca V, Fengler K, Westra P, Gaines TA. Assembly and Annotation of the Tetraploid Salsola tragus (Russian Thistle) Genome. Genome Biol Evol 2025; 17:evaf014. [PMID: 39862056 PMCID: PMC11797066 DOI: 10.1093/gbe/evaf014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Revised: 01/15/2025] [Accepted: 01/19/2025] [Indexed: 01/27/2025] Open
Abstract
This report presents two phased chromosome-scale genome assemblies of allotetraploid Salsola tragus (2n = 4x = 36) and fills the current genomics resource gap for this species. The flow cytometry-estimated 1C genome size was 1.319 Gb. PacBio HiFi reads were assembled and scaffolded with Hi-C chromatin contact mapping and Bionano optical mapping data. For annotation, a PacBio Iso-Seq library was generated from root, stem, leaf, and floral tissues, followed by annotation using a modified Maker pipeline. The assembled haploid S. tragus genomes contained 18 chromosomes each, with 9 chromosomes assigned to subgenome A and 9 chromosomes to subgenome B. Each haplome assembly represented 95% of the total flow cytometry-estimated genome size. Haplome 1 and haplome 2 contained 43,354 and 42,221 annotated genes, respectively. The availability of high-quality reference genomes for this economically important weed will facilitate future omics analysis of S. tragus and a better understanding of chenopod plants.
Affiliation(s)
- John M Lemas
  - Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, USA
  - Department of Plant, Soil, and Microbial Science, Michigan State University, East Lansing, MI 48823, USA
- Eric L Patterson
  - Department of Plant, Soil, and Microbial Science, Michigan State University, East Lansing, MI 48823, USA
- Luan Cutti
  - Department of Plant, Soil, and Microbial Science, Michigan State University, East Lansing, MI 48823, USA
- Sarah Morran
  - Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, USA
- Nicholas A Johnson
  - Department of Plant, Soil, and Microbial Science, Michigan State University, East Lansing, MI 48823, USA
- Jacob Montgomery
  - Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, USA
  - Department of Plant, Soil, and Microbial Science, Michigan State University, East Lansing, MI 48823, USA
- Fatemeh Abdollahi
  - Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, USA
- David R Nelson
  - Department of Microbiology, Immunology, and Biochemistry, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Victor Llaca
  - Corteva Agriscience, Genomics Lab, Johnston, IA 50131, USA
- Kevin Fengler
  - Corteva Agriscience, Genomics Lab, Johnston, IA 50131, USA
- Philip Westra
  - Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, USA
- Todd A Gaines
  - Department of Agricultural Biology, Colorado State University, Fort Collins, CO 80523, USA
3
Kanu GA, Mouselly A, Mohamed AA. Foundations and applications of computational genomics. DEEP LEARNING IN GENETICS AND GENOMICS 2025:59-75. [DOI: 10.1016/b978-0-443-27574-6.00007-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
4
Lorenzini Campos MN, Amadio AF, Irazoqui JM, Acevedo RM, Rojas FD, Corredor Sanguña LH, Formichelli LB, Lucero RH, Giusiano GE. Applying nanopore sequencing technology in Paracoccidioides sp.: a high-quality DNA isolation method for next-generation genomic studies. Microb Genom 2024; 10:001302. [PMID: 39432409 PMCID: PMC11493184 DOI: 10.1099/mgen.0.001302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Accepted: 09/10/2024] [Indexed: 10/23/2024] Open
Abstract
Paracoccidioidomycosis is a severe systemic endemic mycosis caused by Paracoccidioides spp. which mainly affects individuals in Latin America. Progress in Paracoccidioides genomics has been slow, as evidenced by the incomplete reference databases available. Next-generation sequencing is a valuable tool for epidemiological surveillance and genomic characterization. With the ability to sequence long reads without the need for prior amplification, Oxford Nanopore Technology (ONT) offers several advantages, but high-quality and high-quantity DNA samples are required to achieve satisfactory results. Due to the low concentration of Paracoccidioides DNA in clinical samples and inefficient culture isolation methods, DNA extraction can be a significant barrier to genomic studies of this genus. This study proposes a method to obtain a high-coverage de novo genome assembly for Paracoccidioides using an improved DNA extraction method suitable for sequencing with ONT. The assembly obtained was comparable in size to those constructed from available data from Illumina technology. To our knowledge, this is the first genome assembly of Paracoccidioides sp. of such a large size constructed using ONT.
Affiliation(s)
- Melina Noelia Lorenzini Campos
  - Instituto de Medicina Regional (IMR), Universidad Nacional del Nordeste (UNNE), Av. Las Heras 727, (3500) Resistencia, Chaco, Argentina
  - Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Godoy Cruz 2290, (C1425FQB) Ciudad Autónoma de Buenos Aires, Argentina
- Ariel Fernando Amadio
  - Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Godoy Cruz 2290, (C1425FQB) Ciudad Autónoma de Buenos Aires, Argentina
  - Instituto de Investigación de la Cadena Láctea (IDICAL), Instituto Nacional de Tecnología Agropecuaria (INTA), Ruta 34 km 227, (2300) Rafaela, Santa Fe, Argentina
- José Matías Irazoqui
  - Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Godoy Cruz 2290, (C1425FQB) Ciudad Autónoma de Buenos Aires, Argentina
  - Instituto de Investigación de la Cadena Láctea (IDICAL), Instituto Nacional de Tecnología Agropecuaria (INTA), Ruta 34 km 227, (2300) Rafaela, Santa Fe, Argentina
- Raúl Maximiliano Acevedo
  - Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Godoy Cruz 2290, (C1425FQB) Ciudad Autónoma de Buenos Aires, Argentina
  - Instituto de Botánica del Nordeste (IBONE, CONICET-UNNE), Universidad Nacional del Nordeste (UNNE), Sargento Juan Bautista Cabral 2131, (3402BKG) Corrientes capital, Argentina
- Florencia Dinorah Rojas
  - Instituto de Medicina Regional (IMR), Universidad Nacional del Nordeste (UNNE), Av. Las Heras 727, (3500) Resistencia, Chaco, Argentina
  - Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Godoy Cruz 2290, (C1425FQB) Ciudad Autónoma de Buenos Aires, Argentina
- Luis Hernando Corredor Sanguña
  - Instituto de Medicina Regional (IMR), Universidad Nacional del Nordeste (UNNE), Av. Las Heras 727, (3500) Resistencia, Chaco, Argentina
  - Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Godoy Cruz 2290, (C1425FQB) Ciudad Autónoma de Buenos Aires, Argentina
- Laura Belén Formichelli
  - Instituto de Medicina Regional (IMR), Universidad Nacional del Nordeste (UNNE), Av. Las Heras 727, (3500) Resistencia, Chaco, Argentina
- Raúl Horacio Lucero
  - Instituto de Medicina Regional (IMR), Universidad Nacional del Nordeste (UNNE), Av. Las Heras 727, (3500) Resistencia, Chaco, Argentina
- Gustavo Emilio Giusiano
  - Instituto de Medicina Regional (IMR), Universidad Nacional del Nordeste (UNNE), Av. Las Heras 727, (3500) Resistencia, Chaco, Argentina
  - Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET), Godoy Cruz 2290, (C1425FQB) Ciudad Autónoma de Buenos Aires, Argentina
5
Zhang H, Ko I, Eaker A, Haney S, Khuu N, Ryan K, Appleby AB, Hoffmann B, Landis H, Pierro KA, Willsea N, Hargarten H, Yocca AE, Harkess A, Honaas L, Ficklin S. A Haplotype-resolved, Chromosome-scale Genome for Malus domestica Borkh. 'WA 38'. G3 (BETHESDA, MD.) 2024; 14:jkae222. [PMID: 39288023 PMCID: PMC11631450 DOI: 10.1093/g3journal/jkae222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/26/2024] [Accepted: 09/13/2024] [Indexed: 09/19/2024]
Abstract
Genome sequencing for agriculturally important Rosaceous crops has made rapid progress both in completeness and annotation quality. Whole genome sequence and annotation gives breeders, researchers, and growers information about cultivar specific traits such as fruit quality and disease resistance, and informs strategies to enhance postharvest storage. Here we present a haplotype-phased, chromosomal level genome of Malus domestica, 'WA 38', a new apple cultivar released to market in 2017 as Cosmic Crisp®. Using both short and long read sequencing data with a k-mer based approach, chromosomes originating from each parent were assembled and segregated. This is the first pome fruit genome fully phased into parental haplotypes in which chromosomes from each parent are identified and separated into their unique, respective haplomes. The two haplome assemblies, 'Honeycrisp' originated HapA and 'Enterprise' originated HapB, are about 650 Megabases each, and both have a BUSCO score of 98.7% complete. A total of 53,028 and 54,235 genes were annotated from HapA and HapB, respectively. Additionally, we provide genome-scale comparisons to 'Gala', 'Honeycrisp', and other relevant cultivars highlighting major differences in genome structure and gene family circumscription. This assembly and annotation was done in collaboration with the American Campus Tree Genomes project that includes 'WA 38' (Washington State University), 'd'Anjou' pear (Auburn University), and many more. To ensure transparency, reproducibility, and applicability for any genome project, our genome assembly and annotation workflow is recorded in detail and shared under a public GitLab repository. All software is containerized, offering a simple implementation of the workflow.
Affiliation(s)
- Huiting Zhang
  - Department of Horticulture, Washington State University, Pullman, WA 99164, USA
  - Physiology and Pathology of Tree Fruits Research Unit, USDA Agricultural Research Service, Wenatchee, WA 98801, USA
- Itsuhiro Ko
  - Department of Plant Pathology, Washington State University, Pullman, WA 99164, USA
  - Program of Molecular Plant Sciences, Washington State University, Pullman, WA 99164, USA
- Abigail Eaker
  - Department of Plant Pathology, Washington State University, Pullman, WA 99164, USA
  - Program of Molecular Plant Sciences, Washington State University, Pullman, WA 99164, USA
- Sabrina Haney
  - Department of Animal Science, Washington State University, Pullman, WA 99164, USA
- Ninh Khuu
  - Department of Plant Pathology, Washington State University, Pullman, WA 99164, USA
- Kara Ryan
  - The School of Biological Sciences, Washington State University, Pullman, WA 99164, USA
- Aaron B Appleby
  - Department of Crop and Soil Science, Washington State University, Pullman, WA 99164, USA
- Brendan Hoffmann
  - Integrated Plant Sciences Program, Washington State University, Pullman, WA 99164, USA
- Henry Landis
  - The School of Biological Sciences, Washington State University, Pullman, WA 99164, USA
- Kenneth A Pierro
  - Integrated Plant Sciences Program, Washington State University, Pullman, WA 99164, USA
- Noah Willsea
  - Department of Horticulture, WSU Tree Fruit Research and Extension Center, Wenatchee, WA, 98801, USA
- Heidi Hargarten
  - Physiology and Pathology of Tree Fruits Research Unit, USDA Agricultural Research Service, Wenatchee, WA 98801, USA
- Alan E Yocca
  - Physiology and Pathology of Tree Fruits Research Unit, USDA Agricultural Research Service, Wenatchee, WA 98801, USA
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Alex Harkess
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Loren Honaas
  - Physiology and Pathology of Tree Fruits Research Unit, USDA Agricultural Research Service, Wenatchee, WA 98801, USA
- Stephen Ficklin
  - Department of Horticulture, Washington State University, Pullman, WA 99164, USA
6
Komori K, Aoki K, Harada S, Ishii Y, Tateda K. Plasmid-mediated acquisition and chromosomal integration of blaCTX-M-14 in a subclade of Escherichia coli ST131-H30 clade C1. Antimicrob Agents Chemother 2024; 68:e0081724. [PMID: 39133024 PMCID: PMC11373201 DOI: 10.1128/aac.00817-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Accepted: 07/19/2024] [Indexed: 08/13/2024] Open
Abstract
Escherichia coli ST131 is a multidrug-resistant lineage associated with the global spread of extended-spectrum β-lactamase-producing organisms. Particularly, ST131 clade C1 is the most predominant clade in Japan, harboring blaCTX-M-14 at a high frequency. However, the process of resistance gene acquisition and spread remains unclear. Here, we performed whole-genome sequencing of 19 E. coli strains belonging to 12 STs and 12 fimH types collected between 1997 and 2016. Additionally, we analyzed the full-length genome sequences of 96 ST131-H30 clade C0 and C1 strains, including those obtained from this study and those registered in public databases, to understand how ST131 clade C1 acquired and spread blaCTX-M-14. We detected conjugative IncFII plasmids and IncB/O/K/Z plasmids carrying blaCTX-M-14 in diverse genetic lineages of E. coli strains from the 1990s to the 2010s, suggesting that these plasmids played an important role in the spread of blaCTX-M-14. Molecular phylogenetic and molecular clock analyses of the 96 ST131-H30 clade C0 and C1 strains identified 8 subclades. Strains harboring blaCTX-M-14 were clustered in subclades 4 and 5, and it was inferred that clade C1 acquired blaCTX-M-14 around 1993. All 34 strains belonging to subclade 5 possessed blaCTX-M-14 with ISEcp1 upstream at the same chromosomal position, indicating their common ancestor acquired blaCTX-M-14 in a single ISEcp1-mediated transposition event during the early formation of the subclade around 1999. Therefore, both the horizontal transfer of plasmids carrying blaCTX-M-14 to diverse genetic lineages and chromosomal integration in the predominant genetic lineage have contributed to the spread of blaCTX-M-14.
Affiliation(s)
- Kohji Komori
  - Department of Microbiology and Infectious Diseases, Toho University Graduate School of Medicine, Tokyo, Japan
- Kotaro Aoki
  - Department of Microbiology and Infectious Diseases, Toho University School of Medicine, Tokyo, Japan
- Sohei Harada
  - Department of Microbiology and Infectious Diseases, Toho University School of Medicine, Tokyo, Japan
- Yoshikazu Ishii
  - Department of Microbiology and Infectious Diseases, Toho University Graduate School of Medicine, Tokyo, Japan
  - Department of Microbiology and Infectious Diseases, Toho University School of Medicine, Tokyo, Japan
  - Center for the Planetary Health and Innovation Science (PHIS), The IDEC Institute, Hiroshima University, Higashi-Hiroshima, Japan
- Kazuhiro Tateda
  - Department of Microbiology and Infectious Diseases, Toho University Graduate School of Medicine, Tokyo, Japan
  - Department of Microbiology and Infectious Diseases, Toho University School of Medicine, Tokyo, Japan
7
Deng WJ, Li QQ, Shuai HN, Wu RX, Niu SF, Wang QH, Miao BB. Whole-Genome Sequencing Analyses Reveal the Evolution Mechanisms of Typical Biological Features of Decapterus maruadsi. Animals (Basel) 2024; 14:1202. [PMID: 38672351 PMCID: PMC11047736 DOI: 10.3390/ani14081202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 04/11/2024] [Accepted: 04/15/2024] [Indexed: 04/28/2024] Open
Abstract
Decapterus maruadsi is a typical representative of small pelagic fish characterized by fast growth rate, small body size, and high fecundity. It is a high-quality marine commercial fish with high nutritional value. However, the underlying genetics and genomics research focused on D. maruadsi is not comprehensive. Herein, a high-quality chromosome-level genome of a male D. maruadsi was assembled. The assembled genome length was 716.13 Mb with contig N50 of 19.70 Mb. Notably, we successfully anchored 95.73% contig sequences into 23 chromosomes with a total length of 685.54 Mb and a scaffold N50 of 30.77 Mb. A total of 22,716 protein-coding genes, 274.90 Mb repeat sequences, and 10,060 ncRNAs were predicted, among which 22,037 (97%) genes were successfully functionally annotated. The comparative genome analysis identified 459 unique, 73 expanded, and 52 contracted gene families. Moreover, 2804 genes were identified as candidates for positive selection, of which some that were related to the growth and development of bone, muscle, cardioid, and ovaries, such as some members of the TGF-β superfamily, were likely involved in the evolution of typical biological features in D. maruadsi. The study provides an accurate and complete chromosome-level reference genome for further genetic conservation, genomic-assisted breeding, and adaptive evolution research for D. maruadsi.
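The abstract above reports contig and scaffold N50 values and an anchoring rate. As a rough illustration of how those summary statistics are usually defined, and not of the pipeline the authors used, the sketch below computes N50 from a list of sequence lengths and re-derives the anchored fraction from the two totals quoted in the abstract. The function name `n50` and the toy length list are assumptions for illustration.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (made up), not the D. maruadsi assembly.
toy_n50 = n50([2, 3, 4, 5, 6])

# Totals quoted in the abstract: 685.54 Mb anchored out of 716.13 Mb assembled.
anchored_percent = round(685.54 / 716.13 * 100, 2)
print(toy_n50, anchored_percent)
```

The anchored percentage recomputed this way agrees with the 95.73% stated in the abstract.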
Affiliation(s)
- Su-Fang Niu
  - College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China; (W.-J.D.); (Q.-Q.L.); (H.-N.S.); (R.-X.W.); (Q.-H.W.); (B.-B.M.)
8
Bossert S, Pauly A, Danforth BN, Orr MC, Murray EA. Lessons from assembling UCEs: A comparison of common methods and the case of Clavinomia (Halictidae). Mol Ecol Resour 2024; 24:e13925. [PMID: 38183389 DOI: 10.1111/1755-0998.13925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 12/08/2023] [Accepted: 12/21/2023] [Indexed: 01/08/2024]
Abstract
Sequence data assembly is a foundational step in high-throughput sequencing, with untold consequences for downstream analyses. Despite this, few studies have interrogated the many methods for assembling phylogenomic UCE data for their comparative efficacy, or for how outputs may be impacted. We study this by comparing the most commonly used assembly methods for UCEs in the under-studied bee lineage Nomiinae and a representative sampling of relatives. Data for 63 UCE-only and 75 mixed taxa were assembled with five methods, including ABySS, HybPiper, SPAdes, Trinity and Velvet, and then benchmarked for their relative performance in terms of locus capture parameters and phylogenetic reconstruction. Unexpectedly, Trinity and Velvet trailed the other methods in terms of locus capture and DNA matrix density, whereas SPAdes performed favourably in most assessed metrics. In comparison with SPAdes, the guided-assembly approach HybPiper generally recovered the highest quality loci but in lower numbers. Based on our results, we formally move Clavinomia to Dieunomiini and render Epinomia once more a subgenus of Dieunomia. We strongly advise that future studies more closely examine the influence of assembly approach on their results, or, minimally, use better-performing assembly methods such as SPAdes or HybPiper. In this way, we can move forward with phylogenomic studies in a more standardized, comparable manner.
Affiliation(s)
- Silas Bossert
  - Department of Entomology, Washington State University, Pullman, Washington, USA
  - Department of Entomology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Alain Pauly
  - Royal Belgian Institute of Natural Sciences, O.D. Taxonomy and Phylogeny, Brussels, Belgium
- Bryan N Danforth
  - Department of Entomology, Cornell University, Ithaca, New York, USA
- Michael C Orr
  - Entomologie, Staatliches Museum für Naturkunde Stuttgart, Stuttgart, Germany
- Elizabeth A Murray
  - Department of Entomology, Washington State University, Pullman, Washington, USA
9
Melum VJ, Sáenz de Miera C, Markussen FAF, Cázarez-Márquez F, Jaeger C, Sandve SR, Simonneaux V, Hazlerigg DG, Wood SH. Hypothalamic tanycytes as mediators of maternally programmed seasonal plasticity. Curr Biol 2024; 34:632-640.e6. [PMID: 38218183 DOI: 10.1016/j.cub.2023.12.042] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/22/2023] [Revised: 11/07/2023] [Accepted: 12/13/2023] [Indexed: 01/15/2024]
Abstract
In mammals, maternal photoperiodic programming (MPP) provides a means whereby juvenile development can be matched to forthcoming seasonal environmental conditions [1,2,3,4]. This phenomenon is driven by in utero effects of maternal melatonin [5,6,7] on the production of thyrotropin (TSH) in the fetal pars tuberalis (PT) and consequent TSH receptor-mediated effects on tanycytes lining the 3rd ventricle of the mediobasal hypothalamus (MBH) [8,9,10]. Here we use LASER capture microdissection and transcriptomic profiling to show that TSH-dependent MPP controls the attributes of the ependymal region of the MBH in juvenile animals. In Siberian hamster pups gestated and raised on a long photoperiod (LP) and thereby committed to a fast trajectory for growth and reproductive maturation, the ependymal region is enriched for tanycytes bearing sensory cilia and receptors implicated in metabolic sensing. Contrastingly, in pups gestated and raised on short photoperiod (SP) and therefore following an over-wintering developmental trajectory with delayed sexual maturation, the ependymal region has fewer sensory tanycytes. Post-weaning transfer of SP-gestated pups to an intermediate photoperiod (IP), which accelerates reproductive maturation, results in a pronounced shift toward a ciliated tanycytic profile and formation of tanycytic processes. We suggest that tanycytic plasticity constitutes a mechanism to tailor metabolic development for extended survival in variable overwintering environments.
Affiliation(s)
- Vebjørn J Melum
  - Arctic seasonal timekeeping initiative (ASTI), UiT-The Arctic University of Norway, Department of Arctic and Marine Biology, Arctic Chronobiology and Physiology Research Group, NO-9037 Tromsø, Norway
  - University of Strasbourg, Institute of Cellular and Integrative Neurosciences, Strasbourg 67000, France
- Cristina Sáenz de Miera
  - University of Michigan Medical School, Department of Molecular and Integrative Physiology, Ann Arbor, MI 48109, USA
- Fredrik A F Markussen
  - Arctic seasonal timekeeping initiative (ASTI), UiT-The Arctic University of Norway, Department of Arctic and Marine Biology, Arctic Chronobiology and Physiology Research Group, NO-9037 Tromsø, Norway
- Fernando Cázarez-Márquez
  - Arctic seasonal timekeeping initiative (ASTI), UiT-The Arctic University of Norway, Department of Arctic and Marine Biology, Arctic Chronobiology and Physiology Research Group, NO-9037 Tromsø, Norway
- Catherine Jaeger
  - University of Strasbourg, Institute of Cellular and Integrative Neurosciences, Strasbourg 67000, France
- Simen R Sandve
  - Faculty of Biosciences, Norwegian University of Life Sciences (NMBU), NO-1432 Ås, Norway
- Valérie Simonneaux
  - University of Strasbourg, Institute of Cellular and Integrative Neurosciences, Strasbourg 67000, France
- David G Hazlerigg
  - Arctic seasonal timekeeping initiative (ASTI), UiT-The Arctic University of Norway, Department of Arctic and Marine Biology, Arctic Chronobiology and Physiology Research Group, NO-9037 Tromsø, Norway
- Shona H Wood
  - Arctic seasonal timekeeping initiative (ASTI), UiT-The Arctic University of Norway, Department of Arctic and Marine Biology, Arctic Chronobiology and Physiology Research Group, NO-9037 Tromsø, Norway
10
Schiebelhut LM, Guillaume AS, Kuhn A, Schweizer RM, Armstrong EE, Beaumont MA, Byrne M, Cosart T, Hand BK, Howard L, Mussmann SM, Narum SR, Rasteiro R, Rivera-Colón AG, Saarman N, Sethuraman A, Taylor HR, Thomas GWC, Wellenreuther M, Luikart G. Genomics and conservation: Guidance from training to analyses and applications. Mol Ecol Resour 2024; 24:e13893. [PMID: 37966259 DOI: 10.1111/1755-0998.13893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/16/2023]
Abstract
Environmental change is intensifying the biodiversity crisis and threatening species across the tree of life. Conservation genomics can help inform conservation actions and slow biodiversity loss. However, more training, appropriate use of novel genomic methods and communication with managers are needed. Here, we review practical guidance to improve applied conservation genomics. We share insights aimed at ensuring effectiveness of conservation actions around three themes: (1) improving pedagogy and training in conservation genomics including for online global audiences, (2) conducting rigorous population genomic analyses properly considering theory, marker types and data interpretation and (3) facilitating communication and collaboration between managers and researchers. We aim to update students and professionals and expand their conservation toolkit with genomic principles and recent approaches for conserving and managing biodiversity. The biodiversity crisis is a global problem and, as such, requires international involvement, training, collaboration and frequent reviews of the literature and workshops as we do here.
Affiliation(s)
- Lauren M Schiebelhut
  - Life and Environmental Sciences, University of California, Merced, California, USA
- Annie S Guillaume
  - Geospatial Molecular Epidemiology group (GEOME), Laboratory for Biological Geochemistry (LGB), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Arianna Kuhn
  - Department of Biological Sciences, University of Lethbridge, Lethbridge, Alberta, Canada
  - Virginia Museum of Natural History, Martinsville, Virginia, USA
- Rena M Schweizer
  - Division of Biological Sciences, University of Montana, Missoula, Montana, USA
- Mark A Beaumont
  - School of Biological Sciences, University of Bristol, Bristol, UK
- Margaret Byrne
  - Department of Biodiversity, Conservation and Attractions, Biodiversity and Conservation Science, Perth, Western Australia, Australia
- Ted Cosart
  - Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
- Brian K Hand
  - Flathead Lake Biological Station, University of Montana, Polson, Montana, USA
- Leif Howard
  - Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
- Steven M Mussmann
  - Southwestern Native Aquatic Resources and Recovery Center, U.S. Fish & Wildlife Service, Dexter, New Mexico, USA
- Shawn R Narum
  - Hagerman Genetics Lab, University of Idaho, Hagerman, Idaho, USA
- Rita Rasteiro
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Angel G Rivera-Colón
  - Department of Evolution, Ecology, and Behavior, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
- Norah Saarman
  - Department of Biology and Ecology Center, Utah State University, Logan, Utah, USA
- Arun Sethuraman
  - Department of Biology, San Diego State University, San Diego, California, USA
- Helen R Taylor
  - Royal Zoological Society of Scotland, Edinburgh, Scotland
- Gregg W C Thomas
  - Informatics Group, Harvard University, Cambridge, Massachusetts, USA
- Maren Wellenreuther
  - Plant and Food Research, Nelson, New Zealand
  - University of Auckland, Auckland, New Zealand
- Gordon Luikart
  - Division of Biological Sciences, University of Montana, Missoula, Montana, USA
  - Flathead Lake Biology Station, University of Montana, Missoula, Montana, USA
11
Han FY, Wu RX, Miao BB, Niu SF, Wang QH, Liang ZB. Whole-Genome Sequencing Analyses Reveal the Whip-like Tail Formation, Innate Immune Evolution, and DNA Repair Mechanisms of Eupleurogrammus muticus. Animals (Basel) 2024; 14:434. [PMID: 38338077 PMCID: PMC10854985 DOI: 10.3390/ani14030434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/19/2024] [Accepted: 01/25/2024] [Indexed: 02/12/2024] Open
Abstract
Smallhead hairtail (Eupleurogrammus muticus) is an economically important marine fish distributed along the northern Indian Ocean and the northwest Pacific coast; however, little is known about the mechanism of its genetic evolution. This study generated the first genome assembly of E. muticus at the chromosomal level using a combination of PacBio SMRT, Illumina Nova-Seq, and Hi-C technologies. The final assembled genome size was 709.27 Mb, with a contig N50 of 25.07 Mb, GC content of 40.81%, heterozygosity rate of 1.18%, and repetitive sequence rate of 35.43%. The E. muticus genome contained 21,949 protein-coding genes (97.92% of which were functionally annotated) and 24 chromosomes. There were 143 expanded gene families, 708 contracted gene families, and 4888 positively selected genes in the genome. Based on the comparative genomic analyses, we screened several candidate genes and pathways related to whip-like tail formation, innate immunity, and DNA repair in E. muticus. These findings preliminarily reveal some molecular evolutionary mechanisms of E. muticus at the genomic level and provide important reference genomic data for the genetic studies of other trichiurids.
Affiliation(s)
- Fang-Yuan Han
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| | - Ren-Xie Wu
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| | - Ben-Ben Miao
- State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen 361102, China
| | - Su-Fang Niu
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| | - Qing-Hua Wang
- State Key Laboratory of Biocontrol, Institute of Aquatic Economic Animals and Guangdong Provincial Key Laboratory for Aquatic Economic Animals, Life Sciences School, Sun Yat-sen University, Guangzhou 510275, China
| | - Zhen-Bang Liang
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| |
|
12
|
Rádai Z, Váradi A, Takács P, Nagy NA, Schmitt N, Prépost E, Kardos G, Laczkó L. An overlooked phenomenon: complex interactions of potential error sources on the quality of bacterial de novo genome assemblies. BMC Genomics 2024; 25:45. [PMID: 38195441 PMCID: PMC10777565 DOI: 10.1186/s12864-023-09910-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/21/2022] [Accepted: 12/15/2023] [Indexed: 01/11/2024] Open
Abstract
BACKGROUND Parameters adversely affecting the contiguity and accuracy of the assemblies from Illumina next-generation sequencing (NGS) are well described. However, past studies generally focused on their additive effects, overlooking potential interactions through which they could exacerbate one another's effects in a multiplicative manner. To investigate whether they act interactively on de novo genome assembly quality, we simulated sequencing data for 13 bacterial reference genomes, with varying levels of error rate, sequencing depth, and PCR and optical duplicate ratios. RESULTS We assessed the quality of assemblies from the simulated sequencing data with a number of contiguity and accuracy metrics, which we used to quantify both additive and multiplicative effects of the four parameters. We found that the tested parameters are engaged in complex interactions, exerting multiplicative, rather than additive, effects on assembly quality. Also, the ratio of non-repeated regions and GC% of the original genomes can shape how the four parameters affect assembly quality. CONCLUSIONS We provide a framework for consideration in future studies using de novo genome assembly of bacterial genomes, e.g. in choosing the optimal sequencing depth, balancing between its positive effect on contiguity and negative effect on accuracy due to its interaction with error rate. Furthermore, the properties of the genomes to be sequenced should also be taken into account, as they might influence the effects of error sources themselves.
Affiliation(s)
- Zoltán Rádai
- Institute of Metagenomics, University of Debrecen, Debrecen, Hungary.
- Department of Dermatology, University Hospital Düsseldorf, Heinrich-Heine-University, Düsseldorf, Germany.
| | - Alex Váradi
- Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
- Department of Laboratory Medicine, Medical School, University of Pécs, Pécs, Hungary
| | - Péter Takács
- Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
- Department of Health Informatics, Institute of Health Sciences, Faculty of Health, University of Debrecen, Debrecen, Hungary
| | - Nikoletta Andrea Nagy
- Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
- Department of Evolutionary Zoology, ELKH-DE Behavioural Ecology Research Group, University of Debrecen, Debrecen, Hungary
- Department of Evolutionary Zoology and Human Biology, University of Debrecen, Debrecen, Hungary
| | - Nicholas Schmitt
- Department of Dermatology, University Hospital Düsseldorf, Heinrich-Heine-University, Düsseldorf, Germany
| | - Eszter Prépost
- Department of Health Industry, University of Debrecen, Debrecen, Hungary
| | - Gábor Kardos
- Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
- Department of Gerontology, Faculty of Health Sciences, University of Debrecen, Debrecen, Hungary
| | - Levente Laczkó
- Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
- ELKH-DE Conservation Biology Research Group, Debrecen, Hungary
| |
|
13
|
Parvin N, Mandal S, Rath J. Microbiome of seventh-century old Parsurameswara stone monument of India and role of desiccation-tolerant cyanobacterium Lyngbya corticicola on its biodeterioration. BIOFOULING 2024; 40:40-53. [PMID: 38359904 DOI: 10.1080/08927014.2024.2305381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 01/08/2024] [Indexed: 02/17/2024]
Abstract
The Parsurameswara stone monument, built in the seventh century, is one of the oldest stone monuments in Odisha, India. Metagenomic analysis of the biological crust samples collected from the stone monument revealed 17 phyla in the microbiome, with Proteobacteria being the most dominant phylum, followed by cyanobacteria. Eight cyanobacteria were isolated. Lyngbya corticicola was the dominant cyanobacterium in all crust samples and could tolerate six months of desiccation in vitro. With six months of desiccation, chlorophyll-a decreased; however, carotenoid and cellular carbohydrate contents of this organism increased in the desiccated state. Resistance to desiccation, high carotenoid content, and effective trehalose biosynthesis in this cyanobacterium provide a distinct advantage over other microbiomes. Comparative metabolic profiles of the biological crust and L. corticicola show strongly corrosive organic acids such as dichloroacetic acid, which might be responsible for the biocorrosion of stone monuments.
Affiliation(s)
- Nousi Parvin
- Department of Botany, Visva-Bharati (A Central University), Santiniketan, West Bengal, India
| | - Sikha Mandal
- Department of Botany, Sree Chaitanya College, Habra, West Bengal, India
| | - Jnanendra Rath
- Department of Botany, Visva-Bharati (A Central University), Santiniketan, West Bengal, India
| |
|
14
|
Wu RX, Miao BB, Han FY, Niu SF, Liang YS, Liang ZB, Wang QH. Chromosome-Level Genome Assembly Provides Insights into the Evolution of the Special Morphology and Behaviour of Lepturacanthus savala. Genes (Basel) 2023; 14:1268. [PMID: 37372448 DOI: 10.3390/genes14061268] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/11/2023] [Revised: 06/13/2023] [Accepted: 06/13/2023] [Indexed: 06/29/2023] Open
Abstract
Savalani hairtail Lepturacanthus savala is a widely distributed fish along the Indo-Western Pacific coast, and contributes substantially to trichiurid fishery resources worldwide. In this study, the first chromosome-level genome assembly of L. savala was obtained by PacBio SMRT-Seq, Illumina HiSeq, and Hi-C technologies. The final assembled L. savala genome was 790.02 Mb with contig N50 and scaffold N50 values of 19.01 Mb and 32.77 Mb, respectively. The assembled sequences were anchored to 24 chromosomes by using Hi-C data. Combined with RNA sequencing data, 23,625 protein-coding genes were predicted, of which 96.0% were successfully annotated. In total, 67 gene family expansions and 93 gene family contractions were detected in the L. savala genome. Additionally, 1825 positively selected genes were identified. Based on a comparative genomic analysis, we screened a number of candidate genes associated with the specific morphology, behaviour-related immune system, and DNA repair mechanisms in L. savala. Our results preliminarily revealed mechanisms underlying the special morphological and behavioural characteristics of L. savala from a genomic perspective. Furthermore, this study provides valuable reference data for subsequent molecular ecology studies of L. savala and whole-genome analyses of other trichiurid fishes.
Affiliation(s)
- Ren-Xie Wu
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| | - Ben-Ben Miao
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| | - Fang-Yuan Han
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| | - Su-Fang Niu
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| | - Yan-Shan Liang
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| | - Zhen-Bang Liang
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| | - Qing-Hua Wang
- College of Fisheries, Guangdong Ocean University, Zhanjiang 524088, China
| |
|
15
|
Deppisch P, Kirsch V, Helfrich-Förster C, Senthilan PR. Contribution of cryptochromes and photolyases for insect life under sunlight. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2023; 209:373-389. [PMID: 36609567 PMCID: PMC10102093 DOI: 10.1007/s00359-022-01607-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/09/2022] [Revised: 12/19/2022] [Accepted: 12/20/2022] [Indexed: 01/07/2023]
Abstract
The cryptochrome/photolyase (CRY/PL) family is essential for life under sunlight because photolyases repair UV-damaged DNA and cryptochromes are normally part of the circadian clock that controls the activity-sleep cycle within the 24-h day. In this study, we aim to understand how the lineage and habitat of an insect affect its CRY/PL composition. To this end, we searched the annotated protein sequences of 340 insect species already available in databases for CRY/PLs. Using phylogenetic tree and motif analyses, we identified four frequent CRY/PLs in insects: the photolyases 6-4 PL and CPDII PL, as well as the mammalian-type cryptochrome (MCRY) and Drosophila-type cryptochrome (DCRY). Assignment of CRY/PLs to the corresponding insects confirmed that light-exposed insects tend to have more CRY/PLs than insects with little light exposure. Nevertheless, even insects with greatly reduced CRY/PLs still possess MCRY, which can be regarded as the major insect cryptochrome. Only flies of the genus Schizophora, which includes Drosophila melanogaster, lost MCRY. Moreover, we found that MCRY and CPDII PL as well as DCRY and 6-4 PL occur very frequently together, suggesting an interaction within each of the two pairs.
Affiliation(s)
- Peter Deppisch
- Neurobiology and Genetics, Theodor-Boveri Institute, Biocenter, Julius-Maximilians-University Würzburg, 97074, Würzburg, Germany
| | - Valentina Kirsch
- Neurobiology and Genetics, Theodor-Boveri Institute, Biocenter, Julius-Maximilians-University Würzburg, 97074, Würzburg, Germany
| | - Charlotte Helfrich-Förster
- Neurobiology and Genetics, Theodor-Boveri Institute, Biocenter, Julius-Maximilians-University Würzburg, 97074, Würzburg, Germany
| | - Pingkalai R Senthilan
- Neurobiology and Genetics, Theodor-Boveri Institute, Biocenter, Julius-Maximilians-University Würzburg, 97074, Würzburg, Germany.
| |
|
16
|
Park S, Lee J, Kim J, Kim D, Lee JH, Pack SP, Seo M. Benchmark study for evaluating the quality of reference genomes and gene annotations in 114 species. Front Vet Sci 2023; 10:1128570. [PMID: 36896291 PMCID: PMC9988948 DOI: 10.3389/fvets.2023.1128570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 02/02/2023] [Indexed: 02/23/2023] Open
Abstract
Introduction Reference genomes and gene annotations are key materials that can determine the limits of the molecular biology research of a species; however, systematic research on their quality assessment remains insufficient. Methods We collected reference assemblies, gene annotations, and 3,420 RNA-sequencing (RNA-seq) datasets from 114 species and selected effective indicators to simultaneously evaluate the reference genome quality of various species, including statistics that can be obtained empirically during the mapping process of short reads. Furthermore, we introduced and applied transcript diversity and quantification success rates, which allow a relative evaluation of the quality of gene annotations across species. Finally, we proposed a next-generation sequencing (NGS) applicability index by integrating a total of 10 effective indicators that can evaluate the genome and gene annotation of a specific species. Results and discussion Based on these effective evaluation indicators, we successfully evaluated and demonstrated the relative accessibility of NGS applications in all species, which will directly contribute to determining the technological boundaries in each species. Simultaneously, we expect that it will be a key indicator to examine the direction of future development through relative quality evaluation of genomes and gene annotations in each species, including countless organisms whose genomes and gene annotations will be constructed in the future.
Affiliation(s)
- Sinwoo Park
- Department of Computer and Information Science, Korea University, Sejong City, Republic of Korea
| | - Jinbaek Lee
- Department of Computer Convergence Software, Korea University, Sejong City, Republic of Korea
| | - Jaeryeong Kim
- Department of Computer and Information Science, Korea University, Sejong City, Republic of Korea
| | - Dohyeon Kim
- Department of Computer and Information Science, Korea University, Sejong City, Republic of Korea
| | - Jin Hyup Lee
- Department of Food and Biotechnology, Korea University, Sejong City, Republic of Korea
| | - Seung Pil Pack
- Department of Biotechnology and Bioinformatics, Korea University, Sejong City, Republic of Korea
| | - Minseok Seo
- Department of Computer and Information Science, Korea University, Sejong City, Republic of Korea
- Department of Computer Convergence Software, Korea University, Sejong City, Republic of Korea
| |
|
17
|
Cosma BM, Shirali Hossein Zade R, Jordan EN, van Lent P, Peng C, Pillay S, Abeel T. Evaluating long-read de novo assembly tools for eukaryotic genomes: insights and considerations. Gigascience 2022; 12:giad100. [PMID: 38000912 PMCID: PMC10673639 DOI: 10.1093/gigascience/giad100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 06/18/2023] [Accepted: 10/31/2023] [Indexed: 11/26/2023] Open
Abstract
BACKGROUND Assembly algorithm choice should be a deliberate, well-justified decision when researchers create genome assemblies for eukaryotic organisms from third-generation sequencing technologies. While third-generation sequencing by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) has overcome the disadvantages of short read lengths specific to next-generation sequencing (NGS), third-generation sequencers are known to produce more error-prone reads, thereby generating a new set of challenges for assembly algorithms and pipelines. However, the introduction of HiFi reads, which offer substantially reduced error rates, has provided a promising solution for more accurate assembly outcomes. Since the introduction of third-generation sequencing technologies, many tools have been developed that aim to take advantage of the longer reads, and researchers need to choose the correct assembler for their projects. RESULTS We benchmarked state-of-the-art long-read de novo assemblers to help readers make a balanced choice for the assembly of eukaryotes. To this end, we used 12 real and 64 simulated datasets from different eukaryotic genomes, with different read length distributions, imitating PacBio continuous long-read (CLR), PacBio high-fidelity (HiFi), and ONT sequencing to evaluate the assemblers. We include 5 commonly used long-read assemblers in our benchmark: Canu, Flye, Miniasm, Raven, and wtdbg2 for ONT and PacBio CLR reads. For PacBio HiFi reads, we include 5 state-of-the-art HiFi assemblers: HiCanu, Flye, Hifiasm, LJA, and MBG. Evaluation categories address the following metrics: reference-based metrics, assembly statistics, misassembly count, BUSCO completeness, runtime, and RAM usage. Additionally, we investigated the effect of increased read length on the quality of the assemblies and report that read length can, but does not always, positively impact assembly quality.
CONCLUSIONS Our benchmark concludes that there is no assembler that performs the best in all the evaluation categories. However, our results show that overall Flye is the best-performing assembler for PacBio CLR and ONT reads, both on real and simulated data. Meanwhile, best-performing PacBio HiFi assemblers are Hifiasm and LJA. Next, the benchmarking using longer reads shows that the increased read length improves assembly quality, but the extent to which that can be achieved depends on the size and complexity of the reference genome.
Affiliation(s)
- Bianca-Maria Cosma
- Delft Bioinformatics Lab, Intelligent Systems, Delft University of Technology, 2628 XE, Delft, The Netherlands
| | - Ramin Shirali Hossein Zade
- Delft Bioinformatics Lab, Intelligent Systems, Delft University of Technology, 2628 XE, Delft, The Netherlands
| | - Erin Noel Jordan
- Delft Bioinformatics Lab, Intelligent Systems, Delft University of Technology, 2628 XE, Delft, The Netherlands
- Technical Biochemistry, TU Dortmund University, 44227, Dortmund, Germany
| | - Paul van Lent
- Delft Bioinformatics Lab, Intelligent Systems, Delft University of Technology, 2628 XE, Delft, The Netherlands
| | - Chengyao Peng
- Delft Bioinformatics Lab, Intelligent Systems, Delft University of Technology, 2628 XE, Delft, The Netherlands
| | - Stephanie Pillay
- Delft Bioinformatics Lab, Intelligent Systems, Delft University of Technology, 2628 XE, Delft, The Netherlands
| | - Thomas Abeel
- Delft Bioinformatics Lab, Intelligent Systems, Delft University of Technology, 2628 XE, Delft, The Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| |
|
18
|
Deppisch P, Helfrich-Förster C, Senthilan PR. The Gain and Loss of Cryptochrome/Photolyase Family Members during Evolution. Genes (Basel) 2022; 13:1613. [PMID: 36140781 PMCID: PMC9498864 DOI: 10.3390/genes13091613] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/09/2022] [Revised: 09/02/2022] [Accepted: 09/05/2022] [Indexed: 11/20/2022] Open
Abstract
The cryptochrome/photolyase (CRY/PL) family represents an ancient group of proteins fulfilling two fundamental functions. While photolyases repair UV-induced DNA damage, cryptochromes mainly influence the circadian clock. In this study, we took advantage of the large number of already sequenced and annotated genes available in databases and systematically searched for the protein sequences of CRY/PL family members in all taxonomic groups, primarily focusing on metazoans and limiting the number of species per taxonomic order to five. Using BLASTP searches and subsequent phylogenetic tree and motif analyses, we identified five distinct photolyases (CPDI, CPDII, CPDIII, 6-4 photolyase, and the plant photolyase PPL) and six cryptochrome subfamilies (DASH-CRY, mammalian-type MCRY, Drosophila-type DCRY, cnidarian-specific ACRY, plant-specific PCRY, and the putative magnetoreceptor CRY4). Manually assigning the CRY/PL subfamilies to the species studied, we have noted that over evolutionary history, an initial increase of various CRY/PL subfamilies was followed by a decrease and specialization. Thus, in more primitive organisms (e.g., bacteria, archaea, simple eukaryotes, and basal metazoans), we find relatively few CRY/PL members. As species become more evolved (e.g., cnidarians, mollusks, echinoderms, etc.), the CRY/PL repertoire also increases, whereas it appears to decrease again in more recent organisms (humans, fruit flies, etc.). Moreover, our study indicates that all cryptochromes, although largely active in the circadian clock, arose independently from different photolyases, explaining their different modes of action.
Affiliation(s)
| | | | - Pingkalai R. Senthilan
- Neurobiology & Genetics, Theodor-Boveri Institute, Biocenter, Julius-Maximilians-University Würzburg, 97074 Würzburg, Germany
| |
|
19
|
Costábile A, Castellano M, Aversa-Marnai M, Quartiani I, Conijeski D, Perretta A, Villarino A, Silva-Álvarez V, Ferreira AM. A different transcriptional landscape sheds light on Russian sturgeon (Acipenser gueldenstaedtii) mechanisms to cope with bacterial infection and chronic heat stress. FISH & SHELLFISH IMMUNOLOGY 2022; 128:505-522. [PMID: 35985628 DOI: 10.1016/j.fsi.2022.08.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 08/09/2022] [Accepted: 08/11/2022] [Indexed: 06/15/2023]
Abstract
Sturgeons are chondrostean fish of high economic value and critically endangered due to anthropogenic activities, which has led to sturgeon aquaculture development. Russian sturgeon (Acipenser gueldenstaedtii), the second most important species reared for caviar, is successfully farmed in subtropical countries, including Uruguay. However, during the Uruguayan summer, sturgeons face intolerably warm temperatures that weaken their defences and favour infections by opportunistic pathogens, increasing fish mortality and farm economic losses. Since innate immunity is paramount in fish, for which the liver plays a key role, we used deep RNA sequencing to analyse differentially expressed genes in the liver of Russian sturgeons exposed to chronic heat stress and challenged with Aeromonas hydrophila. We assembled 149,615 unigenes in the Russian sturgeon liver transcriptome and found that metabolism and immune defence pathways are among the top five biological processes taking place in the liver. Chronic heat stress provoked profound effects on liver biological functions, up-regulating genes related to protein folding, heat shock response and lipid and protein metabolism to meet energy demands for coping with heat stress. Besides, long-term exposure to heat stress led to cell damage triggering liver inflammation and diminishing liver ability to mount an innate response to A. hydrophila challenge. Accordingly, the reprogramming of liver metabolism over an extended period had detrimental effects on fish health, resulting in weight loss and mortality, with the latter increasing after A. hydrophila challenge. To our knowledge, this is the first transcriptomic study describing how chronic heat-stressed sturgeons respond to a bacterial challenge, suggesting that liver metabolism alterations have a negative impact on the innate anti-bacterial response.
Affiliation(s)
- Alicia Costábile
- Sección Bioquímica y Biología Molecular, Facultad de Ciencias, Universidad de la República, CP 11400, Montevideo, Uruguay
| | - Mauricio Castellano
- Unidad de Inmunología, Instituto de Química Biológica, Facultad de Ciencias, Universidad de la República, CP 11600, Montevideo, Uruguay; Área Inmunología, Departamento de Biociencias, Facultad de Química, Universidad de la República, Montevideo, CP 11600, Montevideo, Uruguay; Sección Bioquímica y Biología Molecular, Facultad de Ciencias, Universidad de la República, CP 11400, Montevideo, Uruguay
| | - Marcio Aversa-Marnai
- Unidad de Inmunología, Instituto de Química Biológica, Facultad de Ciencias, Universidad de la República, CP 11600, Montevideo, Uruguay; Área Inmunología, Departamento de Biociencias, Facultad de Química, Universidad de la República, Montevideo, CP 11600, Montevideo, Uruguay
| | - Ignacio Quartiani
- Unidad de Patología, Biología y Cultivo de Organismos Acuáticos, Departamento de Ciencia y Tecnología de los Alimentos, Facultad de Veterinaria, Universidad de la República, CP 11300, Montevideo, Uruguay
| | | | - Alejandro Perretta
- Unidad de Patología, Biología y Cultivo de Organismos Acuáticos, Departamento de Ciencia y Tecnología de los Alimentos, Facultad de Veterinaria, Universidad de la República, CP 11300, Montevideo, Uruguay
| | - Andrea Villarino
- Sección Bioquímica y Biología Molecular, Facultad de Ciencias, Universidad de la República, CP 11400, Montevideo, Uruguay
| | - Valeria Silva-Álvarez
- Unidad de Inmunología, Instituto de Química Biológica, Facultad de Ciencias, Universidad de la República, CP 11600, Montevideo, Uruguay; Área Inmunología, Departamento de Biociencias, Facultad de Química, Universidad de la República, Montevideo, CP 11600, Montevideo, Uruguay.
| | - Ana María Ferreira
- Unidad de Inmunología, Instituto de Química Biológica, Facultad de Ciencias, Universidad de la República, CP 11600, Montevideo, Uruguay; Área Inmunología, Departamento de Biociencias, Facultad de Química, Universidad de la República, Montevideo, CP 11600, Montevideo, Uruguay.
| |
|
20
|
Pickett BD, Glass JR, Johnson TP, Ridge PG, Kauwe JSK. The genome of a giant (trevally): Caranx ignobilis. GIGABYTE 2022; 2022:gigabyte67. [PMID: 36824527 PMCID: PMC9694125 DOI: 10.46471/gigabyte.67] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Accepted: 08/25/2022] [Indexed: 11/09/2022] Open
Abstract
Caranx ignobilis, commonly known as giant kingfish or giant trevally, is a large, reef-associated apex predator. It is a prized sportfish, targeted throughout its tropical and subtropical range in the Indian and Pacific Oceans. It has also gained significant interest in aquaculture due to its unusual freshwater tolerance. Here, we present a draft assembly of the estimated 625.92 Mbp nuclear genome of a C. ignobilis individual from Hawaiian waters, which host a genetically distinct population. Our 97.4% BUSCO-complete assembly has a contig NG50 of 7.3 Mbp and a scaffold NG50 of 46.3 Mbp. Twenty-five of the 203 scaffolds contain 90% of the genome. We also present noisy, long-read DNA, Hi-C, and RNA-seq datasets; the RNA-seq data cover eight distinct tissues and can help with annotations and studies of freshwater tolerance. Our genome assembly and its supporting data are valuable tools for ecological and comparative genomics studies of kingfishes and other carangoid fishes.
Affiliation(s)
| | - Jessica R. Glass
- South African Institute for Aquatic Biodiversity, Makhanda, South Africa
- College of Fisheries and Ocean Sciences, University of Alaska Fairbanks, Fairbanks, Alaska, USA
| | | | - Perry G. Ridge
- Department of Biology, Brigham Young University, Provo, Utah, USA
| | - John S. K. Kauwe
- Department of Biology, Brigham Young University, Provo, Utah, USA
- Brigham Young University – Hawai‘i, Laie, Hawai‘i, USA
| |
|
21
|
Comparative Genome Analyses of Plant Rust Pathogen Genomes Reveal a Confluence of Pathogenicity Factors to Quell Host Plant Defense Responses. PLANTS 2022; 11:plants11151962. [PMID: 35956440 PMCID: PMC9370660 DOI: 10.3390/plants11151962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 07/18/2022] [Accepted: 07/20/2022] [Indexed: 12/05/2022]
Abstract
Switchgrass rust caused by Puccinia novopanici (P. novopanici) can significantly affect the biomass yield of switchgrass, an important biofuel crop in the United States. A comparative genome analysis of P. novopanici with rust pathogen genomes infecting monocot cereal crops wheat, barley, oats, maize and sorghum revealed the presence of larger structural variations contributing to their genome sizes. A comparative alignment of the rust pathogen genomes resulted in the identification of collinear and syntenic relationships between P. novopanici and P. sorghi; P. graminis tritici 21–0 (Pgt 21) and P. graminis tritici Ug99 (Pgt Ug99) and between Pgt 21 and P. triticina (Pt). Repeat element analysis indicated a strong presence of retro elements among different Puccinia genomes, contributing to the genome size variation between ~1 and 3%. A comparative look at the enriched protein families of Puccinia spp. revealed a predominant role of restriction of telomere capping proteins (RTC), disulfide isomerases, polysaccharide deacetylases, glycoside hydrolases, superoxide dismutases and multi-copper oxidases (MCOs). All the proteomes of Puccinia spp. share a common repertoire of 75 secretory and 24 effector proteins, including glycoside hydrolases, cellobiohydrolases, peptidyl-prolyl isomerases, polysaccharide deacetylases and protein disulfide-isomerases, that remain central to their pathogenicity. Comparison of the predicted effector proteins from Puccinia spp. genomes to the validated proteins from the Pathogen–Host Interactions database (PHI-base) resulted in the identification of validated effector proteins PgtSR1 (PGTG_09586) from P. graminis and Mlp124478 from Melampsora laricis across all the rust pathogen genomes.
|
22
|
Giorgashvili E, Reichel K, Caswara C, Kerimov V, Borsch T, Gruenstaeudl M. Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly-A Case Study in the Narrow Endemic Calligonum bakuense. FRONTIERS IN PLANT SCIENCE 2022; 13:779830. [PMID: 35874012 PMCID: PMC9296850 DOI: 10.3389/fpls.2022.779830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Accepted: 06/13/2022] [Indexed: 06/15/2023]
Abstract
Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequencing coverage and to determine if these differences are large enough to affect the phylogenetic position inferred for C. bakuense compared to congeners. Four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x) are compared in our analyses. The resulting assemblies are evaluated with regard to reproducibility, contig number, gene complement, inverted repeat length, and computation time; the impact of sequence differences on phylogenetic reconstruction is assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly and that GetOrganelle produces the most consistent assemblies for C. bakuense. Moreover, we demonstrate that a sequencing coverage between 500x and 100x can reduce both the sequence variability across assembly contigs and computation time. When comparing the most reliable plastid genome assemblies of C. bakuense, a sequence difference in only three nucleotide positions is detected, which is less than the difference potentially introduced through software choice.
Affiliation(s)
- Eka Giorgashvili
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Katja Reichel
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Calvinna Caswara
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Vuqar Kerimov
- Institute of Botany, Azerbaijan National Academy of Sciences (ANAS), Baku, Azerbaijan
- Thomas Borsch
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
- Botanischer Garten und Botanisches Museum Berlin, Freie Universität Berlin, Berlin, Germany
- Michael Gruenstaeudl
- Systematische Botanik und Pflanzengeographie, Institut für Biologie, Freie Universität Berlin, Berlin, Germany
|
23
|
Gupta AK, Kumar M. Benchmarking and Assessment of Eight De Novo Genome Assemblers on Viral Next-Generation Sequencing Data, Including the SARS-CoV-2. OMICS: A Journal of Integrative Biology 2022; 26:372-381. [PMID: 35759429 DOI: 10.1089/omi.2022.0042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
Viral genomics has become crucial in clinical diagnostics and ecology, not least in efforts to stem the COVID-19 pandemic. Whole-genome sequencing (WGS) is pivotal in gaining an improved understanding of viral evolution, genomic epidemiology, infectious outbreaks, pathobiology, clinical management, and vaccine development. Genome assembly is one of the crucial steps in WGS data analyses. A series of different assemblers has been developed with the advent of high-throughput next-generation sequencing (NGS). Various studies have reported the evaluation of these assembly tools on distinct datasets; however, these lack data of viral origin. In this study, we performed a comparative evaluation and benchmarking of eight de novo assemblers: SOAPdenovo, Velvet, assembly by short sequences (ABySS), iterative De Bruijn graph assembler (IDBA), SPAdes, Edena, iterative virus assembler, and VICUNA on the viral NGS data from distinct Illumina (GAIIx, Hiseq, Miseq, and Nextseq) platforms. WGS data of diverse viruses, that is, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), dengue virus 3, human immunodeficiency virus 1, hepatitis B virus, human herpesvirus 8, human papillomavirus 16, rhinovirus A, and West Nile virus, were utilized to assess these assemblers. Performance metrics such as genome fraction recovery, assembly lengths, NG50, N50, contig length, contig numbers, mismatches, and misassemblies were analyzed. Overall, three assemblers, that is, SPAdes, IDBA, and ABySS, performed consistently well, including for genome assembly of SARS-CoV-2. These assembly methods should be considered and recommended for future studies of viruses. The study also suggests that implementing two or more assembly approaches should be considered in viral NGS studies, especially in clinical settings.
Taken together, the benchmarking of eight de novo genome assemblers reported in this study can inform future public health and ecology research concerning viruses, the COVID-19 pandemic, and viral outbreaks.
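Two of the contiguity metrics named above, N50 and NG50, are easy to confuse. The sketch below uses their standard definitions and is not code from the study: N50 is the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the assembly length, while NG50 uses half of the reference (or estimated) genome length instead. The function name and the example contig lengths are illustrative assumptions.

```python
def nx0(contig_lengths, total=None, fraction=0.5):
    """Length L such that contigs of length >= L sum to at least fraction * total.

    With total=None the assembly length is used (N50 for fraction=0.5);
    passing a reference genome length as `total` gives NG50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    target = (sum(lengths) if total is None else total) * fraction
    running = 0
    for length in lengths:
        running += length
        if running >= target:
            return length
    return 0  # the assembly is too short to reach the target (possible for NG50)


contigs = [100, 80, 50, 30, 20, 10]   # hypothetical contig lengths
n50 = nx0(contigs)                    # assembly length 290, half is 145
ng50 = nx0(contigs, total=400)        # reference length 400, half is 200
```

When an assembly is shorter than the genome, NG50 is never larger than N50, which is why it is the fairer metric for comparing assemblers on the same genome.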
Affiliation(s)
- Amit Kumar Gupta
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
- Manoj Kumar
- Virology Unit and Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Chandigarh, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
|
24
|
A combined de novo assembly approach increases the quality of prokaryotic draft genomes. Folia Microbiol (Praha) 2022; 67:801-810. [DOI: 10.1007/s12223-022-00980-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 05/24/2022] [Indexed: 11/04/2022]
|
25
|
von Reumont BM, Anderluh G, Antunes A, Ayvazyan N, Beis D, Caliskan F, Crnković A, Damm M, Dutertre S, Ellgaard L, Gajski G, German H, Halassy B, Hempel BF, Hucho T, Igci N, Ikonomopoulou MP, Karbat I, Klapa MI, Koludarov I, Kool J, Lüddecke T, Ben Mansour R, Vittoria Modica M, Moran Y, Nalbantsoy A, Ibáñez MEP, Panagiotopoulos A, Reuveny E, Céspedes JS, Sombke A, Surm JM, Undheim EAB, Verdes A, Zancolli G. Modern venomics-Current insights, novel methods, and future perspectives in biological and applied animal venom research. Gigascience 2022; 11:giac048. [PMID: 35640874 PMCID: PMC9155608 DOI: 10.1093/gigascience/giac048] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/30/2022] [Revised: 04/10/2022] [Accepted: 04/12/2022] [Indexed: 12/11/2022] Open
Abstract
Venoms have evolved >100 times in all major animal groups, and their components, known as toxins, have been fine-tuned over millions of years into highly effective biochemical weapons. There are many outstanding questions on the evolution of toxin arsenals, such as how venom genes originate, how venom contributes to the fitness of venomous species, and which modifications at the genomic, transcriptomic, and protein level drive their evolution. These questions have received particularly little attention outside of snakes, cone snails, spiders, and scorpions. Venom compounds have further become a source of inspiration for translational research using their diverse bioactivities for various applications. We highlight here recent advances and new strategies in modern venomics and discuss how recent technological innovations and multi-omic methods dramatically improve research on venomous animals. The study of genomes and their modifications through CRISPR and knockdown technologies will increase our understanding of how toxins evolve and which functions they have in the different ontogenetic stages during the development of venomous animals. Mass spectrometry imaging combined with spatial transcriptomics, in situ hybridization techniques, and modern computer tomography gives us further insights into the spatial distribution of toxins in the venom system and the function of the venom apparatus. All these evolutionary and biological insights help to identify venom compounds more efficiently, which can then be synthesized or produced in adapted expression systems to test their bioactivity. Finally, we critically discuss recent agrochemical, pharmaceutical, therapeutic, and diagnostic (so-called translational) aspects of venoms from which humans benefit.
Affiliation(s)
- Bjoern M von Reumont
- Goethe University Frankfurt, Institute for Cell Biology and Neuroscience, Department for Applied Bioinformatics, 60438 Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Frankfurt, Senckenberganlage 25, 60235 Frankfurt, Germany
- Justus Liebig University Giessen, Institute for Insectbiotechnology, Heinrich Buff Ring 26-32, 35396 Giessen, Germany
- Gregor Anderluh
- Department of Molecular Biology and Nanobiotechnology, National Institute of Chemistry, 1000 Ljubljana, Slovenia
- Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450–208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
- Naira Ayvazyan
- Orbeli Institute of Physiology of NAS RA, Orbeli ave. 22, 0028 Yerevan, Armenia
- Dimitris Beis
- Developmental Biology, Centre for Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation Academy of Athens, Athens 11527, Greece
- Figen Caliskan
- Department of Biology, Faculty of Science and Letters, Eskisehir Osmangazi University, TR-26040 Eskisehir, Turkey
- Ana Crnković
- Department of Molecular Biology and Nanobiotechnology, National Institute of Chemistry, 1000 Ljubljana, Slovenia
- Maik Damm
- Technische Universität Berlin, Department of Chemistry, Straße des 17. Juni 135, 10623 Berlin, Germany
- Lars Ellgaard
- Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark
- Goran Gajski
- Institute for Medical Research and Occupational Health, Mutagenesis Unit, Ksaverska cesta 2, 10000 Zagreb, Croatia
- Hannah German
- Amsterdam Institute of Molecular and Life Sciences, Division of BioAnalytical Chemistry, Faculty of Science, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081HV Amsterdam, The Netherlands
- Beata Halassy
- University of Zagreb, Centre for Research and Knowledge Transfer in Biotechnology, Trg Republike Hrvatske 14, 10000 Zagreb, Croatia
- Benjamin-Florian Hempel
- BIH Center for Regenerative Therapies BCRT, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Tim Hucho
- Translational Pain Research, Department of Anesthesiology and Intensive Care Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
- Nasit Igci
- Nevsehir Haci Bektas Veli University, Faculty of Arts and Sciences, Department of Molecular Biology and Genetics, 50300 Nevsehir, Turkey
- Maria P Ikonomopoulou
- Madrid Institute for Advanced Studies in Food, Madrid, E28049, Spain
- The University of Queensland, St Lucia, QLD 4072, Australia
- Izhar Karbat
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
- Maria I Klapa
- Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research & Technology Hellas (FORTH/ICE-HT), Patras GR-26504, Greece
- Ivan Koludarov
- Justus Liebig University Giessen, Institute for Insectbiotechnology, Heinrich Buff Ring 26-32, 35396 Giessen, Germany
- Jeroen Kool
- Amsterdam Institute of Molecular and Life Sciences, Division of BioAnalytical Chemistry, Faculty of Science, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081HV Amsterdam, The Netherlands
- Tim Lüddecke
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Frankfurt, Senckenberganlage 25, 60235 Frankfurt, Germany
- Department of Bioresources, Fraunhofer Institute for Molecular Biology and Applied Ecology, 35392 Gießen, Germany
- Riadh Ben Mansour
- Department of Life Sciences, Faculty of Sciences, Gafsa University, Campus Universitaire Siidi Ahmed Zarrouk, 2112 Gafsa, Tunisia
- Maria Vittoria Modica
- Dept. of Biology and Evolution of Marine Organisms (BEOM), Stazione Zoologica Anton Dohrn, Via Po 25c, I-00198 Roma, Italy
- Yehu Moran
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Ayse Nalbantsoy
- Department of Bioengineering, Faculty of Engineering, Ege University, 35100 Bornova, Izmir, Turkey
- María Eugenia Pachón Ibáñez
- Unit of Infectious Diseases, Microbiology, and Preventive Medicine, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville, 41013 Sevilla, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
- Alexios Panagiotopoulos
- Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research & Technology Hellas (FORTH/ICE-HT), Patras GR-26504, Greece
- Animal Biology Division, Department of Biology, University of Patras, Patras, GR-26500, Greece
- Eitan Reuveny
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
- Javier Sánchez Céspedes
- Unit of Infectious Diseases, Microbiology, and Preventive Medicine, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville, 41013 Sevilla, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
- Andy Sombke
- Department of Evolutionary Biology, University of Vienna, Djerassiplatz 1, 1030 Vienna, Austria
- Joachim M Surm
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
- Eivind A B Undheim
- University of Oslo, Centre for Ecological and Evolutionary Synthesis, Postboks 1066 Blindern 0316 Oslo, Norway
- Aida Verdes
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales, José Gutiérrez Abascal 2, 28006 Madrid, Spain
- Giulia Zancolli
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
|
26
|
Qi W, Lim YW, Patrignani A, Schläpfer P, Bratus-Neuenschwander A, Grüter S, Chanez C, Rodde N, Prat E, Vautrin S, Fustier MA, Pratas D, Schlapbach R, Gruissem W. The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features. Gigascience 2022; 11:giac028. [PMID: 35333302 PMCID: PMC8952263 DOI: 10.1093/gigascience/giac028] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 10/16/2021] [Revised: 01/11/2022] [Accepted: 02/22/2022] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and subtropical regions worldwide. Genetic gain by molecular breeding has been limited, partially because cassava is a highly heterozygous crop with a repetitive and difficult-to-assemble genome.
FINDINGS: Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near complete haplotype resolution with higher continuity and accuracy compared to conventional long sequencing reads. We present 2 chromosome-scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. With consensus accuracy >QV46, contig N50 >18 Mb, BUSCO completeness of 99%, and 35k phased gene loci, it is the most accurate, continuous, complete, and haplotype-resolved cassava genome assembly so far. Ab initio gene prediction with RNA-seq data and Iso-Seq transcripts identified abundant novel gene loci, with enriched functionality related to chromatin organization, meristem development, and cell responses. During tissue development, differentially expressed transcripts of different haplotype origins were enriched for different functionality. In each tissue, 20-30% of transcripts showed allele-specific expression (ASE) differences. ASE bias was often tissue specific and inconsistent across different tissues. Direction-shifting was observed in <2% of the ASE transcripts. Despite high gene synteny, the HiFi genome assembly revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences, with large structural variations mostly related to LTR retrotransposons. We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding.
CONCLUSIONS: The phased and annotated chromosome pairs allow a systematic view of the heterozygous diploid genome organization in cassava with improved accuracy, completeness, and haplotype resolution. They will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy, and continuity.
Affiliation(s)
- Weihong Qi
- Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, 1202, Geneva, Switzerland
- Yi-Wen Lim
- Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland
- Andrea Patrignani
- Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- Pascal Schläpfer
- Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland
- Anna Bratus-Neuenschwander
- Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- Simon Grüter
- Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- Christelle Chanez
- Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland
- Nathalie Rodde
- INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France
- Elisa Prat
- INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France
- Sonia Vautrin
- INRAE, CNRGV French Plant Genomic Resource Center, F-31320, Castanet Tolosan, France
- Diogo Pratas
- Department of Electronics, Telecommunications and Informatics and Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Department of Virology, University of Helsinki, Haartmaninkatu 3, 00014 Helsinki, Finland
- Ralph Schlapbach
- Functional Genomics Center Zurich, ETH Zurich and University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
- Wilhelm Gruissem
- Department of Biology, Institute of Molecular Plant Biology, ETH Zurich, Universitätstrasse 2, 8092, Zurich, Switzerland
- Biotechnology Center, National Chung Hsing University, 145 Xingda Road, Taichung 40227, Taiwan
|
27
|
Wierzbicki F, Schwarz F, Cannalonga O, Kofler R. Novel quality metrics allow identifying and generating high-quality assemblies of piRNA clusters. Mol Ecol Resour 2022; 22:102-121. [PMID: 34181811 DOI: 10.1111/1755-0998.13455] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/15/2020] [Revised: 04/30/2021] [Accepted: 06/14/2021] [Indexed: 12/30/2022]
Abstract
In most animals, it is thought that the proliferation of a transposable element (TE) is stopped when the TE jumps into a piRNA cluster. Despite this central importance, little is known about the composition and the evolutionary dynamics of piRNA clusters. This is largely because piRNA clusters are notoriously difficult to assemble as they are frequently composed of highly repetitive DNA. With long reads, we may finally be able to obtain reliable assemblies of piRNA clusters. Unfortunately, it is unclear how to generate and identify the best assemblies, as many assembly strategies exist and standard quality metrics are ignorant of TEs. To address these problems, we introduce several novel quality metrics that assess: (a) the fraction of completely assembled piRNA clusters, (b) the quality of the assembled clusters and (c) whether an assembly captures the overall TE landscape of an organism (i.e. the abundance, the number of SNPs and internal deletions of all TE families). The requirements for computing these metrics vary, ranging from annotations of piRNA clusters to consensus sequences of TEs and genomic sequencing data. Using these novel metrics, we evaluate the effect of assembly algorithm, polishing, read length, coverage, and residual polymorphisms, and finally identify strategies that yield reliable assemblies of piRNA clusters. Based on an optimized approach, we provide assemblies for the two Drosophila melanogaster strains Canton-S and Pi2. About 80% of known piRNA clusters were assembled in both strains. Finally, we demonstrate the generality of our approach by extending our metrics to humans and Arabidopsis thaliana.
Affiliation(s)
- Filip Wierzbicki
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Florian Schwarz
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Vienna Graduate School of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Robert Kofler
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
|
28
|
Pucker B, Irisarri I, de Vries J, Xu B. Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions. Quantitative Plant Biology 2022; 3:e5. [PMID: 37077982 PMCID: PMC10095996 DOI: 10.1017/qpb.2021.18] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/29/2021] [Revised: 11/24/2021] [Accepted: 12/21/2021] [Indexed: 05/03/2023]
Abstract
Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences offer competing long-read sequencing technologies that enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can be conducted by single research groups, and sequences of smaller plant genomes can be completed within days. This has also led to large-scale investigation of genomes from multiple species to address fundamental questions associated with the origin and evolution of land plants. Increased accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges are accurately resolving diploid or polyploid genome sequences and better accounting for intra-specific diversity by switching from single reference genome sequences to a pangenome graph.
Affiliation(s)
- Boas Pucker
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Author for correspondence: Boas Pucker
- Iker Irisarri
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
- Jan de Vries
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
- Department of Applied Bioinformatics, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Göttingen, Germany
- Bo Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
|
29
|
Bornemann TLV, Adam PS, Probst AJ. Reconstruction of Archaeal Genomes from Short-Read Metagenomes. Methods Mol Biol 2022; 2522:487-527. [PMID: 36125772 DOI: 10.1007/978-1-0716-2445-6_33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
As the majority of biological diversity remains unexplored and uncultured, investigating it requires culture-independent approaches. Archaea in particular suffer from a multitude of issues that make their culturing problematic, from them being frequently members of the rare biosphere, to low growth rates, to them thriving under very specific and often extreme environmental and community conditions that are difficult to replicate. OMICs techniques are state of the art approaches that allow direct high-throughput investigations of environmental samples at all levels from nucleic acids to proteins, lipids, and secondary metabolites. Metagenomics, as the foundation for other OMICs techniques, facilitates the identification and functional characterization of the microbial community members and can be combined with other methods to provide insights into the microbial activities, both on the RNA and protein levels. In this chapter, we provide a step-by-step workflow for the recovery of archaeal genomes from metagenomes, starting from raw short-read sequences. This workflow can be applied to recover bacterial genomes as well.
Affiliation(s)
- Till L V Bornemann
- Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Essen, Germany
- Panagiotis S Adam
- Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Essen, Germany
- Alexander J Probst
- Environmental Microbiology and Biotechnology, Faculty of Chemistry, University of Duisburg-Essen, Essen, Germany
- Centre of Water and Environmental Research (ZWU), University of Duisburg-Essen, Essen, Germany
|
30
|
Wagner DD, Carleton HA, Trees E, Katz LS. Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks. PeerJ 2021; 9:e12446. [PMID: 34900416 PMCID: PMC8627651 DOI: 10.7717/peerj.12446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Accepted: 10/18/2021] [Indexed: 11/25/2022] Open
Abstract
Background: Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. Common analysis procedures for WGS, single nucleotide polymorphisms (SNPs) and genome assembly, are highly dependent upon WGS data quality.
Methods: Raw, unprocessed WGS reads from Escherichia coli, Salmonella enterica, and Shigella sonnei outbreak clusters were characterized for four quality metrics: PHRED score, read length, library insert size, and ambiguous nucleotide composition. PHRED scores were strongly correlated with improved SNPs analysis results in E. coli and S. enterica clusters.
Results: Assembly quality showed only moderate correlations with PHRED scores and library insert size, and then only for Salmonella. To improve SNP analyses and assemblies, we compared seven read-healing pipelines to improve these four quality metrics and to see how well they improved SNP analysis and genome assembly. The most effective read healing pipelines for SNPs analysis incorporated quality-based trimming, fixed-width trimming, or both. The Lyve-SET SNPs pipeline showed a more marked improvement than the CFSAN SNP Pipeline, but the latter performed better on raw, unhealed reads. For genome assembly, SPAdes enabled significant improvements in healed E. coli reads only, while Skesa yielded no significant improvements on healed reads.
Conclusions: PHRED scores will continue to be a crucial quality metric albeit not of equal impact across all types of analyses for all enteric bacteria. While trimming-based read healing performed well for SNPs analyses, different read healing approaches are likely needed for genome assembly or other, emerging WGS analysis methodologies.
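For readers unfamiliar with the PHRED metric discussed above, the relationship between a quality score Q and the base-calling error probability P is Q = -10 log10(P). The sketch below applies that definition to a FASTQ quality string; it assumes the common Phred+33 ASCII encoding, and the function names are illustrative rather than taken from any pipeline named in the abstract.

```python
def phred_to_error_prob(q):
    """Error probability implied by a PHRED score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(char) - offset for char in quality_string]


def mean_phred(quality_string, offset=33):
    """Arithmetic mean of the per-base PHRED scores of one read."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores)


# 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
example = mean_phred("I5")             # 30.0
p_q20 = phred_to_error_prob(20)        # 0.01, i.e. one error in 100 bases
```

Note that the arithmetic mean of scores overstates quality relative to averaging the error probabilities themselves, so the two summaries should not be used interchangeably.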
Affiliation(s)
- Darlene D Wagner
- Division of Viral Diseases, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Eagle Medical Services, LLC, Atlanta, GA, United States of America
- Heather A Carleton
- Enteric Diseases Laboratory Branch, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Eija Trees
- Association of Public Health Laboratories, Silver Spring, MD, United States of America
- Lee S Katz
- Enteric Diseases Laboratory Branch, Centers for Disease Control and Prevention, Atlanta, GA, United States of America
- Center for Food Safety, University of Georgia, Griffin, GA, United States of America
|
31
|
Chen Y, Zhang Y, Wang AY, Gao M, Chong Z. Accurate long-read de novo assembly evaluation with Inspector. Genome Biol 2021; 22:312. [PMID: 34775997 PMCID: PMC8590762 DOI: 10.1186/s13059-021-02527-4] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 05/01/2021] [Accepted: 10/27/2021] [Indexed: 11/12/2022] Open
Abstract
Long-read de novo genome assembly continues to advance rapidly. However, there is a lack of effective tools to accurately evaluate the assembly results, especially for structural errors. We present Inspector, a reference-free long-read de novo assembly evaluator which faithfully reports types of errors and their precise locations. Notably, Inspector can correct the assembly errors based on consensus sequences derived from raw reads covering erroneous regions. Based on in silico and long-read assembly results from multiple long-read data and assemblers, we demonstrate that in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.
Affiliation(s)
- Yu Chen
- Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Yixin Zhang
- Department of Computer Science, College of Arts and Sciences, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Amy Y Wang
- Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Department of Medicine, Division of General Internal Medicine, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Min Gao
- Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Department of Medicine, Division of Cardiovascular Disease, Heersink School of Medicine, University of Alabama at Birmingham, AL, 35233, Birmingham, USA
- Zechen Chong
- Department of Genetics, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
- Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, 35294, USA
|
32
|
Kooij PW, Pellicer J. Genome Size Versus Genome Assemblies: Are the Genomes Truly Expanded in Polyploid Fungal Symbionts? Genome Biol Evol 2021; 12:2384-2390. [PMID: 33283863 PMCID: PMC7719231 DOI: 10.1093/gbe/evaa217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 10/10/2020] [Indexed: 12/21/2022] Open
Abstract
Each day, as the amount of genomic data and bioinformatics resources grows, researchers are increasingly challenged with selecting the most appropriate approach to analyze their data. In addition, the opportunity to undertake comparative genomic analyses is growing rapidly. This is especially true for fungi due to their small genome sizes (i.e., mean 1C = 44.2 Mb). Given these opportunities and aiming to gain novel insights into the evolution of mutualisms, we focus on comparing the quality of whole genome assemblies for fungus-growing ants cultivars (Hymenoptera: Formicidae: Attini) and a free-living relative. Our analyses reveal that currently available methodologies and pipelines for analyzing whole-genome sequence data need refining. By using different genome assemblers, we show that the genome assembly size depends on what software is used. This, in turn, impacts gene number predictions, with higher gene numbers correlating positively with genome assembly size. Furthermore, the majority of fungal genome size data currently available are based on estimates derived from whole-genome assemblies generated from short-read genome data, rather than from the more accurate technique of flow cytometry. Here, we estimated the haploid genome sizes of three ant fungal symbionts by flow cytometry using the fungus Pleurotus ostreatus (Jacq.) P. Kumm. (1871) as a calibration standard. We found that published genome sizes based on genome assemblies are 2.5- to 3-fold larger than our estimates based on flow cytometry. We, therefore, recommend that flow cytometry is used to precalibrate genome assembly pipelines, to avoid incorrect estimates of genome sizes and ensure robust assemblies.
Affiliation(s)
- Pepijn W Kooij
- Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, United Kingdom; Center for the Study of Social Insects, São Paulo State University (UNESP), Rio Claro, Sao Paulo, Brazil
| | - Jaume Pellicer
- Department of Comparative Plant and Fungal Biology, Royal Botanic Gardens, Kew, Richmond, United Kingdom; Institut Botànic de Barcelona (IBB, CSIC-Ajuntament de Barcelona), Barcelona, Spain
| |
|
33
|
Evaluation of a high-throughput, cost-effective Illumina library preparation kit. Sci Rep 2021; 11:15925. [PMID: 34354114 PMCID: PMC8342411 DOI: 10.1038/s41598-021-94911-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/17/2020] [Accepted: 07/19/2021] [Indexed: 11/08/2022] Open
Abstract
Library preparation for high-throughput sequencing applications is a critical step in producing representative, unbiased sequencing data. The iGenomX Riptide High Throughput Rapid Library Prep Kit purports to provide high-quality sequencing data with lower costs compared to other Illumina library kits. To test these claims, we compared sequence data quality of Riptide libraries to libraries constructed with KAPA Hyper and NEBNext Ultra. Across several single-source genome samples, mapping performance and de novo assembly of Riptide libraries were similar to conventional libraries prepared with the same DNA. Poor performance of some libraries resulted in low sequencing depth. In particular, degraded DNA samples may be challenging to sequence with Riptide. There was little cross-well plate contamination, with the overwhelming majority of reads belonging to the proper source genomes. The sequencing of metagenome samples using different Riptide primer sets resulted in variable taxonomic assignment of reads. Increased adoption of the Riptide kit will decrease library preparation costs. However, this method might not be suitable for degraded DNA.
|
34
|
Metatranscriptomic Analysis of Bacterial Communities on Laundered Textiles: A Pilot Case Study. Microorganisms 2021; 9:microorganisms9081591. [PMID: 34442670 PMCID: PMC8400938 DOI: 10.3390/microorganisms9081591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/23/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 12/13/2022] Open
Abstract
Microbially contaminated washing machines and mild laundering conditions facilitate the survival and growth of microorganisms on laundry, promoting undesired side effects such as malodor formation. Clearly, a deeper understanding of the functionality and hygienic relevance of the laundry microbiota necessitates the analysis of the microbial gene expression on textiles after washing, which—to the best of our knowledge—has not been performed before. In this pilot case study, we used single-end RNA sequencing to generate de novo transcriptomes of the bacterial communities remaining on polyester and cotton fabrics washed in a domestic washing machine in mild conditions and subsequently incubated under moist conditions for 72 h. Two common de novo transcriptome assemblers were used. The final assemblies included 22,321 Trinity isoforms and 12,600 Spades isoforms. A large part of these isoforms could be assigned to the SwissProt database, and was further categorized into “molecular function”, “biological process” and “cellular component” using Gene Ontology (GO) terms. In addition, differential gene expression was used to show the difference in the pairwise comparison of the two tissue types. When comparing the assemblies generated with the two assemblers, the annotation results were relatively similar. However, there were clear differences between the de novo assemblies regarding differential gene expression.
|
35
|
Wakabayashi Y, Sekizuka T, Yamaguchi T, Fukuda A, Suzuki M, Kawahara R, Taguchi M, Kuroda M, Semba K, Shinomiya H, Kawatsu K. Isolation and plasmid characterisation of Salmonella enterica serovar Albany harbouring mcr-5 from retail chicken meat in Japan. FEMS Microbiol Lett 2021; 367:5881302. [PMID: 32756977 DOI: 10.1093/femsle/fnaa127] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/27/2019] [Accepted: 07/28/2020] [Indexed: 11/13/2022] Open
Abstract
The emergence of plasmid-mediated colistin resistance genes (mcr), which is occurring in numerous countries, is a worldwide concern, primarily because colistin is a last-resort antibiotic. Compared to E. coli, prevalence of mcr genes in Salmonella is unclear in Japan. Here we screened for mcr-1-5 genes in our collection of Salmonella strains isolated from retail meat products collected in Japan from 2012 through 2016. We found that Salmonella Albany strain 27A-368 encodes mcr-5 and that mcr genes were undetectable among the remaining 202 isolates. The resistance plasmid p27A-368 was transferred by conjugation to S. Infantis and was stably retained as a transconjugant. Whole-genome sequencing revealed that mcr-5 resided on a 115 kb plasmid (p27A-368). The plasmid backbone of p27A-368 is more similar to that of pCOV27, an ESBL-encoding plasmid recovered from avian pathogenic E. coli, rather than pSE13-SA01718 of S. Paratyphi B that encodes mcr-5. Further, mcr-5 is located on a transposon, and its sequence is similar to that of pSE13-SA01718. A phylogenetic tree based on single nucleotide variants implies a relationship between 27A-368 and S. Albany isolated in Southeast Asian countries.
Affiliation(s)
- Yuki Wakabayashi
- Bacteriology Section, Division of Microbiology, Osaka Institute of Public Health, 1-3-69 Nakamichi, Higashinari-ku, Osaka, Japan
| | - Tsuyoshi Sekizuka
- Pathogen Genomics Centre, National Institute of Infectious Diseases, 1-3-21 Toyama Shinjuku-ku, Tokyo, Japan
| | - Takahiro Yamaguchi
- Bacteriology Section, Division of Microbiology, Osaka Institute of Public Health, 1-3-69 Nakamichi, Higashinari-ku, Osaka, Japan
| | - Akira Fukuda
- Microbiology Section, Division of Microbiology, Osaka Institute of Public Health, 8-34 Toujyo-cho, Tennouji-ku, Osaka, Japan
| | - Masato Suzuki
- Antimicrobial Resistance Research Centre, National Institute of Infectious Diseases, 4-2-1 Aoba-cho, Higashimurayama-shi, Tokyo, Japan
| | - Ryuji Kawahara
- Bacteriology Section, Division of Microbiology, Osaka Institute of Public Health, 1-3-69 Nakamichi, Higashinari-ku, Osaka, Japan
| | - Masumi Taguchi
- Bacteriology Section, Division of Microbiology, Osaka Institute of Public Health, 1-3-69 Nakamichi, Higashinari-ku, Osaka, Japan
| | - Makoto Kuroda
- Pathogen Genomics Centre, National Institute of Infectious Diseases, 1-3-21 Toyama Shinjuku-ku, Tokyo, Japan
| | - Keiko Semba
- Ehime Prefectural Institute of Public Health and Environmental Science, 8-234 Sanban-cho, Matsuyama-shi, Ehime, Japan
| | - Hiroto Shinomiya
- Ehime Prefectural Institute of Public Health and Environmental Science, 8-234 Sanban-cho, Matsuyama-shi, Ehime, Japan
| | - Kentaro Kawatsu
- Bacteriology Section, Division of Microbiology, Osaka Institute of Public Health, 1-3-69 Nakamichi, Higashinari-ku, Osaka, Japan
| |
|
36
|
A de novo transcriptional atlas in Danaus plexippus reveals variability in dosage compensation across tissues. Commun Biol 2021; 4:791. [PMID: 34172835 PMCID: PMC8233437 DOI: 10.1038/s42003-021-02335-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/11/2020] [Accepted: 06/09/2021] [Indexed: 02/06/2023] Open
Abstract
A detailed knowledge of gene function in the monarch butterfly is still lacking. Here we generated a genome assembly from a Mexican nonmigratory population and used RNA-seq data from 14 biological samples for gene annotation and to construct an atlas portraying the breadth of gene expression during most of the monarch life cycle. Two thirds of the genes show expression changes, with long noncoding RNAs being particularly finely regulated during adulthood, and male-biased expression being four times more common than female-biased. The two portions of the monarch heterochromosome Z, one ancestral to the Lepidoptera and the other resulting from a chromosomal fusion, display distinct association with sex-biased expression, reflecting sample-dependent incompleteness or absence of dosage compensation in the ancestral but not the novel portion of the Z. This study presents extended genomic and transcriptomic resources that will facilitate a better understanding of the monarch's adaptation to a changing environment.
|
37
|
Genome sequencing and de novo assembly of the giant unicellular alga Acetabularia acetabulum using droplet MDA. Sci Rep 2021; 11:12820. [PMID: 34140556 PMCID: PMC8211769 DOI: 10.1038/s41598-021-92092-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2020] [Accepted: 05/28/2021] [Indexed: 11/08/2022] Open
Abstract
The macroscopic single-celled green alga Acetabularia acetabulum has been a model system in cell biology for more than a century. However, no genomic information is available from this species. Since the alga has a long life cycle, is difficult to grow in dense cultures, and has an estimated diploid genome size of almost 2 Gb, obtaining sufficient genomic material for genome sequencing is challenging. Here, we have attempted to overcome these challenges by amplifying genomic DNA using multiple displacement amplification (MDA) combined with microfluidics technology to distribute the amplification reactions across thousands of microscopic droplets. By amplifying and sequencing DNA from five single cells we were able to recover an estimated 7–11% of the total genome, providing the first draft of the A. acetabulum genome. We highlight challenges associated with genome recovery and assembly of MDA data due to biases arising during genome amplification, and hope that our study can serve as a reference for future attempts on sequencing the genome from non-model eukaryotes.
|
38
|
Xie L, Wong L. PDR: a new genome assembly evaluation metric based on genetics concerns. Bioinformatics 2021; 37:289-295. [PMID: 32761066 DOI: 10.1093/bioinformatics/btaa704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2019] [Revised: 06/30/2020] [Accepted: 07/30/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Existing genome assembly evaluation metrics provide only limited insight on specific aspects of genome assembly quality, and sometimes even disagree with each other. For better integrative comparison between assemblies, we propose, here, a new genome assembly evaluation metric, Pairwise Distance Reconstruction (PDR). It derives from a common concern in genetic studies, and takes completeness, contiguity, and correctness into consideration. We also propose an approximation implementation to accelerate PDR computation. RESULTS: Our results on publicly available datasets affirm PDR's ability to integratively assess the quality of a genome assembly. In fact, this is guaranteed by its definition. The results also indicated the error introduced by approximation is extremely small and thus negligible. AVAILABILITY AND IMPLEMENTATION: https://github.com/XLuyu/PDR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luyu Xie
- Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417, Singapore
| | - Limsoon Wong
- Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417, Singapore
| |
|
39
|
Al Qaffas A, Nichols J, Davison AJ, Ourahmane A, Hertel L, McVoy MA, Camiolo S. LoReTTA, a user-friendly tool for assembling viral genomes from PacBio sequence data. Virus Evol 2021; 7:veab042. [PMID: 33996146 PMCID: PMC8111061 DOI: 10.1093/ve/veab042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022] Open
Abstract
Long-read, single-molecule DNA sequencing technologies have triggered a revolution in genomics by enabling the determination of large, reference-quality genomes in ways that overcome some of the limitations of short-read sequencing. However, the greater length and higher error rate of the reads generated on long-read platforms make the tools used for assembling short reads unsuitable for use in data assembly and motivate the development of new approaches. We present LoReTTA (Long Read Template-Targeted Assembler), a tool designed for performing de novo assembly of long reads generated from viral genomes on the PacBio platform. LoReTTA exploits a reference genome to guide the assembly process, an approach that has been successful with short reads. The tool was designed to deal with reads originating from viral genomes, which feature high genetic variability, possible multiple isoforms, and the dominant presence of additional organisms in clinical or environmental samples. LoReTTA was tested on a range of simulated and experimental datasets and outperformed established long-read assemblers in terms of assembly contiguity and accuracy. The software runs under the Linux operating system, is designed for easy adaptation to alternative systems, and features an automatic installation pipeline that takes care of the required dependencies. A command-line version and a user-friendly graphical interface version are available under a GPLv3 license at https://bioinformatics.cvr.ac.uk/software/ with the manual and a test dataset.
Affiliation(s)
- Ahmed Al Qaffas
- Department of Pediatrics, Virginia Commonwealth University, Richmond, VA, USA
| | - Jenna Nichols
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
| | - Andrew J Davison
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
| | - Amine Ourahmane
- Department of Pediatrics, Virginia Commonwealth University, Richmond, VA, USA
| | - Laura Hertel
- Department of Pediatrics, School of Medicine, University of California San Francisco, Oakland, CA, USA
| | - Michael A McVoy
- Department of Pediatrics, Virginia Commonwealth University, Richmond, VA, USA
| | | |
|
40
|
Clifton BD, Jimenez J, Kimura A, Chahine Z, Librado P, Sánchez-Gracia A, Abbassi M, Carranza F, Chan C, Marchetti M, Zhang W, Shi M, Vu C, Yeh S, Fanti L, Xia XQ, Rozas J, Ranz JM. Understanding the Early Evolutionary Stages of a Tandem Drosophila melanogaster-Specific Gene Family: A Structural and Functional Population Study. Mol Biol Evol 2021; 37:2584-2600. [PMID: 32359138 PMCID: PMC7475035 DOI: 10.1093/molbev/msaa109] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/22/2022] Open
Abstract
Gene families underlie genetic innovation and phenotypic diversification. However, our understanding of the early genomic and functional evolution of tandemly arranged gene families remains incomplete as paralog sequence similarity hinders their accurate characterization. The Drosophila melanogaster-specific gene family Sdic is tandemly repeated and impacts sperm competition. We scrutinized Sdic in 20 geographically diverse populations using reference-quality genome assemblies, read-depth methodologies, and qPCR, finding that ∼90% of the individuals harbor 3-7 copies as well as evidence of population differentiation. In strains with reliable gene annotations, copy number variation (CNV) and differential transposable element insertions distinguish one structurally distinct version of the Sdic region per strain. All 31 annotated copies featured protein-coding potential and, based on the protein variant encoded, were categorized into 13 paratypes differing in their 3' ends, with 3-5 paratypes coexisting in any strain examined. Despite widespread gene conversion, the only copy present in all strains has functionally diverged at both coding and regulatory levels under positive selection. Contrary to artificial tandem duplications of the Sdic region that resulted in increased male expression, CNV in cosmopolitan strains did not correlate with expression levels, likely as a result of differential genome modifier composition. Duplicating the region did not enhance sperm competitiveness, suggesting a fitness cost at high expression levels or a plateau effect. Beyond facilitating a minimally optimal expression level, Sdic CNV acts as a catalyst of protein and regulatory diversity, showcasing a possible evolutionary path recently formed tandem multigene families can follow toward long-term consolidation in eukaryotic genomes.
Affiliation(s)
- Bryan D Clifton
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
| | - Jamie Jimenez
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
| | - Ashlyn Kimura
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
| | - Zeinab Chahine
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
| | - Pablo Librado
- Laboratoire AMIS CNRS UMR 5288, Faculté de Médicine de Purpan, Université Paul Sabatier, Toulouse, France
| | - Alejandro Sánchez-Gracia
- Departament de Genètica, Microbiologia i Estadistica, Universitat de Barcelona, Barcelona, Spain; Institut de Recerca de la Biodiversitat, Universitat de Barcelona, Barcelona, Spain
| | - Mashya Abbassi
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
| | - Francisco Carranza
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
| | - Carolus Chan
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
| | - Marcella Marchetti
- Istituto Pasteur Italia, Fondazione Cenci-Bolognetti, Rome, Italy; Department of Biology and Biotechnology "C. Darwin", Sapienza University of Rome, Rome, Italy
| | - Wanting Zhang
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei Province, China
| | - Mijuan Shi
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei Province, China
| | - Christine Vu
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
| | - Shudan Yeh
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA; Department of Life Sciences, National Central University, Taoyuan City, Zhongli District, Taiwan
| | - Laura Fanti
- Istituto Pasteur Italia, Fondazione Cenci-Bolognetti, Rome, Italy; Department of Biology and Biotechnology "C. Darwin", Sapienza University of Rome, Rome, Italy
| | - Xiao-Qin Xia
- Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei Province, China
| | - Julio Rozas
- Departament de Genètica, Microbiologia i Estadistica, Universitat de Barcelona, Barcelona, Spain; Institut de Recerca de la Biodiversitat, Universitat de Barcelona, Barcelona, Spain
| | - José M Ranz
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA
| |
|
41
|
Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol 2021; 22:101. [PMID: 33845884 PMCID: PMC8040228 DOI: 10.1186/s13059-021-02328-9] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 01/10/2021] [Accepted: 03/25/2021] [Indexed: 12/13/2022] Open
Abstract
High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short read sequencing does not yield haplotype information spanning whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review and discuss recent methodological progress and perspectives in these areas.
Affiliation(s)
- Shilpa Garg
- Department of Biology, University of Copenhagen, Copenhagen, Denmark.
| |
|
42
|
Gavrielatos M, Kyriakidis K, Spandidos DA, Michalopoulos I. Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly. Mol Med Rep 2021; 23:251. [PMID: 33537807 PMCID: PMC7893683 DOI: 10.3892/mmr.2021.11890] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/04/2020] [Accepted: 01/21/2021] [Indexed: 12/30/2022] Open
Abstract
Genome assemblers are computational tools for de novo genome assembly, based on a plenitude of primary sequencing data. The quality of genome assemblies is estimated by their contiguity and the occurrences of misassemblies (duplications, deletions, translocations or inversions). The rapid development of sequencing technologies has enabled the rise of novel de novo genome assembly strategies. The ultimate goal of such strategies is to utilise the features of each sequencing platform in order to address the existing weaknesses of each sequencing type and compose a complete and correct genome map. In the present study, the hybrid strategy, which is based on Illumina short paired-end reads and Nanopore long reads, was benchmarked using the MaSuRCA and Wengan assemblers. Moreover, the long-read assembly strategy was benchmarked using Canu for Nanopore reads, and using Hifiasm and HiCanu for PacBio HiFi reads. The assemblies were performed on a computational cluster with limited computational resources. Their outputs were evaluated in terms of accuracy and computational performance. The PacBio HiFi assembly strategy outperforms the other ones, while Hi-C scaffolding, which is based on chromatin 3D structure, is required in order to increase continuity, accuracy and completeness when large and complex genomes, such as the human one, are assembled. The use of Hi-C data is also necessary while using the hybrid assembly strategy. The results revealed that HiFi sequencing enabled the rise of novel algorithms which require less genome coverage than that of the other strategies, making the assembly a less computationally demanding task. Taken together, these developments may lead to the democratisation of genome assembly projects, which are now approachable by smaller labs with limited technical and financial resources.
Affiliation(s)
- Marios Gavrielatos
- Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, 15701 Athens, Greece
| | - Konstantinos Kyriakidis
- School of Pharmacy, Aristotle University of Thessaloniki (AUTh), 54124 Thessaloniki, Greece
- Genomics and Epigenomics Translational Research (GENeTres), Centre for Interdisciplinary Research and Innovation, 57001 Thessaloniki, Greece
| | - Demetrios A. Spandidos
- Laboratory of Clinical Virology, Medical School, University of Crete, 71003 Heraklion, Greece
| | - Ioannis Michalopoulos
- Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
| |
|
43
|
Jauhal AA, Newcomb RD. Assessing genome assembly quality prior to downstream analysis: N50 versus BUSCO. Mol Ecol Resour 2021; 21:1416-1421. [PMID: 33629477 DOI: 10.1111/1755-0998.13364] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 10/06/2020] [Accepted: 02/16/2021] [Indexed: 12/21/2022]
Abstract
With the ever-increasing number of publicly available eukaryotic genome assemblies and user-friendly bioinformatics tools, there are increasing opportunities for researchers to use genomic resources in their research. While there are multiple dimensions to genome quality, it is often reduced to a single score that may not be correlated with other metrics, or appropriate for all applications of an assembly. To assess whether the commonly reported N50 value could reliably predict a separate dimension of genome quality, gene space completeness, we performed a meta-analysis of 611 published articles on eukaryotic genomes that used BUSCO scores, in addition to the typical N50 score. We found that although assemblies with relatively high contig and scaffold N50 values consistently had high BUSCO scores, a high BUSCO score could also be obtained from assemblies with a low N50. This reinforces that despite its ubiquity, N50 is not a perfect proxy for all measures of genome accuracy. Our data also suggests that variations in BUSCO scores among assemblies with poor N50 scores may be related to the number of introns in conserved eukaryotic genes. We stress the importance of screening and evaluating assembly quality based on the appropriate tools and urge increased reporting of additional genome assessment metrics in addition to N50. We also discuss the potential limitations of BUSCO and suggest improvements for assessing gene space within genome assemblies.
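For readers unfamiliar with the N50 statistic discussed in this abstract, the following is a minimal sketch of how it is commonly computed. It is not taken from the cited study. It assumes the standard definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length), and the function name `n50` and the example length lists are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition: sort lengths from longest to shortest
    and report the length at which the running sum first reaches half of
    the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Two hypothetical assemblies with the same total length (1000 bp).
fragmented = [100] * 10        # ten equal, short contigs
contiguous = [600, 300, 100]   # dominated by one long contig

n50_fragmented = n50(fragmented)
n50_contiguous = n50(contiguous)
```

Both hypothetical assemblies contain the same number of bases, yet their N50 values differ widely. This illustrates why N50 describes contiguity only and says nothing on its own about gene-space completeness, which is what BUSCO is intended to assess.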
Affiliation(s)
- April A Jauhal
- School of Biological Sciences, University of Auckland, Auckland, New Zealand; The New Zealand Institute for Plant & Food Research, Auckland, New Zealand
| | - Richard D Newcomb
- The New Zealand Institute for Plant & Food Research, Auckland, New Zealand
| |
|
44
|
Schmeing S, Robinson MD. ReSeq simulates realistic Illumina high-throughput sequencing data. Genome Biol 2021; 22:67. [PMID: 33608040 PMCID: PMC7896392 DOI: 10.1186/s13059-021-02265-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/17/2020] [Accepted: 01/07/2021] [Indexed: 12/18/2022] Open
Abstract
In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq.
Affiliation(s)
- Stephan Schmeing
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Winterthurerstrasse 190, Zurich, 8057, Switzerland
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Winterthurerstrasse 190, Zurich, 8057, Switzerland
| |
|
45
|
Pan Q, Feron R, Jouanno E, Darras H, Herpin A, Koop B, Rondeau E, Goetz FW, Larson WA, Bernatchez L, Tringali M, Curran SS, Saillant E, Denys GPJ, von Hippel FA, Chen S, López JA, Verreycken H, Ocalewicz K, Guyomard R, Eche C, Lluch J, Roques C, Hu H, Tabor R, DeHaan P, Nichols KM, Journot L, Parrinello H, Klopp C, Interesova EA, Trifonov V, Schartl M, Postlethwait J, Guiguen Y. The rise and fall of the ancient northern pike master sex-determining gene. eLife 2021; 10:e62858. [PMID: 33506762 PMCID: PMC7870143 DOI: 10.7554/elife.62858] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/06/2020] [Accepted: 01/27/2021] [Indexed: 12/15/2022] Open
Abstract
The understanding of the evolution of variable sex determination mechanisms across taxa requires comparative studies among closely related species. Following the fate of a known master sex-determining gene, we traced the evolution of sex determination in an entire teleost order (Esociformes). We discovered that the northern pike (Esox lucius) master sex-determining gene originated from a 65 to 90 million-year-old gene duplication event and that it remained sex linked on undifferentiated sex chromosomes for at least 56 million years in multiple species. We identified several independent species- or population-specific sex determination transitions, including a recent loss of a Y chromosome. These findings highlight the diversity of evolutionary fates of master sex-determining genes and the importance of population demographic history in sex determination studies. We hypothesize that occasional sex reversals and genetic bottlenecks provide a non-adaptive explanation for sex determination transitions.
Affiliation(s)
- Qiaowei Pan
- INRAE, LPGP, Rennes, France
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
| | - Romain Feron
- INRAE, LPGP, Rennes, France
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | | | - Hugo Darras
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
| | | | - Ben Koop
- Department of Biology, Centre for Biomedical Research, University of Victoria, Victoria, Canada
| | - Eric Rondeau
- Department of Biology, Centre for Biomedical Research, University of Victoria, Victoria, Canada
| | - Frederick W Goetz
- Environmental and Fisheries Sciences Division, Northwest Fisheries Science Center, National Marine Fisheries Service, NOAA, Seattle, United States
| | - Wesley A Larson
- Fisheries Aquatic Science and Technology Laboratory at Alaska Pacific University, Anchorage, United States
| | - Louis Bernatchez
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
| | - Mike Tringali
- Fish and Wildlife Conservation Commission, Florida Marine Research Institute, St. Petersburg, United States
| | - Stephen S Curran
- School of Fisheries and Aquatic Sciences, Auburn University, Auburn, United States
| | - Eric Saillant
- Gulf Coast Research Laboratory, School of Ocean Science and Technology, The University of Southern Mississippi, Ocean Springs, United States
| | - Gael PJ Denys
- Laboratoire de Biologie des organismes et écosystèmes aquatiques (BOREA), MNHN, CNRS, IRD, SU, UCN, Laboratoire de Biologie des organismes et écosystèmes aquatiques (BOREA), Paris, France
- Unité Mixte de Service Patrimoine Naturelle – Centre d’expertise et de données (UMS 2006 AFB, CNRS, MNHN), Muséum national d’Histoire naturelle, Paris, France
| | - Frank A von Hippel
- Department of Biological Sciences, Northern Arizona UniversityFlagstaffUnited States
| | - Songlin Chen
- Yellow Sea Fisheries Research Institute, CAFS, Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao)QingdaoChina
| | - J Andrés López
- College of Fisheries and Ocean Sciences FisheriesFairbanksUnited States
| | - Hugo Verreycken
- Research Institute for Nature and Forest (INBO)BrusselsBelgium
| | - Konrad Ocalewicz
- Department of Marine Biology and Ecology, Institute of Oceanography, University of GdanskGdanskPoland
| | | | - Camille Eche
- GeT‐PlaGe, INRAE, GenotoulCastanet-TolosanFrance
| | - Jerome Lluch
- GeT‐PlaGe, INRAE, GenotoulCastanet-TolosanFrance
| | | | - Hongxia Hu
- Beijing Fisheries Research Institute & Beijing Key Laboratory of Fishery BiotechnologyBeijingChina
| | - Roger Tabor
- U.S. Fish and Wildlife ServiceLaceyUnited States
| | | | - Krista M Nichols
- Conservation Biology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric AdministrationSeattleUnited States
| | - Laurent Journot
- Institut de Génomique Fonctionnelle, IGF, CNRS, INSERM, Univ. MontpellierMontpellierFrance
| | - Hugues Parrinello
- Institut de Génomique Fonctionnelle, IGF, CNRS, INSERM, Univ. MontpellierMontpellierFrance
| | | | | | - Vladimir Trifonov
- Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk State UniversityNovosibirskRussian Federation
| | - Manfred Schartl
- University of Wuerzburg, Developmental Biochemistry, Biocenter, 97074 Würzburg, Germany; and The Xiphophorus Genetic Stock Center, Texas State UniversitySan MarcosUnited States
| | | | | |
|
46
|
Du H, Diao C, Zhao P, Zhou L, Liu JF. Integrated hybrid de novo assembly technologies to obtain high-quality pig genome using short and long reads. Brief Bioinform 2021; 22:6082823. [PMID: 33429431 DOI: 10.1093/bib/bbaa399] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/22/2020] [Revised: 11/20/2020] [Accepted: 12/08/2020] [Indexed: 11/12/2022] Open
Abstract
With the rapid progress of sequencing technologies, various types of sequencing reads and assembly algorithms have been designed to construct genome assemblies. Although recent studies have attempted to evaluate the appropriate type of sequencing reads and algorithms for assembling high-quality genomes, it is still a challenge to set the correct combination for constructing animal genomes. Here, we present a comparative performance assessment of 14 assembly combinations (9 software programs with different short and long reads) of Duroc pig. Based on the results of the optimization process for genome construction, we designed an integrated hybrid de novo assembly pipeline, HSCG, and constructed a draft genome for Duroc pig. Comparison between the new genome and Sus scrofa 11.1 revealed important breakpoints in two S. scrofa 11.1 genes. Our findings may provide new insights into the pan-genome analysis studies of agricultural animals, and the integrated assembly pipeline may serve as a guide for the assembly of other animal genomes.
Affiliation(s)
- Heng Du
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Chenguang Diao
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Pengju Zhao
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Lei Zhou
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Jian-Feng Liu
- National Engineering Laboratory for Animal Breeding; Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
|
47
|
Wyatt NA, Friesen TL. Four Reference Quality Genome Assemblies of Pyrenophora teres f. maculata: A Resource for Studying the Barley Spot Form Net Blotch Interaction. Molecular Plant-Microbe Interactions: MPMI 2021; 34:135-139. [PMID: 33054576 DOI: 10.1094/mpmi-08-20-0228-a] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 05/26/2023]
Abstract
Pyrenophora teres is the causal agent of net blotch, the most devastating foliar disease of barley. In nature, net blotch is seen in two forms, net form net blotch, caused by P. teres f. teres, and spot form net blotch, caused by P. teres f. maculata. To date, 11 P. teres f. teres genomes have been sequenced and deposited in publicly available repositories, but only one P. teres f. maculata genome has been publicly deposited. Here, we present four additional reference-quality full-genome sequences of P. teres f. maculata isolates with good geographical and phenotypic diversity, with accompanying RNA sequencing-based genome annotations. These additional P. teres f. maculata genomes will aid in the understanding of the genomic complexities of this important barley pathogen. Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Affiliation(s)
- Nathan A Wyatt
- USDA-ARS Edward T. Schafer Agricultural Research Center, Cereal Crops Research Unit, Fargo, ND, U.S.A
- Timothy L Friesen
- USDA-ARS Edward T. Schafer Agricultural Research Center, Cereal Crops Research Unit, Fargo, ND, U.S.A
|
48
|
Computational Genomics. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
|
49
|
Camiolo S, Suárez NM, Chalka A, Venturini C, Breuer J, Davison AJ. GRACy: A tool for analysing human cytomegalovirus sequence data. Virus Evol 2020; 7:veaa099. [PMID: 33505707 PMCID: PMC7816668 DOI: 10.1093/ve/veaa099] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/30/2022] Open
Abstract
Modern DNA sequencing has instituted a new era in human cytomegalovirus (HCMV) genomics. A key development has been the ability to determine the genome sequences of HCMV strains directly from clinical material. This involves the application of complex and often non-standardized bioinformatics approaches to analysing data of variable quality in a process that requires substantial manual intervention. To relieve this bottleneck, we have developed GRACy (Genome Reconstruction and Annotation of Cytomegalovirus), an easy-to-use toolkit for analysing HCMV sequence data. GRACy automates and integrates modules for read filtering, genotyping, genome assembly, genome annotation, variant analysis, and data submission. These modules were tested extensively on simulated and experimental data and outperformed generic approaches. GRACy is written in Python and is embedded in a graphical user interface with all required dependencies installed by a single command. It runs on the Linux operating system and is designed to allow the future implementation of a cross-platform version. GRACy is distributed under a GPL 3.0 license and is freely available at https://bioinformatics.cvr.ac.uk/software/ with the manual and a test dataset.
Affiliation(s)
- Nicolás M Suárez
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
- Antonia Chalka
- Division of Infection & Immunity, Roslin Institute, R(D)SVM, University of Edinburgh, Edinburgh, UK
- Cristina Venturini
- Division of Infection and Immunity, University College London, London, UK
- Judith Breuer
- Division of Infection and Immunity, University College London, London, UK
- Andrew J Davison
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
|
50
|
Naranpanawa DNU, Chandrasekara CHWMRB, Bandaranayake PCG, Bandaranayake AU. Raw transcriptomics data to gene specific SSRs: a validated free bioinformatics workflow for biologists. Sci Rep 2020; 10:18236. [PMID: 33106560 PMCID: PMC7588437 DOI: 10.1038/s41598-020-75270-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/08/2019] [Accepted: 09/21/2020] [Indexed: 02/07/2023] Open
Abstract
Recent advances in next-generation sequencing technologies have paved the path for a considerable amount of sequencing data at a relatively low cost. This has revolutionized the genomics and transcriptomics studies. However, different challenges are now created in handling such data with available bioinformatics platforms both in assembly and downstream analysis performed in order to infer correct biological meaning. Though there are a handful of commercial software and tools for some of the procedures, cost of such tools has made them prohibitive for most research laboratories. While individual open-source or free software tools are available for most of the bioinformatics applications, those components usually operate standalone and are not combined for a user-friendly workflow. Therefore, beginners in bioinformatics might find analysis procedures starting from raw sequence data too complicated and time-consuming with the associated learning-curve. Here, we outline a procedure for de novo transcriptome assembly and Simple Sequence Repeats (SSR) primer design solely based on tools that are available online for free use. For validation of the developed workflow, we used Illumina HiSeq reads of different tissue samples of Santalum album (sandalwood), generated from a previous transcriptomics project. A portion of the designed primers were tested in the lab with relevant samples and all of them successfully amplified the targeted regions. The presented bioinformatics workflow can accurately assemble quality transcriptomes and develop gene specific SSRs. Beginner biologists and researchers in bioinformatics can easily utilize this workflow for research purposes.
Affiliation(s)
- D N U Naranpanawa
- Agricultural Biotechnology Centre, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
- Postgraduate Institute of Science, University of Peradeniya, Peradeniya, 20400, Sri Lanka
- C H W M R B Chandrasekara
- Agricultural Biotechnology Centre, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
- P C G Bandaranayake
- Agricultural Biotechnology Centre, Faculty of Agriculture, University of Peradeniya, Peradeniya, 20400, Sri Lanka
- A U Bandaranayake
- Department of Computer Engineering, Faculty of Engineering, University of Peradeniya, Peradeniya, 20400, Sri Lanka
|