1
Fu J, Liu X, Deng R, Jiang X, Cai W, Fu H, Shao X. Accurate Prediction of CRISPR/Cas13a Guide Activity Using Feature Selection and Deep Learning. J Chem Inf Model 2025; 65:3380-3387. [PMID: 40091632 DOI: 10.1021/acs.jcim.4c02438] [Indexed: 03/19/2025]
Abstract
CRISPR/Cas13a serves as a key tool for nucleic acid tests; therefore, accurate prediction of its activity is essential for developing robust and sensitive diagnostics. In this study, we create a dual-branch neural network model that achieves high prediction accuracy and classification performance across two independent CRISPR/Cas13a data sets, outperforming previously published models that rely solely on sequence features. The model integrates direct sequence encoding with descriptive features; statistical analysis of 1553 descriptive features identifies 99 key features that critically influence guide-target interactions and Cas13a guide activity. By employing Shapley Additive Explanations and Integrated Gradients for feature importance analysis, we show that sequence composition, mismatch type and frequency, and the protospacer flanking site region are the primary features. These findings underscore the importance of using descriptive features as complementary inputs to deep learning-based encoding and provide valuable insights into the mechanisms underlying guide-target interaction. Overall, this study not only introduces a reliable and efficient model for Cas13a guide activity prediction but also offers a foundation for future rational design efforts.
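The abstract names mismatch type and frequency among the primary descriptive features. As an illustration only, the sketch below derives simple mismatch descriptors from an aligned guide/target pair. It is not the authors' 1553-feature descriptor set; the function name `mismatch_features`, the RNA alphabet, and the convention that a perfect match means identical strings are assumptions of this sketch.

```python
from collections import Counter


def mismatch_features(guide: str, target: str) -> dict:
    """Summarize mismatches between an aligned guide and target of equal length.

    Hypothetical convention for this sketch: both strings are written so that a
    perfect guide-target match means identical characters at every position.
    """
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target must be non-empty and of equal length")
    guide, target = guide.upper(), target.upper()
    # 0-based positions where the guide and target disagree.
    positions = [i for i, (g, t) in enumerate(zip(guide, target)) if g != t]
    # Mismatch type recorded as "guide base>target base", e.g. "U>A".
    types = Counter(f"{guide[i]}>{target[i]}" for i in positions)
    return {
        "n_mismatches": len(positions),
        "positions": positions,
        "types": dict(types),
        "mismatch_fraction": len(positions) / len(guide),
    }


print(mismatch_features("ACGUACGU", "ACGAACGG"))
```

Position-resolved counts like these are one plausible way to turn "mismatch type and frequency" into numeric inputs for a descriptive-feature branch.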
Affiliation(s)
- Jiashun Fu
- Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xuyang Liu
- Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Ruijie Deng
- College of Biomass Science and Engineering, Healthy Food Evaluation Research Center, Sichuan University, Chengdu 610065, China
- Xiue Jiang
- Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
- Wensheng Cai
- Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Haohao Fu
- Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
- Xueguang Shao
- Research Center for Analytical Sciences, College of Chemistry, Nankai University, Tianjin 300071, China
- Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China
2
Menon AV, Song B, Chao L, Sriram D, Chansky P, Bakshi I, Ulianova J, Li W. Unraveling the future of genomics: CRISPR, single-cell omics, and the applications in cancer and immunology. Front Genome Ed 2025; 7:1565387. [PMID: 40292231 PMCID: PMC12021818 DOI: 10.3389/fgeed.2025.1565387] [Received: 01/23/2025] [Accepted: 03/26/2025] [Indexed: 04/30/2025]
Abstract
The CRISPR system has transformed many research areas, including cancer and immunology, by providing a simple yet effective genome editing system. Its simplicity has facilitated large-scale experiments to assess gene functionality across diverse biological contexts, generating extensive datasets that boosted the development of computational methods and machine learning/artificial intelligence applications. Integrating CRISPR with single-cell technologies has further advanced our understanding of genome function and its role in many biological processes, providing unprecedented insights into human biology and disease mechanisms. This powerful combination has accelerated AI-driven analyses, enhancing disease diagnostics, risk prediction, and therapeutic innovations. This review provides a comprehensive overview of CRISPR-based genome editing systems, highlighting their advancements, current progress, challenges, and future opportunities, especially in cancer and immunology.
Affiliation(s)
- A. Vipin Menon
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States
- Bicna Song
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States
- Lumen Chao
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States
- Diksha Sriram
- The George Washington University, Washington, DC, United States
- Pamela Chansky
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States
- Integrated Biomedical Sciences (IBS) Program, The George Washington University, Washington, DC, United States
- Ishnoor Bakshi
- The George Washington University, Washington, DC, United States
- Jane Ulianova
- Integrated Biomedical Sciences (IBS) Program, The George Washington University, Washington, DC, United States
- Wei Li
- Center for Genetic Medicine Research, Children’s National Hospital, Washington, DC, United States
- Department of Genomics and Precision Medicine, George Washington University, Washington, DC, United States
3
Zheng Y, Zou Q, Li J, Yang Y. CRISPR-MFH: A Lightweight Hybrid Deep Learning Framework with Multi-Feature Encoding for Improved CRISPR-Cas9 Off-Target Prediction. Genes (Basel) 2025; 16:387. [PMID: 40282347 PMCID: PMC12026807 DOI: 10.3390/genes16040387] [Received: 03/11/2025] [Revised: 03/25/2025] [Accepted: 03/27/2025] [Indexed: 04/29/2025]
Abstract
BACKGROUND The CRISPR-Cas9 system has emerged as one of the most promising gene-editing technologies in biology. However, off-target effects remain a significant challenge. While recent advances in deep learning have led to the development of models for off-target prediction, these models often fail to fully leverage sequence pair information. Furthermore, as the models' parameter sizes increase, so do their complexities, limiting their practical applicability. METHODS In this study, we introduce a novel multi-feature independent encoding method, which encodes the gRNA-DNA sequence pair into three distinct feature matrices to minimize information loss. Additionally, we propose a lightweight hybrid deep learning framework, CRISPR-MFH, that integrates multi-scale separable convolutions and hybrid attention mechanisms for efficient and accurate off-target prediction. RESULTS Extensive experiments across multiple benchmark datasets demonstrate that the proposed encoding method effectively captures critical features and that CRISPR-MFH outperforms or matches state-of-the-art models with significantly fewer parameters across multiple evaluation metrics. CONCLUSIONS This study offers a novel perspective for advancing deep learning technology in the realm of CRISPR-Cas9 off-target detection.
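The abstract states that the gRNA-DNA pair is encoded into three distinct feature matrices but does not say what they contain. Purely to illustrate keeping the views separate rather than merging them into one matrix, the sketch below builds a gRNA one-hot matrix, a DNA one-hot matrix, and a per-position match/mismatch indicator. This choice of three views and the name `encode_pair_three_views` are hypothetical, not CRISPR-MFH's published encoding.

```python
BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Return an L x 4 one-hot matrix over A, C, G, T (any other symbol -> all zeros)."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def encode_pair_three_views(grna: str, dna: str):
    """Encode an aligned gRNA/DNA pair as three separate feature matrices.

    Hypothetical layout: (1) gRNA one-hot, (2) DNA one-hot, and
    (3) a per-position [match, mismatch] indicator. The gRNA is written
    in DNA letters (T instead of U) so it can be compared to the DNA.
    """
    if len(grna) != len(dna):
        raise ValueError("sequences must be aligned to the same length")
    g_view = one_hot(grna)
    d_view = one_hot(dna)
    m_view = [[1, 0] if a == b else [0, 1] for a, b in zip(grna.upper(), dna.upper())]
    return g_view, d_view, m_view


g, d, m = encode_pair_three_views("ACGT", "ACGA")
print(m)
```

Keeping the views separate lets each one feed its own convolutional branch, which is one way to limit the information loss the abstract attributes to merged encodings.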
Affiliation(s)
- Yanyi Zheng
- College of Landscape Architecture, Beijing Forestry University, Beijing 100083, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Jian Li
- School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
- Yanpeng Yang
- School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou 311300, China
4
Chia BS, Seah YFS, Wang B, Shen K, Srivastava D, Chew WL. Engineering a New Generation of Gene Editors: Integrating Synthetic Biology and AI Innovations. ACS Synth Biol 2025; 14:636-647. [PMID: 39999982 PMCID: PMC11934138 DOI: 10.1021/acssynbio.4c00686] [Received: 10/04/2024] [Revised: 01/06/2025] [Accepted: 01/16/2025] [Indexed: 02/27/2025]
Abstract
CRISPR-Cas technology has revolutionized biology by enabling precise DNA and RNA edits with ease. However, significant challenges remain for translating this technology into clinical applications. Traditional protein engineering methods, such as rational design, mutagenesis screens, and directed evolution, have been used to address issues like low efficacy, limited specificity, and high immunogenicity. These methods are labor-intensive, time-consuming, and resource-intensive, and they often require detailed structural knowledge. Recently, computational strategies have emerged as powerful solutions to these limitations. Using artificial intelligence (AI) and machine learning (ML), the discovery and design of novel gene-editing enzymes can be streamlined. AI/ML models predict activity, specificity, and immunogenicity while also enhancing mutagenesis screens and directed evolution. These approaches not only accelerate rational design but also create new opportunities for developing safer and more efficient genome-editing tools, which could eventually be translated into the clinic.
Affiliation(s)
- Bing Shao Chia
- Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Yu Fen Samantha Seah
- Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Bolun Wang
- Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Kimberle Shen
- Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Diya Srivastava
- Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Wei Leong Chew
- Genome Institute of Singapore, Agency for Science, Technology and Research, 60 Biopolis Street, Singapore 138672, Singapore
- Synthetic Biology Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117596, Singapore
5
Ding S, Zheng J, Jia C. DeepMEns: an ensemble model for predicting sgRNA on-target activity based on multiple features. Brief Funct Genomics 2025; 24:elae043. [PMID: 39528429 PMCID: PMC11735754 DOI: 10.1093/bfgp/elae043] [Received: 06/12/2024] [Revised: 10/12/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
The CRISPR/Cas9 system derived from Streptococcus pyogenes (SpCas9) has high potential for gene editing. However, its successful application is hindered by the considerable variability in target efficiencies across different single guide RNAs (sgRNAs). Although several deep learning models have been created to predict sgRNA on-target activity, the intrinsic mechanisms of these models are difficult to explain, and there is still scope for improvement in prediction performance. To overcome these issues, we propose an interpretable deep learning ensemble model, termed DeepMEns, to predict sgRNA on-target activity. Using five different training and validation datasets, we constructed five sub-regressors, each comprising three parts. The first part uses one-hot encoding, in which a 0-1 representation of the secondary structure is used as the input to a convolutional neural network (CNN) with a Transformer encoder. The second part uses the DNA shape feature matrix as the input to a CNN with a Transformer encoder. The third part uses positional encoding feature matrices as the input to a long short-term memory network with an attention mechanism. These three parts are concatenated through a flatten layer, and the final prediction is the average of the five sub-regressors' outputs. Extensive benchmarking experiments indicated that DeepMEns achieved the highest Spearman correlation coefficient on 6 of 10 independent test datasets compared with previous predictors, confirming that DeepMEns can achieve state-of-the-art performance. Moreover, the ablation analysis also indicated that the ensemble strategy may improve the performance of the prediction model.
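The final DeepMEns prediction is described as the average of five sub-regressors. A minimal sketch of that averaging step follows, with each trained sub-regressor replaced by a hypothetical callable that maps an sgRNA sequence to a score; the real sub-regressors are the three-part CNN/Transformer/LSTM networks described above, and the names `SubRegressor` and `ensemble_predict` are illustrative only.

```python
from statistics import fmean
from typing import Callable, Sequence

# Hypothetical stand-in type: any function mapping an sgRNA sequence to a score.
SubRegressor = Callable[[str], float]


def ensemble_predict(sub_regressors: Sequence[SubRegressor], sgrna: str) -> float:
    """Return the unweighted mean of the sub-regressors' predictions for one sgRNA."""
    if not sub_regressors:
        raise ValueError("need at least one sub-regressor")
    return fmean([model(sgrna) for model in sub_regressors])


# Toy stubs standing in for five trained sub-regressors (k is bound per lambda).
stubs = [lambda s, k=k: k / 10 for k in (2, 4, 6, 8, 10)]
print(ensemble_predict(stubs, "GACGCATAAAGATGAGACGC"))
```

An unweighted mean is the simplest reading of "average"; a weighted average would be a different design choice that the abstract does not describe.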
Affiliation(s)
- Shumei Ding
- School of Science, Dalian Maritime University, Dalian 116026, China
- Jia Zheng
- School of Science, Dalian Maritime University, Dalian 116026, China
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China
6
Sari O, Liu Z, Pan Y, Shao X. Predicting CRISPR-Cas9 off-target effects in human primary cells using bidirectional LSTM with BERT embedding. Bioinform Adv 2024; 5:vbae184. [PMID: 39758829 PMCID: PMC11696696 DOI: 10.1093/bioadv/vbae184] [Received: 07/26/2024] [Revised: 10/17/2024] [Accepted: 12/05/2024] [Indexed: 01/07/2025]
Abstract
Motivation The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 system is a ground-breaking genome editing tool that has revolutionized cell and gene therapies. One of the essential components that ensures its success is the design of an optimal single-guide RNA (sgRNA) with high on-target cleavage efficiency and low off-target effects. This is challenging because many conditions need to be considered, and empirically testing every design is time-consuming and costly. In silico prediction using machine learning models provides high-performance alternatives. Results We present CrisprBERT, a deep learning model that uses a Bidirectional Encoder Representations from Transformers (BERT) architecture to provide a high-dimensional embedding for paired sgRNA and DNA sequences and Bidirectional Long Short-term Memory networks for learning, to predict the off-target effects of sgRNAs using only the sgRNAs and their paired DNA sequences. We proposed doublet stack encoding to capture the local energy configuration of Cas9 binding and applied the BERT model to learn the contextual embedding of the doublet pairs. Our results showed that the new model achieved better performance than state-of-the-art deep learning models in single-split and leave-one-sgRNA-out cross-validation as well as in independent testing. Availability and implementation CrisprBERT is available on GitHub: https://github.com/OSsari/CrisprBERT.
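Doublet stack encoding is described as pairing adjacent positions of the sgRNA and DNA so that a BERT model can embed each "doublet pair" as a token. The sketch below shows one plausible tokenization, sliding a width-2 window along the aligned pair and mapping each stacked doublet to an integer id. The token format `"AC|AG"`, the 256-entry vocabulary, and the function names are assumptions; consult the CrisprBERT repository for the actual scheme.

```python
from itertools import product

BASES = "ACGT"

# Hypothetical vocabulary: every (sgRNA dinucleotide, DNA dinucleotide) pair -> id.
VOCAB = {
    "".join(p[:2]) + "|" + "".join(p[2:]): i
    for i, p in enumerate(product(BASES, repeat=4))
}


def doublet_tokens(sgrna: str, dna: str) -> list[str]:
    """Slide a width-2 window along the aligned pair and emit stacked doublets."""
    if len(sgrna) != len(dna):
        raise ValueError("sequences must be aligned to the same length")
    s, d = sgrna.upper(), dna.upper()
    return [f"{s[i:i + 2]}|{d[i:i + 2]}" for i in range(len(s) - 1)]


def doublet_ids(sgrna: str, dna: str) -> list[int]:
    """Map doublet tokens to integer ids suitable for an embedding layer.

    Raises KeyError for symbols outside A, C, G, T (e.g. N), since the
    hypothetical vocabulary above covers only the four canonical bases.
    """
    return [VOCAB[t] for t in doublet_tokens(sgrna, dna)]


print(doublet_tokens("ACGT", "ACGA"))
```

Because each token spans two neighbouring base pairs, it carries the nearest-neighbour context that stacking-energy models use, which is the intuition the abstract gives for the encoding.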
Affiliation(s)
- Orhan Sari
- Department of Mining and Materials Engineering, McGill University, Montreal, QC, H3A 2B1, Canada
- Ziying Liu
- Digital Technologies Research Center, National Research Council Canada, Ottawa, ON, K1A 0R6, Canada
- Youlian Pan
- Digital Technologies Research Center, National Research Council Canada, Ottawa, ON, K1A 0R6, Canada
- Xiaojian Shao
- Digital Technologies Research Center, National Research Council Canada, Ottawa, ON, K1A 0R6, Canada
- Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, K1H 8M5, Canada
7
Alipanahi R, Safari L, Khanteymoori A. DTMP-prime: A deep transformer-based model for predicting prime editing efficiency and PegRNA activity. Mol Ther Nucleic Acids 2024; 35:102370. [PMID: 39654539 PMCID: PMC11626815 DOI: 10.1016/j.omtn.2024.102370] [Received: 02/01/2024] [Accepted: 10/24/2024] [Indexed: 12/12/2024]
Abstract
Prime editors are CRISPR-based genome engineering tools with significant potential for correcting patient mutations. However, their use requires experimental optimization of the prime editing guide RNA (PegRNA) to achieve high editing efficiency. This paper introduces the deep transformer-based model for predicting prime editing efficiency (DTMP-Prime), a tool specifically designed to predict PegRNA activity and prime editing (PE) efficiency. DTMP-Prime facilitates the design of appropriate PegRNA and ngRNA. A transformer-based model was constructed to analyze a wide-ranging set of PE data, enabling the extraction of effective features of PegRNAs and target DNA sequences. The integration of these features with the proposed encoding strategy and DNABERT-based embedding has notably improved the predictive capabilities of DTMP-Prime for off-target sites. Moreover, DTMP-Prime is a promising tool for precisely predicting off-target sites in CRISPR experiments. The integration of a multi-head attention framework has further improved the precision and generalizability of DTMP-Prime across various PE models and cell lines. Evaluation results based on the Pearson and Spearman correlation coefficients demonstrate that DTMP-Prime outperforms other state-of-the-art models in predicting the efficiency and outcomes of PE experiments.
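DTMP-Prime is described as transformer-based with a multi-head attention framework. As background only, each attention head computes scaled dot-product attention, softmax(QKᵀ/√d_k)V; the plain-Python sketch below shows that generic textbook operation for a single head. It is not DTMP-Prime's code, and the tiny matrices in the test are illustrative.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Single-head attention softmax(Q K^T / sqrt(d_k)) V on lists of row vectors.

    q: n_queries x d_k, k: n_keys x d_k, v: n_keys x d_v.
    """
    d_k = len(k[0])
    out = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of value rows, one output column at a time.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [0.0, 4.0]]))
```

A multi-head layer runs several such heads on different learned projections of the inputs and concatenates their outputs.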
Affiliation(s)
- Leila Safari
- Department of Computer Engineering, University of Zanjan, Zanjan, Iran
8
Wattad H, Molcho J, Manor R, Weil S, Aflalo ED, Chalifa-Caspi V, Sagi A. Roadmap and Considerations for Genome Editing in a Non-Model Organism: Genetic Variations and Off-Target Profiling. Int J Mol Sci 2024; 25:12530. [PMID: 39684244 DOI: 10.3390/ijms252312530] [Received: 09/29/2024] [Revised: 11/14/2024] [Accepted: 11/18/2024] [Indexed: 12/18/2024]
Abstract
The CRISPR/Cas genome editing approach in non-model organisms poses challenges that remain to be resolved. Here, we demonstrated a generalized roadmap for a de novo genome annotation approach applied to the non-model organism Macrobrachium rosenbergii. We also addressed the typical genome editing challenges arising from genetic variations, such as a high frequency of single nucleotide polymorphisms, differences in sex chromosomes, and repetitive sequences that can lead to off-target events. For the genome editing of M. rosenbergii, our laboratory recently adapted the CRISPR/Cas genome editing approach to embryos and the embryonic primary cell culture. In this continuation study, an annotation pipeline was trained to predict the gene models by leveraging the available genomic, transcriptomic, and proteomic data, and enabling accurate gene prediction and guide design for knock-outs. A next-generation sequencing analysis demonstrated a high frequency of genetic variations in genes on both autosomal and sex chromosomes, which have been shown to affect the accuracy of editing analyses. To enable future applications based on the CRISPR/Cas tool in non-model organisms, we also verified the reliability of editing efficiency and tracked off-target frequencies. Despite the lack of comprehensive information on non-model organisms, this study provides an example of the feasibility of selecting and editing specific genes with a high degree of certainty.
Affiliation(s)
- Hanin Wattad
- Department of Life Sciences, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 8410501, Israel
- Jonathan Molcho
- Department of Life Sciences, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 8410501, Israel
- Rivka Manor
- Department of Life Sciences, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 8410501, Israel
- The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 8410501, Israel
- Simy Weil
- Department of Life Sciences, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 8410501, Israel
- Eliahu D Aflalo
- Department of Life Sciences, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 8410501, Israel
- Department of Life Sciences, Achva Academic College, Arugot 7980400, Israel
- Vered Chalifa-Caspi
- Bioinformatics Core Facility, Ilse Katz Institute for Nanoscale Science & Technology, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
- Amir Sagi
- Department of Life Sciences, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 8410501, Israel
- The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva 8410501, Israel
9
Özden F, Minary P. Learning to quantify uncertainty in off-target activity for CRISPR guide RNAs. Nucleic Acids Res 2024; 52:e87. [PMID: 39275984 PMCID: PMC11472043 DOI: 10.1093/nar/gkae759] [Received: 01/31/2024] [Revised: 08/07/2024] [Accepted: 08/23/2024] [Indexed: 09/16/2024]
Abstract
CRISPR-based genome editing technologies have revolutionised the field of molecular biology, offering unprecedented opportunities for precise genetic manipulation. However, off-target effects remain a significant challenge, potentially leading to unintended consequences and limiting the applicability of CRISPR-based genome editing technologies in clinical settings. Current literature predominantly focuses on point predictions for off-target activity, which may not fully capture the range of possible outcomes and associated risks. Here, we present crispAI, a neural network architecture-based approach for predicting uncertainty estimates for off-target cleavage activity, providing a more comprehensive risk assessment and facilitating improved decision-making in single guide RNA (sgRNA) design. Our approach makes use of the count noise model Zero Inflated Negative Binomial (ZINB) to model the uncertainty in the off-target cleavage activity data. In addition, we present the first-of-its-kind genome-wide sgRNA efficiency score, crispAI-aggregate, enabling prioritization among sgRNAs with similar point aggregate predictions by providing richer information compared to existing aggregate scores. We show that uncertainty estimates of our approach are calibrated and its predictive performance is superior to the state-of-the-art in silico off-target cleavage activity prediction methods. The tool and the trained models are available at https://github.com/furkanozdenn/crispr-offtarget-uncertainty.
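crispAI is described as modelling off-target cleavage counts with a Zero Inflated Negative Binomial (ZINB) noise model. The sketch below computes the ZINB log-likelihood under the common mean/dispersion parameterization: mean μ, dispersion θ, and zero-inflation probability π, with P(0) = π + (1 − π)·NB(0) and P(y > 0) = (1 − π)·NB(y). A network predicting (μ, θ, π) per site would be trained by minimizing the summed negative log-likelihood. Whether crispAI uses exactly this parameterization is an assumption, and the function names are illustrative.

```python
import math


def nb_log_pmf(y: int, mu: float, theta: float) -> float:
    """Log-probability of count y under a negative binomial with mean mu and dispersion theta."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(y: int, mu: float, theta: float, pi: float) -> float:
    """Log-probability of count y under a zero-inflated negative binomial."""
    if mu <= 0 or theta <= 0 or not 0 <= pi < 1:
        raise ValueError("require mu > 0, theta > 0 and 0 <= pi < 1")
    if y == 0:
        # A zero can come from the inflation component or from the NB itself.
        return math.log(pi + (1 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1 - pi) + nb_log_pmf(y, mu, theta)


def zinb_nll(counts, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of observed counts (the quantity a model would minimize)."""
    return -sum(zinb_log_pmf(y, mu, theta, pi) for y in counts)


print(zinb_nll([0, 0, 3, 1], mu=2.0, theta=1.5, pi=0.3))
```

Predicting a full distribution rather than a point value is what allows an uncertainty estimate, such as a predictive interval, for each off-target site.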
Affiliation(s)
- Furkan Özden
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, UK
- Peter Minary
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, UK
10
Yang Y, Zheng Y, Zou Q, Li J, Feng H. Overcoming CRISPR-Cas9 off-target prediction hurdles: A novel approach with ESB rebalancing strategy and CRISPR-MCA model. PLoS Comput Biol 2024; 20:e1012340. [PMID: 39226304 PMCID: PMC11398643 DOI: 10.1371/journal.pcbi.1012340] [Received: 04/13/2024] [Revised: 09/13/2024] [Accepted: 07/19/2024] [Indexed: 09/05/2024]
Abstract
The off-target activities within the CRISPR-Cas9 system remain a formidable barrier to its broader application and development. Recent advancements have highlighted the potential of deep learning models in predicting these off-target effects, yet they encounter significant hurdles, including imbalances within datasets and the intricacies associated with encoding schemes and model architectures. To surmount these challenges, our study introduces an Efficiency and Specificity-Based (ESB) class rebalancing strategy, specifically devised for datasets featuring mismatches-only off-target instances, which is a new approach in this area. Furthermore, through a careful evaluation of various one-hot encoding schemes alongside numerous hybrid neural network models, we find that encodings and models of moderate complexity best balance performance and efficiency. On this foundation, we advance a novel hybrid model, CRISPR-MCA, which capitalizes on multi-feature extraction to enhance predictive accuracy. The empirical results affirm that the ESB class rebalancing strategy surpasses five conventional methods in addressing extreme dataset imbalances, demonstrating superior efficacy and broader applicability across diverse models. Notably, the CRISPR-MCA model excels in off-target effect prediction across four distinct mismatches-only datasets and significantly outperforms contemporary state-of-the-art models on datasets comprising both mismatches and indels. In summary, the CRISPR-MCA model, coupled with the ESB rebalancing strategy, offers insights and a robust framework for future explorations in this field.
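The abstract compares various one-hot encoding schemes for sgRNA-DNA pairs without listing them here. One commonly used scheme for mismatches-only data, shown below as an illustration rather than as CRISPR-MCA's encoding, takes the element-wise OR of the sgRNA and DNA one-hot vectors at each position: a matched position has one active channel and a mismatched position has two. The function name `or_pair_encoding` is hypothetical.

```python
BASES = "ACGT"


def or_pair_encoding(sgrna: str, dna: str) -> list[list[int]]:
    """Encode an aligned sgRNA/DNA pair as the element-wise OR of their one-hot vectors.

    The sgRNA is written in DNA letters (T instead of U). A matched position has
    one active channel; a mismatched position has two.
    """
    if len(sgrna) != len(dna):
        raise ValueError("sequences must be aligned to the same length")
    rows = []
    for a, b in zip(sgrna.upper(), dna.upper()):
        rows.append([int(a == base or b == base) for base in BASES])
    return rows


print(or_pair_encoding("ACGT", "ACGA"))
```

The OR scheme keeps the input at four channels per position, but it loses which strand carried which base at a mismatch; richer schemes add channels to recover that direction, which is one axis along which encoding complexity can vary.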
Affiliation(s)
- Yanpeng Yang
- School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou, China
- Yanyi Zheng
- College of Landscape Architecture, Beijing Forestry University, Beijing, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Jian Li
- School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou, China
- Hailin Feng
- School of Mathematics and Computer Science, Zhejiang A&F University, Hangzhou, China
11
Huang B, Guo L, Yin H, Wu Y, Zeng Z, Xu S, Lou Y, Ai Z, Zhang W, Kan X, Yu Q, Du S, Li C, Wu L, Huang X, Wang S, Wang X. Deep learning enhancing guide RNA design for CRISPR/Cas12a-based diagnostics. iMeta 2024; 3:e214. [PMID: 39135699 PMCID: PMC11316927 DOI: 10.1002/imt2.214] [Received: 05/07/2024] [Revised: 05/27/2024] [Accepted: 05/27/2024] [Indexed: 08/15/2024]
Abstract
Rapid and accurate diagnostic tests are fundamental for improving patient outcomes and combating infectious diseases. The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) Cas12a-based detection system has emerged as a promising solution for on-site nucleic acid testing. Nonetheless, the effective design of CRISPR RNA (crRNA) for Cas12a-based detection remains challenging and time-consuming. In this study, we propose an enhanced crRNA design system with deep learning for Cas12a-mediated diagnostics, referred to as EasyDesign. This system employs an optimized convolutional neural network (CNN) prediction model, trained on a comprehensive data set comprising 11,496 experimentally validated Cas12a-based detection cases, encompassing a wide spectrum of prevalent pathogens, achieving Spearman's ρ = 0.812. We further assessed the model performance in crRNA design for four pathogens not included in the training data: Monkeypox Virus, Enterovirus 71, Coxsackievirus A16, and Listeria monocytogenes. The results demonstrated superior prediction performance compared to the traditional experiment screening. Furthermore, we have developed an interactive web server (https://crispr.zhejianglab.com/) that integrates EasyDesign with recombinase polymerase amplification (RPA) primer design, enhancing user accessibility. Through this web-based platform, we successfully designed optimal Cas12a crRNAs for six human papillomavirus (HPV) subtypes. Remarkably, all the top five predicted crRNAs for each HPV subtype exhibited robust fluorescent signals in CRISPR assays, thereby suggesting that the platform could effectively facilitate clinical sample testing. In conclusion, EasyDesign offers a rapid and reliable solution for crRNA design in Cas12a-based detection, which could serve as a valuable tool for clinical diagnostics and research applications.
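EasyDesign's CNN is reported to reach Spearman's ρ = 0.812 between predicted and measured activity. As a reminder of what that metric computes, the sketch below implements Spearman's ρ as the Pearson correlation of rank vectors, giving tied values their average rank. It is a generic implementation for illustration (in practice a library routine such as SciPy's `spearmanr` would normally be used), and the helper names are hypothetical.

```python
from statistics import fmean


def average_ranks(values) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j share the mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y) -> float:
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman_rho(predicted, measured) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    return pearson(average_ranks(predicted), average_ranks(measured))


print(spearman_rho([0.1, 0.4, 0.35, 0.8], [1.0, 3.0, 2.5, 9.0]))
```

Because it compares rankings, ρ rewards a model that orders crRNAs correctly even if its raw scores are not on the same scale as the measured signal, which suits the task of picking the top candidates for a target.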
Affiliation(s)
- Yue Wu
- Zhejiang Lab, Hangzhou, China
- Yufeng Lou
- Department of Laboratory Medicine, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Key Laboratory of Clinical In Vitro Diagnostic Techniques of Zhejiang Province, Hangzhou, China
- Institute of Laboratory Medicine, Zhejiang University, Hangzhou, China
- Chao Li
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- School of Medicine, School of Science and Engineering, University of Dundee, Nethergate, Dundee, UK
- Lina Wu
- School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing, China
- Xinjie Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
12
Guan Z, Jiang Z. A systematic method for solving data imbalance in CRISPR off-target prediction tasks. Comput Biol Med 2024; 178:108781. [PMID: 38936075 DOI: 10.1016/j.compbiomed.2024.108781] [Received: 01/13/2024] [Revised: 06/05/2024] [Accepted: 06/15/2024] [Indexed: 06/29/2024]
Abstract
Accurately identifying potential off-target sites in the CRISPR/Cas9 system is crucial for improving the efficiency and safety of editing. However, the imbalance of available off-target datasets has posed a major obstacle to enhancing prediction performance. Although several prediction models have been developed to address this issue, systematic research on handling data imbalance in off-target prediction is still lacking. This article systematically investigates the data imbalance issue in off-target datasets and explores numerous methods for handling data imbalance from a novel perspective. First, we highlight the impact of the imbalance problem on off-target prediction tasks by determining the imbalance ratios present in these datasets. Then, we provide a comprehensive review of various sampling techniques and cost-sensitive methods to mitigate class imbalance in off-target datasets. Finally, systematic experiments are conducted on several state-of-the-art prediction models to illustrate the impact of applying data imbalance solutions. The results show that class imbalance processing methods significantly improve the off-target prediction capabilities of the models across multiple testing datasets. The code and datasets used in this study are available at https://github.com/gzrgzx/CRISPR_Data_Imbalance.
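The three steps named in the abstract (measuring the imbalance ratio, resampling, and cost-sensitive weighting) can be illustrated with generic versions of each. The sketch below computes the majority-to-minority ratio, performs simple random oversampling with a fixed seed, and derives inverse-frequency class weights n_samples / (n_classes × count), the same heuristic scikit-learn uses for `class_weight="balanced"`. These are textbook baselines, not the specific methods evaluated in the paper, and the function names are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels) -> float:
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed: int = 0):
    """Duplicate randomly chosen items of each smaller class until all classes match the majority size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def balanced_class_weights(labels) -> dict:
    """Inverse-frequency weights n_samples / (n_classes * count), for a cost-sensitive loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


labels = [0] * 8 + [1] * 2  # toy data: 8 negatives (no cleavage), 2 positive off-target sites
print(imbalance_ratio(labels), balanced_class_weights(labels))
```

Oversampling changes the data the model sees, while class weights leave the data unchanged and instead scale each example's contribution to the loss; the two are often compared side by side, as the abstract describes.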
Affiliation(s)
- Zengrui Guan
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
- Zhenran Jiang
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
13
Sun J, Guo J, Liu J. CRISPR-M: Predicting sgRNA off-target effect using a multi-view deep learning network. PLoS Comput Biol 2024; 20:e1011972. [PMID: 38483980 DOI: 10.1371/journal.pcbi.1011972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 03/26/2024] [Accepted: 03/05/2024] [Indexed: 03/27/2024]
Abstract
Using the CRISPR-Cas9 system to perform base substitutions at the target site is a typical genome editing technique, with potential applications in gene therapy and agricultural productivity. When the CRISPR-Cas9 system uses guide RNA to direct the Cas9 endonuclease to the target site, it may misdirect the enzyme to a potential off-target site, resulting in unintended genome edits. Several computational methods have been proposed to predict off-target effects, but their predictive capability can still be improved. In this paper, we present an effective approach called CRISPR-M, which uses a new encoding scheme and a novel multi-view deep learning model to predict sgRNA off-target effects for target sites containing indels and mismatches. CRISPR-M combines convolutional neural networks and bidirectional long short-term memory recurrent neural networks to construct a three-branch, multi-view network. CRISPR-M shows significant performance advantages over existing methods on real-world datasets. Furthermore, experimental analysis of CRISPR-M under multiple metrics demonstrates its feature-extraction capability and confirms its superiority in sgRNA off-target effect prediction.
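The abstract does not describe CRISPR-M's actual encoding scheme. As a hypothetical stand-in, the sketch below shows one common way to encode an aligned sgRNA/target pair that may contain both mismatches and indels. The alphabet is extended with a gap symbol `-`. Each position becomes the bitwise OR of the two bases' one-hot vectors, plus a flag channel that marks any difference. The sketch assumes both sequences are already aligned to equal length and written in the DNA alphabet (T rather than U).

```python
ALPHABET = "ACGT-"  # '-' marks a gap (bulge) on either strand; other symbols are rejected


def one_hot(base):
    """One-hot vector over ALPHABET; raises ValueError for symbols such as 'N' or 'U'."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(base)] = 1
    return vec


def encode_pair(sgrna, target):
    """Encode a pre-aligned sgRNA/target pair as a (length x 6) matrix.

    Channels 0-4: bitwise OR of the two one-hot vectors at that position.
    Channel 5:    1 if the bases differ (mismatch or indel), else 0.
    """
    if len(sgrna) != len(target):
        raise ValueError("sequences must be pre-aligned to equal length")
    matrix = []
    for g, t in zip(sgrna.upper(), target.upper()):
        row = [a | b for a, b in zip(one_hot(g), one_hot(t))]
        row.append(int(g != t))
        matrix.append(row)
    return matrix
```

For example, `encode_pair("AC-G", "ACTG")` marks the third position as both a T and a gap and sets the difference flag. The result is a 4 × 6 matrix that a convolutional or recurrent branch could consume.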
Affiliation(s)
- Jialiang Sun
- College of Computer Science, Nankai University, Tianjin, China
- Jun Guo
- College of Software, Northeastern University, Shenyang, China
- Jian Liu
- College of Computer Science, Nankai University, Tianjin, China
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin, China
14
Luo Y, Chen Y, Xie H, Zhu W, Zhang G. Interpretable CRISPR/Cas9 off-target activities with mismatches and indels prediction using BERT. Comput Biol Med 2024; 169:107932. [PMID: 38199209 DOI: 10.1016/j.compbiomed.2024.107932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 12/25/2023] [Accepted: 01/01/2024] [Indexed: 01/12/2024]
Abstract
Off-target effects of CRISPR/Cas9 can lead to suboptimal genome editing outcomes. Numerous deep learning-based approaches have achieved excellent performance in off-target prediction. However, few can predict off-target activities involving both mismatches and indels between a single guide RNA (sgRNA) and target DNA sequence pair. In addition, data imbalance is a common pitfall in off-target prediction. Moreover, because of the complexity of genomic contexts, building an interpretable model remains challenging. To address these issues, we first developed a BERT-based model called CRISPR-BERT to improve the prediction of off-target activities with both mismatches and indels. Second, we proposed an adaptive batch-wise class balancing strategy to combat the noise that exists in imbalanced off-target data. Finally, we applied a visualization approach to investigate generalizable, nucleotide position-dependent patterns of sgRNA-DNA pairs that relate to off-target activity. In a comprehensive comparison with existing methods on five mismatches-only datasets and two mismatches-and-indels datasets, CRISPR-BERT achieved the best performance in terms of AUROC and PRAUC. In addition, the visualization analysis demonstrated how the implicit knowledge learned by CRISPR-BERT facilitates off-target prediction, showing its potential for model interpretability. Collectively, CRISPR-BERT provides an accurate and interpretable framework for off-target prediction. It also contributes to practical sgRNA optimization for improved target specificity in CRISPR/Cas9 genome editing. The source code is available at https://github.com/BrokenStringx/CRISPR-BERT.
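The abstract does not specify how CRISPR-BERT's adaptive batch-wise class balancing works. One plausible reading, sketched here purely as an assumption, is a binary cross-entropy loss whose class weights are recomputed for every mini-batch. Positives and negatives then each contribute half of the batch loss, however rare positives are in that batch. The function name and the clamping constant `eps` are illustrative choices, not the authors'.

```python
import math


def batch_balanced_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy with class weights recomputed from the current batch.

    The mean loss over positives and the mean loss over negatives are averaged,
    so each class present in the batch carries equal total weight. A class
    absent from the batch is skipped.
    """
    if not labels:
        raise ValueError("empty batch")
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    loss, groups = 0.0, 0
    for idx, is_pos in ((pos, True), (neg, False)):
        if not idx:
            continue
        terms = []
        for i in idx:
            p = min(max(probs[i], eps), 1 - eps)  # clamp to keep log() finite
            terms.append(-math.log(p) if is_pos else -math.log(1 - p))
        loss += sum(terms) / len(terms)
        groups += 1
    return loss / groups
```

Compared with fixed dataset-level weights, per-batch weights adapt to whatever mix of classes each mini-batch happens to draw. This matters when positives are so rare that some batches contain none.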
Affiliation(s)
- Ye Luo
- College of Engineering, Shantou University, Shantou, 515063, China
- Yaowen Chen
- College of Engineering, Shantou University, Shantou, 515063, China
- HuanZeng Xie
- College of Engineering, Shantou University, Shantou, 515063, China
- Wentao Zhu
- College of Engineering, Shantou University, Shantou, 515063, China
- Guishan Zhang
- College of Engineering, Shantou University, Shantou, 515063, China
15
Toufikuzzaman M, Hassan Samee MA, Sohel Rahman M. CRISPR-DIPOFF: an interpretable deep learning approach for CRISPR Cas-9 off-target prediction. Brief Bioinform 2024; 25:bbad530. [PMID: 38388680 PMCID: PMC10883906 DOI: 10.1093/bib/bbad530] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/04/2023] [Revised: 12/14/2023] [Accepted: 12/19/2023] [Indexed: 02/24/2024]
Abstract
CRISPR Cas-9 is a groundbreaking genome-editing tool that harnesses bacterial defense systems to alter DNA sequences accurately. This innovative technology holds vast promise in multiple domains such as biotechnology, agriculture and medicine. However, such power does not come without its own peril, and one such issue is the potential for unintended modifications (Off-Target), which highlights the need for accurate prediction and mitigation strategies. Previous studies have improved Off-Target prediction by applying deep learning, but they often struggle with the precision-recall trade-off, which limits their effectiveness. They also do not properly explain the complex decision-making process of their models. To address these limitations, we have thoroughly explored deep learning networks, particularly recurrent neural network-based models, leveraging their established success in handling sequence data. We have also employed a genetic algorithm for hyperparameter tuning to optimize these models' performance. The results from our experiments demonstrate significant performance improvement compared with the current state of the art in Off-Target prediction, highlighting the efficacy of our approach. Furthermore, using the integrated gradients method, we interpret our models, yielding a detailed analysis of the underlying factors that contribute to Off-Target predictions. In particular, we identify two sub-regions in the seed region of the single guide RNA, which extends the established biological hypothesis of Off-Target effects. To the best of our knowledge, our model can be considered the first to combine high efficacy, interpretability and a desirable balance between precision and recall.
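Integrated gradients attributes a model output F(x) to each input feature. Each attribution is (x_i - x'_i) multiplied by the average gradient dF/dx_i along the straight path from a baseline x' to the input x. The minimal sketch below applies the method to a toy logistic-regression model with an analytic gradient, so it runs without a deep learning framework. It illustrates the method only and is not CRISPR-DIPOFF's recurrent model. The midpoint Riemann sum with 200 steps is an arbitrary choice.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy differentiable model: logistic regression F(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def grad(x, w, b):
    """Analytic gradient of the toy model: dF/dx_i = F(x) * (1 - F(x)) * w_i."""
    f = model(x, w, b)
    return [f * (1.0 - f) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of integrated gradients along the
    straight path baseline + alpha * (x - baseline), alpha in [0, 1]."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(grad(point, w, b)):
            totals[i] += g
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, totals)]
```

A useful sanity check is the completeness property. Up to discretization error, the attributions should sum to F(x) - F(x'). Any feature equal to its baseline value receives zero attribution.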
Affiliation(s)
- Md Toufikuzzaman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Md Abul Hassan Samee
- Department of Integrative Physiology, Baylor College of Medicine, Houston, TX 77030, USA
- M Sohel Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
16
Dixit S, Kumar A, Srinivasan K, Vincent PMDR, Ramu Krishnan N. Advancing genome editing with artificial intelligence: opportunities, challenges, and future directions. Front Bioeng Biotechnol 2024; 11:1335901. [PMID: 38260726 PMCID: PMC10800897 DOI: 10.3389/fbioe.2023.1335901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024]
Abstract
Clustered regularly interspaced short palindromic repeat (CRISPR)-based genome editing (GED) technologies have unlocked exciting possibilities for understanding genes and improving medical treatments. Artificial intelligence (AI), in turn, helps genome editing achieve greater precision, efficiency, and affordability in tackling diseases such as sickle cell anemia and thalassemia. AI models have been used to design guide RNAs (gRNAs) for CRISPR-Cas systems. Tools such as DeepCRISPR, CRISTA, and DeepHF can predict optimal gRNAs for a specified target sequence. These predictions take into account multiple factors, including genomic context, Cas protein type, desired mutation type, on-target/off-target scores, potential off-target sites, and the potential impacts of genome editing on gene function and cell phenotype. These models aid in optimizing different genome editing technologies, such as base, prime, and epigenome editing, which are advanced techniques for introducing precise and programmable changes to DNA sequences without relying on the homology-directed repair pathway or donor DNA templates. Furthermore, AI, in combination with genome editing and precision medicine, enables personalized treatments based on genetic profiles. AI analyzes patients' genomic data to identify mutations, variations, and biomarkers associated with diseases such as cancer, diabetes, and Alzheimer's disease. However, several challenges persist, including high costs, off-target editing, suitable delivery methods for CRISPR cargoes, improving editing efficiency, and ensuring safety in clinical applications. This review explores AI's contribution to improving CRISPR-based genome editing technologies and addresses existing challenges. It also discusses potential areas for future research in AI-driven CRISPR-based genome editing technologies. The integration of AI and genome editing opens up new possibilities for genetics, biomedicine, and healthcare, with significant implications for human health.
Affiliation(s)
- Shriniket Dixit
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- Anant Kumar
- School of Bioscience and Technology, Vellore Institute of Technology, Vellore, India
- Kathiravan Srinivasan
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- P. M. Durai Raj Vincent
- School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
- Nadesh Ramu Krishnan
- School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
17
Störtz F, Mak JK, Minary P. piCRISPR: Physically informed deep learning models for CRISPR/Cas9 off-target cleavage prediction. ARTIFICIAL INTELLIGENCE IN THE LIFE SCIENCES 2023; 3:None. [PMID: 38047242 PMCID: PMC10316064 DOI: 10.1016/j.ailsci.2023.100075] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/29/2022] [Revised: 04/02/2023] [Accepted: 04/30/2023] [Indexed: 12/05/2023]
Abstract
CRISPR/Cas programmable nuclease systems have become ubiquitous in the field of gene editing. As development progresses, in vivo therapeutic gene editing is increasingly within reach, yet it remains limited by possible adverse side effects from unwanted edits. Recent years have thus seen continuous development of off-target prediction algorithms trained on in vitro cleavage assay data obtained from immortalised cell lines. It has been shown that, in contrast to experimental epigenetic features, computed physically informed features are so far underutilised despite showing considerably stronger correlation with cleavage activity. Here, we implement state-of-the-art deep learning algorithms and feature encodings for off-target prediction, with emphasis on physically informed features that capture the biological environment of the cleavage site; hence we term our approach piCRISPR. Features were derived from the large, diverse crisprSQL off-target cleavage dataset. We find that our best-performing models highlight the importance of sequence context and chromatin accessibility for cleavage prediction and compare favourably with literature standard prediction performance. We further show that our novel, environmentally sensitive features are crucial to accurate prediction on sequence-identical locus pairs, making them highly relevant for clinical guide design. The source code and trained models can be found ready to use at github.com/florianst/picrispr.
Affiliation(s)
- Florian Störtz
- Department of Computer Science, University of Oxford, Parks Road, Oxford OX1 3QD, UK
- Jeffrey K. Mak
- Department of Computer Science, University of Oxford, Parks Road, Oxford OX1 3QD, UK
- Peter Minary
- Department of Computer Science, University of Oxford, Parks Road, Oxford OX1 3QD, UK
18
Zhang Z, Lamson AR, Shelley M, Troyanskaya O. Interpretable neural architecture search and transfer learning for understanding CRISPR-Cas9 off-target enzymatic reactions. NATURE COMPUTATIONAL SCIENCE 2023; 3:1056-1066. [PMID: 38177723 DOI: 10.1038/s43588-023-00569-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/15/2023] [Accepted: 11/08/2023] [Indexed: 01/06/2024]
Abstract
Finely tuned enzymatic pathways control cellular processes, and their dysregulation can lead to disease. Developing predictive and interpretable models for these pathways is challenging because of the complexity of the pathways and of the cellular and genomic contexts. Here we introduce Elektrum, a deep learning framework that addresses these challenges with data-driven and biophysically interpretable models for determining the kinetics of biochemical systems. First, it uses in vitro kinetic assays to rapidly hypothesize an ensemble of high-quality kinetically interpretable neural networks (KINNs) that predict reaction rates. It then employs a transfer learning step, where the KINNs are inserted as intermediary layers into deeper convolutional neural networks, fine-tuning the predictions for reaction-dependent in vivo outcomes. We apply Elektrum to predict CRISPR-Cas9 off-target editing probabilities and demonstrate that Elektrum achieves improved performance, regularizes neural network architectures and maintains physical interpretability.
Affiliation(s)
- Zijun Zhang
- Division of Artificial Intelligence in Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Adam R Lamson
- Center for Computational Biology, Flatiron Institute, New York City, NY, USA
- Michael Shelley
- Center for Computational Biology, Flatiron Institute, New York City, NY, USA
- Courant Institute of Mathematical Sciences, New York University, New York City, NY, USA
- Olga Troyanskaya
- Center for Computational Biology, Flatiron Institute, New York City, NY, USA
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
19
Santorsola M, Lescai F. The promise of explainable deep learning for omics data analysis: Adding new discovery tools to AI. N Biotechnol 2023; 77:1-11. [PMID: 37329982 DOI: 10.1016/j.nbt.2023.06.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/27/2023] [Revised: 06/01/2023] [Accepted: 06/14/2023] [Indexed: 06/19/2023]
Abstract
Deep learning has already revolutionised the way a wide range of data is processed in many areas of daily life. The ability to learn abstractions and relationships from heterogeneous data has provided impressively accurate prediction and classification tools to handle increasingly big datasets. This has a significant impact on the growing wealth of omics datasets, with the unprecedented opportunity for a better understanding of the complexity of living organisms. While this revolution is transforming the way these data are analyzed, explainable deep learning is emerging as an additional tool with the potential to change the way biological data is interpreted. Explainability addresses critical issues such as transparency, which is especially important when computational tools are introduced in clinical environments. Moreover, it empowers artificial intelligence with the capability to provide new insights into the input data, thus adding an element of discovery to these already powerful resources. In this review, we provide an overview of the transformative effects explainable deep learning is having on multiple sectors, ranging from genome engineering and genomics to radiomics, drug design and clinical trials. We offer life scientists a perspective to help them better understand the potential of these tools, and a motivation to implement them in their research, by suggesting learning resources they can use to take their first steps in this field.
Affiliation(s)
- Francesco Lescai
- Department of Biology and Biotechnology, University of Pavia, Pavia, Italy
20
Zhang Z, Lamson AR, Shelley M, Troyanskaya O. Interpretable neural architecture search and transfer learning for understanding CRISPR/Cas9 off-target enzymatic reactions. ARXIV 2023:arXiv:2305.11917v2. [PMID: 37808087 PMCID: PMC10557798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Finely-tuned enzymatic pathways control cellular processes, and their dysregulation can lead to disease. Creating predictive and interpretable models for these pathways is challenging because of the complexity of the pathways and of the cellular and genomic contexts. Here we introduce Elektrum, a deep learning framework which addresses these challenges with data-driven and biophysically interpretable models for determining the kinetics of biochemical systems. First, it uses in vitro kinetic assays to rapidly hypothesize an ensemble of high-quality Kinetically Interpretable Neural Networks (KINNs) that predict reaction rates. It then employs a novel transfer learning step, where the KINNs are inserted as intermediary layers into deeper convolutional neural networks, fine-tuning the predictions for reaction-dependent in vivo outcomes. Elektrum makes effective use of the limited, but clean in vitro data and the complex, yet plentiful in vivo data that captures cellular context. We apply Elektrum to predict CRISPR-Cas9 off-target editing probabilities and demonstrate that Elektrum achieves state-of-the-art performance, regularizes neural network architectures, and maintains physical interpretability.
Affiliation(s)
- Zijun Zhang
- Division of Artificial Intelligence in Medicine, Cedars-Sinai Medical Center, 116 N. Robertson Blvd, Los Angeles, 90048, CA, USA
- Adam R. Lamson
- Center for Computational Biology, Flatiron Institute, 162 5th Ave, New York City, 10010, NY, USA
- Michael Shelley
- Center for Computational Biology, Flatiron Institute, 162 5th Ave, New York City, 10010, NY, USA
- Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York City, 10012, NY, USA
- Olga Troyanskaya
- Center for Computational Biology, Flatiron Institute, 162 5th Ave, New York City, 10010, NY, USA
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory South Drive, Princeton, 08544, NJ, USA
21
Zhang G, Luo Y, Dai X, Dai Z. Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on- and off-target activities. Brief Bioinform 2023; 24:bbad333. [PMID: 37775147 DOI: 10.1093/bib/bbad333] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/27/2023] [Revised: 08/31/2023] [Accepted: 09/04/2023] [Indexed: 10/01/2023]
Abstract
In silico design of single guide RNA (sgRNA) plays a critical role in the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system. Continuous efforts aim to improve sgRNA design for efficient on-target activity and reduced off-target mutations. In the last 5 years, an increasing number of deep learning-based methods have achieved breakthrough performance in predicting sgRNA on- and off-target activities. Nevertheless, it is worthwhile to systematically evaluate the predictive abilities of these methods. In this review, we conducted a systematic survey of progress in the prediction of on- and off-target editing. We investigated the performance of 10 mainstream deep learning-based on-target predictors using nine public datasets with different sample sizes. We found that in most scenarios, these methods showed greater predictive power on large- and medium-scale datasets than on small-scale datasets. In addition, we performed unbiased experiments to provide an in-depth comparison of eight representative approaches for off-target prediction on 12 publicly available datasets with various imbalance ratios of positive to negative samples. Most methods showed excellent performance on balanced datasets but left much room for improvement on moderately and severely imbalanced datasets. This study provides comprehensive perspectives on CRISPR/Cas9 sgRNA on- and off-target activity prediction and on how to improve method development.
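The benchmark stresses severe positive/negative imbalance, so it helps to see why AUROC and PR-AUC can disagree. The sketch below computes AUROC as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. It computes PR-AUC as average precision, the mean precision at the rank of each positive. The pairwise loop is chosen for clarity rather than speed, and `average_precision` breaks score ties by sort order. Neither function is taken from the benchmarked tools.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney formulation: fraction of (positive, negative)
    pairs in which the positive scores higher (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise PR-AUC: mean of precision@k over the ranks k of the positives."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive")
    return sum(precisions) / len(precisions)
```

Consider a ten-sample set with a single positive ranked second. Its AUROC is 8/9 (about 0.89), while its average precision is only 0.5. This illustrates how AUROC alone can look favorable on heavily imbalanced off-target data.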
Affiliation(s)
- Guishan Zhang
- College of Engineering, Shantou University, Shantou 515063, China
- Ye Luo
- College of Engineering, Shantou University, Shantou 515063, China
- Xianhua Dai
- School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China
- Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
- Zhiming Dai
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China
22
Choi SR, Lee M. Transformer Architecture and Attention Mechanisms in Genome Data Analysis: A Comprehensive Review. BIOLOGY 2023; 12:1033. [PMID: 37508462 PMCID: PMC10376273 DOI: 10.3390/biology12071033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/20/2023] [Revised: 07/18/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
The emergence and rapid development of deep learning, specifically transformer-based architectures and attention mechanisms, have had transformative implications across several domains, including bioinformatics and genome data analysis. Because genome sequences are analogous to language texts, techniques that succeeded in natural language processing can be applied to genomic data. This review provides a comprehensive analysis of the most recent advancements in the application of transformer architectures and attention mechanisms to genome and transcriptome data. The review focuses on critically evaluating these techniques and discusses their advantages and limitations in the context of genome data analysis. Given the swift pace of development in deep learning methodologies, it is vital to continually assess and reflect on the current standing and future direction of the research. Therefore, this review aims to serve as a timely resource for both seasoned researchers and newcomers, offering a panoramic view of recent advancements and elucidating state-of-the-art applications in the field. Furthermore, by critically evaluating studies from 2019 to 2023, this review highlights potential areas of future investigation, thereby acting as a stepping-stone for further research.
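The core operation behind the transformer architectures surveyed here is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The plain-Python sketch below implements this operation for small list-of-lists matrices. It assumes each genomic token (for example, a k-mer) has already been embedded as a query, key, and value vector. Real models add learned projections, multiple heads, and tensor libraries, all of which are omitted here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on plain lists.

    queries: n_q x d_k, keys: n_k x d_k, values: n_k x d_v.
    Returns (outputs n_q x d_v, attention weights n_q x n_k).
    """
    d_k = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights
```

Each row of the weight matrix sums to 1, so each output is a weighted average of the value vectors. When all keys are identical, attention reduces to a plain mean of the values.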
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
23
Sherkatghanad Z, Abdar M, Charlier J, Makarenkov V. Using traditional machine learning and deep learning methods for on- and off-target prediction in CRISPR/Cas9: a review. Brief Bioinform 2023; 24:bbad131. [PMID: 37080758 PMCID: PMC10199778 DOI: 10.1093/bib/bbad131] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 11/02/2022] [Revised: 03/07/2023] [Accepted: 03/13/2023] [Indexed: 04/22/2023]
Abstract
CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) is a popular and effective two-component technology used for targeted genetic manipulation. It is currently the most versatile and accurate method of gene and genome editing and has a large variety of practical applications. For example, in biomedicine, it has been used in research related to cancer, virus infections, pathogen detection, and genetic diseases. Current CRISPR/Cas9 research relies on data-driven models for on- and off-target prediction, because cleavage may also occur at non-target sequence locations. Nowadays, conventional machine learning and deep learning methods are routinely applied to predict the on-target knockout efficacy and off-target profile of given single-guide RNAs (sgRNAs). In this paper, we present an overview and a comparative analysis of traditional machine learning and deep learning models used in CRISPR/Cas9. We highlight the key research challenges and directions associated with target activity prediction. We discuss recent advances in the sgRNA-DNA sequence encoding used in state-of-the-art on- and off-target prediction models. Furthermore, we present the most popular deep learning neural network architectures used in CRISPR/Cas9 prediction models. Finally, we summarize the existing challenges and discuss possible future investigations in the field of on- and off-target prediction. Our paper provides valuable support for academic and industrial researchers interested in the application of machine learning methods in the field of CRISPR/Cas9 genome editing.
Affiliation(s)
- Zeinab Sherkatghanad
- Departement d’Informatique, Universite du Quebec a Montreal, H2X 3Y7, Montreal, QC, Canada
- Moloud Abdar
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, 3216, Geelong, VIC, Australia
- Jeremy Charlier
- Departement d’Informatique, Universite du Quebec a Montreal, H2X 3Y7, Montreal, QC, Canada
- Vladimir Makarenkov
- Departement d’Informatique, Universite du Quebec a Montreal, H2X 3Y7, Montreal, QC, Canada
24
Wan Y, Jiang Z. TransCrispr: Transformer Based Hybrid Model for Predicting CRISPR/Cas9 Single Guide RNA Cleavage Efficiency. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1518-1528. [PMID: 36006888 DOI: 10.1109/tcbb.2022.3201631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
CRISPR/Cas9 is a widely used genome editing tool for site-directed modification of deoxyribonucleic acid (DNA) nucleotide sequences. However, accurately predicting and evaluating the on- and off-target effects of single guide RNA (sgRNA) remains one of the key problems for the CRISPR/Cas9 system. Using computational methods to obtain high cell-specific sensitivity and specificity is a prerequisite for the optimal design of sgRNAs. Building on previous work, we found that sgRNA on-target knockout efficacy is related not only to the original sequence but also to important biological features. Hence, we introduce a novel approach called TransCrispr, which integrates Transformer and convolutional neural network (CNN) architectures to predict sgRNA knockout efficacy. First, we encode the sequence data and feed the transformed sgRNA sequence, positional information, and biological features into the network as input. Then, the CNN automatically learns an appropriate feature representation for the sgRNA sequence and combines it with the positional information for self-attention learning in the Transformer. Finally, a regression score is generated by predicting biological features. Experiments on seven public datasets illustrate that TransCrispr outperforms state-of-the-art methods in terms of prediction accuracy and generalization ability.
25
Comprehensive computational analysis of epigenetic descriptors affecting CRISPR-Cas9 off-target activity. BMC Genomics 2022; 23:805. [PMID: 36474180 PMCID: PMC9724382 DOI: 10.1186/s12864-022-09012-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2022] [Accepted: 10/17/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND A common issue in CRISPR-Cas9 genome editing is off-target activity, which prevents the widespread use of CRISPR-Cas9 in medical applications. Among other factors, primary chromatin structure and epigenetics may influence off-target activity.
METHODS In this work, we utilize crisprSQL, an off-target database, to analyze the effect of 19 epigenetic descriptors on CRISPR-Cas9 off-target activity. These 19 epigenetic features/scores consist of 6 experimental epigenetic features and 13 computed nucleosome organization-related features. Of these, 15 epigenetic scores are newly considered: 13 freshly computed nucleosome occupancy/positioning scores and 2 experimental features (MNase and DRIP). The other 4 existing scores are experimental features (CTCF, DNase I, H3K4me3, RRBS) commonly used in deep learning models for off-target activity prediction. For data curation, MNase was aggregated from existing experimental nucleosome occupancy data. Based on the sequence context information available in crisprSQL, we also computed nucleosome occupancy/positioning scores for off-target sites.
RESULTS To investigate the relationship between the 19 epigenetic features and off-target activity, we first conducted Spearman and Pearson correlation analyses. These analyses show that some computed scores derived from training-based models and training-free algorithms outperform all experimental epigenetic features. Next, we evaluated the contribution of all epigenetic features in two successful machine/deep learning models that predict off-target activity. We found that some computed scores, unlike all 6 experimental features, contribute significantly to the predictions of both models. As a practical research contribution, we make the off-target dataset containing all 19 epigenetic features available to the research community.
CONCLUSIONS Our comprehensive computational analysis helps the CRISPR-Cas9 community better understand the relationship between epigenetic features and CRISPR-Cas9 off-target activity.
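The two correlation measures used in this analysis differ in what they detect. Pearson correlation measures linear association. Spearman correlation is Pearson correlation computed on ranks, with tied values given their average rank, so it captures any monotonic relationship between an epigenetic score and off-target activity. The short stdlib sketch below illustrates both measures; the function names and example data are illustrative and not taken from crisprSQL.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; undefined when either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))
```

Take x = 1..5 and y = x². Pearson gives about 0.98 because the relationship is curved, whereas Spearman gives 1.0 because the ordering is perfectly preserved.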

26
Yang Q, Wu L, Meng J, Ma L, Zuo E, Sun Y. EpiCas-DL: Predicting sgRNA activity for CRISPR-mediated epigenome editing by deep learning. Comput Struct Biotechnol J 2022; 21:202-211. [PMID: 36582444 PMCID: PMC9763632 DOI: 10.1016/j.csbj.2022.11.034] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/13/2022] [Revised: 11/15/2022] [Accepted: 11/15/2022] [Indexed: 11/21/2022]
Abstract
CRISPR-mediated epigenome editing enables gene expression regulation without changing the underlying DNA sequence, and thus has vast potential for basic research and gene therapy. Effective selection of a single guide RNA (sgRNA) with high on-target efficiency and specificity would facilitate the application of epigenome editing tools. Here we performed an extensive analysis of CRISPR-mediated epigenome editing tools on thousands of experimentally examined on-target sites and established EpiCas-DL, a deep learning framework to optimize sgRNA design for gene silencing or activation. EpiCas-DL achieves high accuracy in sgRNA activity prediction for targeted gene silencing or activation and outperforms other available in silico methods. In addition, EpiCas-DL also identifies both epigenetic and sequence features that affect sgRNA efficacy in gene silencing and activation, facilitating the application of epigenome editing for research and therapy. EpiCas-DL is available at http://www.sunlab.fun:3838/EpiCas-DL.
Affiliation(s)
- Qianqian Yang
- Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, Henan, China
- Leilei Wu
- Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- Juan Meng
- Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China
- Lei Ma
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, Henan, China
- Erwei Zuo
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yidi Sun
- Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, China

27
Vora DS, Verma Y, Sundar D. A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction. Biomolecules 2022; 12:1123. [PMID: 36009017 PMCID: PMC9405635 DOI: 10.3390/biom12081123] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Revised: 08/08/2022] [Accepted: 08/10/2022] [Indexed: 11/23/2022]
Abstract
The growing popularity of the reprogrammable CRISPR/Cas9 genome editing tool is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs and identifying potential off-target threats, yet the factors that determine efficiency and off-target activity remain obscure. Previous machine learning models based on sequence features performed poorly on new datasets; thus, novel features need to be incorporated. Estimating the binding energy of the gRNA-DNA hybrid as well as the Cas9-gRNA-DNA hybrid allowed us to generate better-performing machine learning models for predicting Cas9 activity. Analysis of feature contributions to the model output on a limited dataset indicated that energy features played a determining role alongside sequence features. The binding energy features proved essential for predicting on-target activity and off-target sites. The performance plateau of current machine learning models on unseen datasets could be overcome by incorporating novel features such as binding energy. The models are provided on GitHub (GitHub Inc., San Francisco, CA, USA).
Affiliation(s)
- Dhvani Sandip Vora
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Yugesh Verma
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Durai Sundar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India
- Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110016, India

28
Deep learning prediction of chemical-induced dose-dependent and context-specific multiplex phenotype responses and its application to personalized Alzheimer’s disease drug repurposing. PLoS Comput Biol 2022; 18:e1010367. [PMID: 35951653 PMCID: PMC9398009 DOI: 10.1371/journal.pcbi.1010367] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/14/2021] [Revised: 08/23/2022] [Accepted: 07/08/2022] [Indexed: 11/19/2022]
Abstract
Predictive modeling of drug-induced gene expression is a powerful tool for phenotype-based compound screening and drug repurposing. State-of-the-art machine learning methods use a small number of fixed cell lines as a surrogate for predicting actual expression in a new cell type or tissue, although it is well known that drug responses depend on cellular context. Thus, the existing approach has limitations when applied to personalized medicine, especially for many understudied diseases whose molecular profiles differ dramatically from those characterized in the training data. Besides gene expression, dose-dependent cell viability is another important phenotype readout and is more informative than conventional summary statistics (e.g., IC50) for characterizing clinical drug efficacy and toxicity. However, few computational methods can reliably predict dose-dependent cell viability. To address these challenges, we designed a new deep learning model, MultiDCP, to predict cellular context-dependent gene expression and cell viability at a specific dose. The novelties of MultiDCP include a knowledge-driven gene expression profile transformer that enables context-specific phenotypic response predictions for novel cells or tissues, the integration of multiple diverse labeled and unlabeled omics data, the joint training of multiple prediction tasks, and a teacher-student training procedure that allows unreliable data to be used effectively. Comprehensive benchmark studies suggest that, for gene expression prediction, MultiDCP outperforms state-of-the-art methods on unseen cell lines that are dissimilar to those used in supervised training. The predicted drug-induced gene expression profiles have stronger predictive power than noisy experimental data for downstream tasks. Thus, MultiDCP is a useful tool for transcriptomics-based drug repurposing and compound screening, which currently rely on noisy high-throughput experimental data. We applied MultiDCP to repurpose individualized drugs for Alzheimer’s disease in terms of efficacy and toxicity, suggesting that MultiDCP is a potentially powerful tool for personalized drug discovery.
Conventional target-based compound screening that follows the one-drug-one-gene drug discovery paradigm has a low success rate in tackling multi-genic systemic diseases such as Alzheimer’s disease. A systems pharmacology strategy is needed to target gene regulatory networks. To enable systems pharmacology-oriented phenotypic screening, it is critical to use a mechanistic phenotype readout that links drug responses in a model system to drug toxicity and efficacy in an individual. Chemical-induced dose-dependent gene expression profiles provide critical information on drug mode of action and off-target effects and can identify drug candidates that reverse disease phenotypes. However, state-of-the-art machine learning methods for predicting chemical-induced gene expression are all trained on data from a limited number of cancer cell lines and achieve only suboptimal performance when applied to new cell types or patient samples. Here, we have developed a new deep learning framework to address this challenge and demonstrated its potential in personalized drug repurposing using Alzheimer’s disease as a case study.
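To see why a dose-dependent viability curve carries more information than a single IC50, consider a sketch using a two-parameter log-logistic (Hill) curve. This is a textbook dose-response model chosen for illustration, not the model inside MultiDCP; the function name and both drugs' parameters are hypothetical.

```python
def viability(dose: float, ic50: float, hill: float) -> float:
    """Viable fraction under a two-parameter log-logistic (Hill) curve.

    Assumes viability falls from 1 at zero dose toward 0 at high dose; curves
    fitted in practice often add top and bottom plateau parameters.
    """
    if dose < 0 or ic50 <= 0 or hill <= 0:
        raise ValueError("dose must be >= 0; ic50 and hill must be > 0")
    return 1.0 / (1.0 + (dose / ic50) ** hill)

# Two hypothetical drugs with the same IC50 (1 unit) but different Hill slopes.
shallow = {"ic50": 1.0, "hill": 0.5}
steep = {"ic50": 1.0, "hill": 4.0}

for dose in (0.1, 1.0, 10.0):
    print(f"dose={dose:>5}: shallow={viability(dose, **shallow):.3f} "
          f"steep={viability(dose, **steep):.3f}")
```

Both hypothetical drugs leave 50% of cells viable at the IC50, but at ten times the IC50 the shallow curve still leaves roughly 24% of cells viable while the steep curve leaves almost none, a difference that the IC50 alone does not capture.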

29
Xie J, Liu M, Zhou L. CRISPR-OTE: Prediction of CRISPR On-Target Efficiency Based on Multi-Dimensional Feature Fusion. Ing Rech Biomed 2022. [DOI: 10.1016/j.irbm.2022.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

30
Evaluation protocol for CRISPR/Cas9-mediated CD19 knockout GM24385 cells by flow cytometry and Sanger sequencing. Biotechniques 2022; 72:279-286. [PMID: 35703314 DOI: 10.2144/btn-2022-0015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
Abstract
Although several genome editing options are available, CRISPR/Cas9 is one of the most commonly used systems for protein and advanced therapies. Some long-term data on genomic and phenotypic stability exist; however, such information is sparse. Flow cytometry offers a method to characterize these edited cells in longitudinal studies. The objective of this work is to describe a protocol for using flow cytometry to measure CRISPR/Cas9 edits in a well-characterized B-lymphoblast cell line, GM24385, with the goal of supporting safe and effective CRISPR/Cas9-engineered therapies.

31
Sapoval N, Aghazadeh A, Nute MG, Antunes DA, Balaji A, Baraniuk R, Barberan CJ, Dannenfelser R, Dun C, Edrisi M, Elworth RAL, Kille B, Kyrillidis A, Nakhleh L, Wolfe CR, Yan Z, Yao V, Treangen TJ. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13:1728. [PMID: 35365602 PMCID: PMC8976012 DOI: 10.1038/s41467-022-29268-7] [Citation(s) in RCA: 105] [Impact Index Per Article: 35.0] [Received: 08/25/2021] [Accepted: 03/09/2022] [Indexed: 11/19/2022]
Abstract
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper we discuss recent advances, limitations, and future perspectives of DL on five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we provide a summary of the subject-specific and general challenges for DL across the biosciences.
Affiliation(s)
- Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
- Amirali Aghazadeh
- Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA
- Michael G Nute
- Department of Computer Science, Rice University, Houston, TX, USA
- Dinler A Antunes
- Department of Biology and Biochemistry, University of Houston, Houston, TX, USA
- Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
- Richard Baraniuk
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- C J Barberan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
- Chen Dun
- Department of Computer Science, Rice University, Houston, TX, USA
- R A Leo Elworth
- Department of Computer Science, Rice University, Houston, TX, USA
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
- Cameron R Wolfe
- Department of Computer Science, Rice University, Houston, TX, USA
- Zhi Yan
- Department of Computer Science, Rice University, Houston, TX, USA
- Vicky Yao
- Department of Computer Science, Rice University, Houston, TX, USA
- Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA

32
Jo T, Nho K, Bice P, Saykin AJ. Deep learning-based identification of genetic variants: application to Alzheimer's disease classification. Brief Bioinform 2022; 23:bbac022. [PMID: 35183061 PMCID: PMC8921609 DOI: 10.1093/bib/bbac022] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 09/17/2021] [Revised: 01/13/2022] [Accepted: 01/17/2022] [Indexed: 01/29/2023]
Abstract
Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. However, applying deep learning to genome-wide association studies (GWAS) is challenging because of the high dimensionality of genomic data. Here we propose a novel three-step approach (SWAT-CNN) that uses deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs), which can then be used to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and ran a convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran the CNN on the selected fragments to calculate phenotype influence scores (PIS) and identified phenotype-associated SNPs based on the PIS. In the third step, we ran the CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (N = 981: 650 cognitively normal older adults (CN) and 331 participants with AD). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, comparable to traditional machine learning approaches (random forest and XGBoost). SWAT-CNN, a novel deep learning-based genome-wide approach, identified AD-associated SNPs and a classification model for AD, and may hold promise for a range of biomedical applications.
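The fragment-then-window bookkeeping in the first two steps can be sketched as follows. This is a hypothetical outline, assuming SNPs are indexed 0..n-1 in genomic order: the fragment size, window size, and step are free parameters, and window_score stands in for the CNN-derived score of each window. Averaging the scores of all windows that cover a SNP is our assumption, not necessarily how SWAT-CNN computes its phenotype influence scores.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of SNP indices

def nonoverlapping_fragments(n_snps: int, size: int) -> List[Span]:
    """Step 1 (sketch): split SNP indices 0..n_snps-1 into consecutive fragments."""
    if n_snps < 0 or size <= 0:
        raise ValueError("n_snps must be >= 0 and size must be > 0")
    return [(s, min(s + size, n_snps)) for s in range(0, n_snps, size)]

def sliding_windows(start: int, end: int, window: int, step: int = 1) -> List[Span]:
    """Step 2 (sketch): overlapping windows of `window` SNPs inside one fragment."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be > 0")
    if end - start <= window:
        return [(start, end)]  # fragment no longer than one window
    return [(s, s + window) for s in range(start, end - window + 1, step)]

def per_snp_scores(start: int, end: int, window: int, step: int,
                   window_score: Callable[[int, int], float]) -> Dict[int, float]:
    """Average the scores of every window covering each SNP (assumed aggregation)."""
    totals = {i: 0.0 for i in range(start, end)}
    counts = {i: 0 for i in range(start, end)}
    for s, e in sliding_windows(start, end, window, step):
        score = window_score(s, e)  # hypothetical stand-in for a CNN-based score
        for i in range(s, e):
            totals[i] += score
            counts[i] += 1
    # SNPs left uncovered (possible when step > 1) are omitted.
    return {i: totals[i] / counts[i] for i in totals if counts[i]}

# Toy usage: a scorer that rewards only windows containing SNP 3.
print(nonoverlapping_fragments(10, 4))
scores = per_snp_scores(0, 7, window=3, step=1,
                        window_score=lambda s, e: 1.0 if s <= 3 < e else 0.0)
print(max(scores, key=scores.get), scores)
```

Here the toy scorer rewards only windows containing SNP 3, so the averaging assigns that SNP the highest per-SNP score, while its neighbors receive partial credit from the overlapping windows.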
Affiliation(s)
- Taeho Jo
- Department of Radiology and Imaging Sciences, Center for Neuroimaging, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana University Network Science Institute, Bloomington, IN, USA
- Kwangsik Nho
- Department of Radiology and Imaging Sciences, Center for Neuroimaging, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana University Network Science Institute, Bloomington, IN, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
- Paula Bice
- Department of Radiology and Imaging Sciences, Center for Neuroimaging, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN, USA
- Andrew J Saykin
- Department of Radiology and Imaging Sciences, Center for Neuroimaging, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana Alzheimer’s Disease Research Center, Indiana University School of Medicine, Indianapolis, IN, USA
- Indiana University Network Science Institute, Bloomington, IN, USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA

33
Li B, Ai D, Liu X. CNN-XG: A Hybrid Framework for sgRNA On-Target Prediction. Biomolecules 2022; 12:409. [PMID: 35327601 PMCID: PMC8945678 DOI: 10.3390/biom12030409] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/26/2022] [Revised: 02/23/2022] [Accepted: 03/03/2022] [Indexed: 02/04/2023]
Abstract
As a third-generation gene editing technology, CRISPR/Cas9 has a wide range of applications. The success of CRISPR depends on editing the target gene via a functional complex of sgRNA and Cas9 protein. Therefore, sgRNAs with high specificity and high on-target cleavage efficiency can make this process more accurate and efficient. Although many sophisticated machine learning and deep learning models already predict the on-target cleavage efficiency of sgRNA, prediction accuracy remains to be improved. XGBoost performs well at classification because, as an ensemble model, it can overcome the limitations of a single classifier, so we introduce XGBoost into the model to improve the prediction of sgRNA on-target activity. We present a novel machine learning framework that combines a convolutional neural network (CNN) and XGBoost to predict sgRNA on-target knockout efficacy. Our framework, called CNN-XG, consists of two main parts: a CNN feature extractor that automatically extracts features from sequences, and an XGBoost predictor applied to the features extracted by the convolutional layers. Experiments on commonly used datasets show that CNN-XG performs significantly better than other existing frameworks in classification mode.
Affiliation(s)
- Bohao Li
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Dongmei Ai
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China
- Basic Experimental Center of Natural Science, University of Science and Technology Beijing, Beijing 100083, China
- Xiuqin Liu
- School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083, China

34
A systematic mapping study on machine learning techniques for the prediction of CRISPR/Cas9 sgRNA target cleavage. Comput Struct Biotechnol J 2022; 20:5813-5823. [PMID: 36382194 PMCID: PMC9630617 DOI: 10.1016/j.csbj.2022.10.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/27/2022] [Revised: 09/21/2022] [Accepted: 10/08/2022] [Indexed: 11/30/2022]
Abstract
CRISPR/Cas9 technology has greatly accelerated genome engineering research. The CRISPR/Cas9 complex, derived from a bacterial immune system, is widely adopted for RNA-guided targeted genome editing. The systematic mapping study presented in this paper examines the literature on machine learning (ML) techniques used to predict CRISPR/Cas9 sgRNA on-/off-target cleavage, with a focus on improving support for sgRNA design and identifying areas of current research. Because this research area has expanded greatly in recent years, we chose to conduct a Systematic Mapping Study (SMS), a secondary study method that has proven effective. Unlike a classic review, an SMS does not compare methods or results; that task can instead be the subject of a systematic literature review focused on one of the themes highlighted by the SMS. To the best of the authors' knowledge, no other SMS has been published on this topic. Fifty-seven papers published between 2017 and 30 April 2022 were analyzed. The study reveals that the most widely used ML model is the convolutional neural network (CNN), followed by the feedforward neural network (FNN), while other models are used only marginally. Other findings include the wide availability of open code and of platforms dedicated to supporting researchers, and a clear prevalence of public funding for research on this topic.

35
Zhang ZR, Jiang ZR. Effective use of sequence information to predict CRISPR-Cas9 off-target. Comput Struct Biotechnol J 2022; 20:650-661. [PMID: 35140885 PMCID: PMC8804193 DOI: 10.1016/j.csbj.2022.01.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/23/2021] [Revised: 01/05/2022] [Accepted: 01/08/2022] [Indexed: 12/05/2022]
Abstract
The CRISPR/Cas9 gene-editing system is a third-generation gene-editing technology that has been widely used in biomedical applications. However, off-target effects remain a challenging problem for the practical application of CRISPR/Cas9. Although many models have been developed to predict off-target activity, current models do not make effective use of sequence-pair information, leaving room for improved accuracy. This study aims to use sequence-pair information effectively to improve the prediction of off-target activity. We propose a new scheme for encoding sequence pairs and design a new model, CRISPR-IP, for predicting off-target activity. Our encoding scheme distinguishes regions with different functions in the sequence pairs through function channels. It also distinguishes between bases and base pairs using type channels, effectively representing the sequence-pair information. The CRISPR-IP model combines a CNN, a BiLSTM, and an attention layer to learn features of sequence pairs. Performance verification on two datasets showed that our encoding scheme represents sequence-pair information effectively and that CRISPR-IP outperforms other models. Data and source code are available at https://github.com/BioinfoVirgo/CRISPR-IP.

36
Xiao LM, Wan YQ, Jiang ZR. AttCRISPR: a spacetime interpretable model for prediction of sgRNA on-target activity. BMC Bioinformatics 2021; 22:589. [PMID: 34903170 PMCID: PMC8667445 DOI: 10.1186/s12859-021-04509-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/30/2021] [Accepted: 12/01/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND A growing number of Cas9 variants with higher specificity are being developed to avoid off-target effects, generating a significant volume of experimental data. Conventional machine learning performs poorly on these datasets, while deep learning-based methods often lack interpretability, forcing researchers to trade off accuracy against interpretability. A method is needed that matches deep learning-based methods in performance while offering interpretability comparable to that of conventional machine learning methods. RESULTS To overcome these problems, we propose AttCRISPR, an intrinsically interpretable deep learning method for predicting on-target activity. AttCRISPR uses an ensemble learning strategy to stack available encoding-based and embedding-based methods with strong interpretability. Compared with state-of-the-art methods on the WT-SpCas9, eSpCas9(1.1), and SpCas9-HF1 datasets, AttCRISPR achieves average Spearman correlations of 0.872, 0.867, and 0.867, respectively, on several public datasets, outperforming these methods. Furthermore, owing to two attention modules, one spatial and one temporal, AttCRISPR has good interpretability: through these modules, we can understand the decisions made by AttCRISPR at both global and local levels without other post hoc explanation techniques. CONCLUSION With the trained models, we reveal the preference for each position-dependent nucleotide on the sgRNA (short guide RNA) sequence in each dataset at a global level. At a local level, we show that the interpretability of AttCRISPR can guide researchers in designing sgRNAs with higher activity.
Affiliation(s)
- Li-Ming Xiao
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
- Yun-Qi Wan
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
- Zhen-Ran Jiang
- School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China

37
Niu R, Peng J, Zhang Z, Shang X. R-CRISPR: A Deep Learning Network to Predict Off-Target Activities with Mismatch, Insertion and Deletion in CRISPR-Cas9 System. Genes (Basel) 2021; 12:1878. [PMID: 34946828 PMCID: PMC8702036 DOI: 10.3390/genes12121878] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/22/2021] [Revised: 11/19/2021] [Accepted: 11/22/2021] [Indexed: 12/26/2022]
Abstract
The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated protein 9 (Cas9) system is a groundbreaking gene-editing tool that has been widely adopted in biomedical research. However, the guide RNAs in the CRISPR-Cas9 system may induce unwanted off-target activities that affect the practical application of the technique. Most existing in silico methods for predicting off-target activity have limited precision and leave room for improvement, so a new in silico prediction method is needed. In this work, we present R-CRISPR, a deep learning framework that uses an encoding scheme to convert gRNA-target sequences into binary matrices, a convolutional neural network as a feature extractor, and a recurrent neural network to predict off-target activities with mismatches, insertions, or deletions. R-CRISPR surpasses six mainstream prediction methods, with a significant improvement on mismatch-only datasets verified by GUIDE-seq. Compared with state-of-the-art prediction methods, R-CRISPR also achieves competitive performance on datasets with mismatches, insertions, and deletions. Furthermore, experiments show that concatenating datasets can influence the quality of the training data, and we investigate the optimal combination of datasets.
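A binary-matrix encoding of an aligned gRNA-target pair can be sketched as below. The scheme shown, one-hot vectors over A, C, G, T and a gap symbol combined by bitwise OR, is a common simplified approach chosen for illustration; R-CRISPR's exact channels may differ, and the function names and example sequences are hypothetical. Positions with a mismatch or an indel have two bits set, while matching positions have one.

```python
from typing import List

CHANNELS = "ACGT-"  # '-' marks a gap (insertion/deletion) in the aligned pair

def one_hot(symbol: str) -> List[int]:
    """One-hot vector over CHANNELS; RNA 'U' is treated as DNA 'T'."""
    symbol = symbol.upper()
    if symbol == "U":
        symbol = "T"
    if len(symbol) != 1 or symbol not in CHANNELS:
        raise ValueError(f"unsupported symbol: {symbol!r}")
    return [1 if c == symbol else 0 for c in CHANNELS]

def encode_pair(guide: str, target: str) -> List[List[int]]:
    """Encode an aligned guide/target pair as an L x 5 binary matrix via bitwise OR."""
    if len(guide) != len(target):
        raise ValueError("guide and target must be aligned to the same length")
    return [
        [g | t for g, t in zip(one_hot(a), one_hot(b))]
        for a, b in zip(guide, target)
    ]

def mismatch_positions(matrix: List[List[int]]) -> List[int]:
    """Rows with two bits set mark a mismatch or an indel position."""
    return [i for i, row in enumerate(matrix) if sum(row) == 2]

# Hypothetical aligned pair: C/T mismatch at position 2, gap at position 4.
m = encode_pair("GACGUA", "GATG-A")
for row in m:
    print(row)
print(mismatch_positions(m))
```

A matrix like this can be fed directly to a convolutional feature extractor; the OR design keeps the width fixed at five channels regardless of whether a position is a match, a mismatch, or an indel.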
Affiliation(s)
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China

38
Bao M, Chen Q, Xu Z, Jensen EC, Liu C, Waitkus JT, Yuan X, He Q, Qin P, Du K. Challenges and Opportunities for Clustered Regularly Interspaced Short Palindromic Repeats Based Molecular Biosensing. ACS Sens 2021; 6:2497-2522. [PMID: 34143608 DOI: 10.1021/acssensors.1c00530] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Indexed: 12/21/2022]
Abstract
Clustered regularly interspaced short palindromic repeats, CRISPR, has recently emerged as a powerful molecular biosensing tool for nucleic acids and other biomarkers due to its unique properties such as collateral cleavage nature, room temperature reaction conditions, and high target-recognition specificity. Numerous platforms have been developed to leverage the CRISPR assay for ultrasensitive biosensing applications. However, to be considered as a new gold standard, several key challenges for CRISPR molecular biosensing must be addressed. In this paper, we briefly review the history of biosensors, followed by the current status of nucleic acid-based detection methods. We then discuss the current challenges pertaining to CRISPR-based nucleic acid detection, followed by the recent breakthroughs addressing these challenges. We focus upon future advancements required to enable rapid, simple, sensitive, specific, multiplexed, amplification-free, and shelf-stable CRISPR-based molecular biosensors.
Affiliation(s)
- Mengdi Bao
- Department of Mechanical Engineering, Rochester Institute of Technology, Rochester, New York 14623, United States
- Qun Chen
- Center of Precision Medicine and Healthcare, Tsinghua-Berkeley Shenzhen Institute, Shenzhen, Guangdong Province 518055, China
- Zhiheng Xu
- Department of Mechanical Engineering, Rochester Institute of Technology, Rochester, New York 14623, United States
- Erik C. Jensen
- HJ Science & Technology Inc., San Leandro, California 94710, United States
- Changyue Liu
- Center of Precision Medicine and Healthcare, Tsinghua-Berkeley Shenzhen Institute, Shenzhen, Guangdong Province 518055, China
- Jacob T. Waitkus
- Department of Mechanical Engineering, Rochester Institute of Technology, Rochester, New York 14623, United States
- Xi Yuan
- Center of Precision Medicine and Healthcare, Tsinghua-Berkeley Shenzhen Institute, Shenzhen, Guangdong Province 518055, China
- Qian He
- Center of Precision Medicine and Healthcare, Tsinghua-Berkeley Shenzhen Institute, Shenzhen, Guangdong Province 518055, China
- Peiwu Qin
- Center of Precision Medicine and Healthcare, Tsinghua-Berkeley Shenzhen Institute, Shenzhen, Guangdong Province 518055, China
- Ke Du
- Department of Mechanical Engineering, Rochester Institute of Technology, Rochester, New York 14623, United States
- Department of Microsystems Engineering, Rochester Institute of Technology, Rochester, New York 14623, United States
- School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, New York 14623, United States

39
Zhu X, Zhang Y, Yang X, Hao C, Duan H. Gene Therapy for Neurodegenerative Disease: Clinical Potential and Directions. Front Mol Neurosci 2021; 14:618171. [PMID: 34194298 PMCID: PMC8236824 DOI: 10.3389/fnmol.2021.618171] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2020] [Accepted: 05/07/2021] [Indexed: 12/21/2022]
Abstract
The pathogenesis of neurodegenerative diseases (NDDs) is complex and diverse. For decades, our understanding of NDDs was limited to their pathological features. However, recent advances in gene sequencing have made it possible to elucidate NDDs at a deeper level. Gene editing techniques have uncovered new genetic links to phenotypes, promoted the development of novel treatment strategies, and equipped researchers with further means to construct effective cell and animal models. This review describes the history and evolution of gene editing tools, with the aim of improving overall understanding of this technology, and focuses on the four most common NDDs to demonstrate potential future applications and research directions of gene editing.
Affiliation(s)
- Xiaolin Zhu
- Department of Neurosurgery, First Hospital of Shanxi Medical University, Taiyuan, China
- Yu Zhang
- Department of Neurosurgery, First Hospital of Shanxi Medical University, Taiyuan, China
- Xin Yang
- Department of Neurosurgery, First Hospital of Shanxi Medical University, Taiyuan, China
- Chunyan Hao
- Department of Geriatrics, First Hospital of Shanxi Medical University, Taiyuan, China
- Hubin Duan
- Department of Neurosurgery, First Hospital of Shanxi Medical University, Taiyuan, China
- Department of Neurosurgery, Lvliang People's Hospital, Lvliang, China
40
Vinodkumar PK, Ozcinar C, Anbarjafari G. Prediction of sgRNA Off-Target Activity in CRISPR/Cas9 Gene Editing Using Graph Convolution Network. Entropy (Basel) 2021; 23:608. [PMID: 34069050 PMCID: PMC8156774 DOI: 10.3390/e23050608] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/28/2021] [Revised: 05/03/2021] [Accepted: 05/12/2021] [Indexed: 12/26/2022]
Abstract
CRISPR/Cas9 is a powerful genome-editing technology that has been widely applied in targeted gene repair and gene expression regulation. One of the main challenges for the CRISPR/Cas9 system is the occurrence of unexpected cleavage at some sites (off-targets), and predicting these sites is necessary given their relevance to gene editing research. Very few deep learning models have so far been developed to predict the off-target propensity of single guide RNA (sgRNA) at specific DNA fragments using hand-crafted feature extraction and machine learning techniques; however, this is a convoluted process that is difficult for researchers to understand and implement. In this work, we introduce a novel graph-based approach to predict the off-target efficacy of sgRNA in the CRISPR/Cas9 system that is easy for researchers to understand and replicate. This is achieved by creating a graph with sequences as nodes and using a link prediction method to predict the presence of links between sgRNAs and the target DNA sequences at which they induce off-target cleavage. Features for each sequence are extracted from the sequence itself. We used the HEK293 and K562 datasets in our experiments. The graph convolutional network (GCN) predicted off-target gene knockouts (via link prediction) by predicting links between sgRNA and off-target sequences, achieving an auROC of 0.987.
Affiliation(s)
- Cagri Ozcinar
- iCV Lab, Institute of Technology, University of Tartu, 51009 Tartu, Estonia
- Gholamreza Anbarjafari
- iCV Lab, Institute of Technology, University of Tartu, 51009 Tartu, Estonia
- PwC Advisory Finland, 00180 Helsinki, Finland
41
Zhang G, Zeng T, Dai Z, Dai X. Prediction of CRISPR/Cas9 single guide RNA cleavage efficiency and specificity by attention-based convolutional neural networks. Comput Struct Biotechnol J 2021; 19:1445-1457. [PMID: 33841753 PMCID: PMC8010402 DOI: 10.1016/j.csbj.2021.03.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/09/2020] [Revised: 02/26/2021] [Accepted: 03/01/2021] [Indexed: 12/26/2022]
Abstract
CRISPR/Cas9 is a preferred genome editing tool and has been widely adopted across a range of disciplines, from molecular biology to gene therapy. A key prerequisite for the success of CRISPR/Cas9 is its capacity to distinguish between the on-target sites of single guide RNAs (sgRNAs) and homologous off-target sites. Thus, optimizing sgRNA design to maximize on-target activity while minimizing potential off-target mutations is a crucial concern for this system. Several deep learning models have been developed to gain a comprehensive understanding of sgRNA cleavage efficacy and specificity. Although these methods achieve good performance by automatically learning a suitable representation from the input data, there is still room to improve their accuracy and interpretability. Here, we propose novel interpretable attention-based convolutional neural networks, namely CRISPR-ONT and CRISPR-OFFT, for predicting CRISPR/Cas9 sgRNA on- and off-target activities, respectively. Experimental tests on public datasets demonstrate that our models yield satisfactory results in terms of accuracy and interpretability. Our findings contribute to understanding how RNA-guided Cas9 nucleases scan the mammalian genome. Data and source code are available at https://github.com/Peppags/CRISPRont-CRISPRofft.
Affiliation(s)
- Guishan Zhang
- Key Laboratory of Digital Signal and Image Processing of Guangdong Provincial, College of Engineering, Shantou University, Shantou 515063, China
- School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
- Tian Zeng
- School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
- Zhiming Dai
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China
- Xianhua Dai
- School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
- Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
42
Liu Q, Xie L. TranSynergy: Mechanism-driven interpretable deep neural network for the synergistic prediction and pathway deconvolution of drug combinations. PLoS Comput Biol 2021; 17:e1008653. [PMID: 33577560 PMCID: PMC7906476 DOI: 10.1371/journal.pcbi.1008653] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 07/29/2020] [Revised: 02/25/2021] [Accepted: 12/21/2020] [Indexed: 02/08/2023]
Abstract
Drug combinations have demonstrated great potential in cancer treatment: they alleviate drug resistance and improve therapeutic efficacy. The fast-growing number of anticancer drugs has made experimental investigation of all drug combinations costly and time-consuming. Computational techniques can improve the efficiency of drug combination screening. Despite recent advances in applying machine learning to synergistic drug combination prediction, several challenges remain. First, the performance of existing methods is suboptimal, leaving much room for improvement. Second, biological knowledge has not been fully incorporated into the models. Finally, many models lack interpretability, limiting their clinical applications. To address these challenges, we have developed TranSynergy, a knowledge-enabled, self-attention transformer-boosted deep learning model that improves the performance and interpretability of synergistic drug combination prediction. TranSynergy is designed so that the cellular effect of drug actions can be explicitly modeled through cell-line gene dependency, gene-gene interaction, and genome-wide drug-target interaction. A novel Shapley Additive Gene Set Enrichment Analysis (SA-GSEA) method has been developed to deconvolute the genes that contribute to synergistic drug combinations and to improve model interpretability. Extensive benchmark studies demonstrate that TranSynergy outperforms the state-of-the-art method, suggesting the potential of mechanism-driven machine learning. Novel pathways associated with the synergistic combinations are revealed and supported by experimental evidence. They may provide new insights into identifying biomarkers for precision medicine and discovering new anticancer therapies. Several new synergistic drug combinations have been predicted with high confidence for ovarian cancer, which has few treatment options.
The code is available at https://github.com/qiaoliuhub/drug_combination.
Affiliation(s)
- Qiao Liu
- Department of Computer Science, Hunter College, The City University of New York, New York, United States of America
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, United States of America
- Ph.D. Program in Computer Science, The City University of New York, New York, United States of America
- Ph.D. Program in Biochemistry and Biology, The City University of New York, New York, United States of America
- Helen and Robert Appel Alzheimer’s Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, United States of America
43
Störtz F, Minary P. crisprSQL: a novel database platform for CRISPR/Cas off-target cleavage assays. Nucleic Acids Res 2021; 49:D855-D861. [PMID: 33084893 PMCID: PMC7778913 DOI: 10.1093/nar/gkaa885] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/14/2020] [Revised: 09/23/2020] [Accepted: 10/17/2020] [Indexed: 12/20/2022]
Abstract
With ongoing development of the CRISPR/Cas programmable nuclease system, applications in in vivo therapeutic gene editing are increasingly within reach. However, non-negligible off-target effects remain a major concern for clinical applications. Although a multitude of off-target cleavage datasets have been published, no comprehensive, transparent overview tool has yet been established. Here, we present crisprSQL (http://www.crisprsql.com), an interactive and bioinformatically enhanced collection of CRISPR/Cas9 off-target cleavage studies aimed at enriching the fields of cleavage profiling, gene editing safety analysis, and transcriptomics. The current version of crisprSQL contains cleavage data from 144 guide RNAs on 25,632 guide-target pairs from human and rodent cell lines, with interaction-specific references to epigenetic markers and gene names. As the first curated database of this standard, it promises to enhance safety quantification research, inform experiment design, and fuel the development of computational off-target prediction algorithms.
Affiliation(s)
- Florian Störtz
- Department of Computer Science, University of Oxford, Parks Road, Oxford OX1 3QD, UK
- Peter Minary
- Department of Computer Science, University of Oxford, Parks Road, Oxford OX1 3QD, UK
44
Antao AM, Karapurkar JK, Lee DR, Kim KS, Ramakrishna S. Disease modeling and stem cell immunoengineering in regenerative medicine using CRISPR/Cas9 systems. Comput Struct Biotechnol J 2020; 18:3649-3665. [PMID: 33304462 PMCID: PMC7710510 DOI: 10.1016/j.csbj.2020.11.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/12/2020] [Revised: 11/16/2020] [Accepted: 11/16/2020] [Indexed: 12/14/2022]
Abstract
CRISPR/Cas systems are popular genome editing tools that belong to a class of programmable nucleases and have enabled tremendous progress in the field of regenerative medicine. Here, we outline the structural and molecular frameworks of the well-characterized type II CRISPR system and several computational tools intended to facilitate experimental design. The use of CRISPR tools to generate disease models has advanced research into the molecular aspects of disease conditions, including unraveling the molecular basis of immune rejection. Advances in regenerative medicine have been hindered by the major histocompatibility complex-human leukocyte antigen (HLA) genes, which pose a major barrier to cell- or tissue-based transplantation. Based on progress in CRISPR, including recent clinical trials, we hypothesize that generating universal donor immune-engineered stem cells is now a realistic approach to tackling a multitude of disease conditions.
Affiliation(s)
- Ainsley Mike Antao
- Graduate School of Biomedical Science and Engineering, Hanyang University, Seoul, South Korea
- Dong Ryul Lee
- Department of Biomedical Science, College of Life Science, CHA University, Seoul, South Korea
- CHA Stem Cell Institute, CHA University, Seoul, South Korea
- Kye-Seong Kim
- Graduate School of Biomedical Science and Engineering, Hanyang University, Seoul, South Korea
- College of Medicine, Hanyang University, Seoul, South Korea
- Suresh Ramakrishna
- Graduate School of Biomedical Science and Engineering, Hanyang University, Seoul, South Korea
- College of Medicine, Hanyang University, Seoul, South Korea