White CH, Moesker B, Ciuffi A, Beliakova-Bethell N. Systems biology applications to study mechanisms of human immunodeficiency virus latency and reactivation. World J Clin Infect Dis 2016; 6(2): 6-21 [DOI: 10.5495/wjcid.v6.i2.6]
Corresponding Author of This Article
Nadejda Beliakova-Bethell, PhD, Department of Medicine, University of California San Diego, Stein Clinical Research Building, Rm. 303, 9500 Gilman Drive, #0679, La Jolla, CA 92093, United States. firstname.lastname@example.org
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Clin Infect Dis. May 25, 2016; 6(2): 6-21 Published online May 25, 2016. doi: 10.5495/wjcid.v6.i2.6
Systems biology applications to study mechanisms of human immunodeficiency virus latency and reactivation
Cory H White, Bastiaan Moesker, Angela Ciuffi, Nadejda Beliakova-Bethell
Cory H White, Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA 92093, United States
Cory H White, San Diego VA Medical Center and Veterans Medical Research Foundation, San Diego, CA 92161, United States
Bastiaan Moesker, Clinical and Experimental Sciences, Faculty of Medicine, University of Southampton, Southampton, Hants SO16 6YD, United Kingdom
Angela Ciuffi, Institute of Microbiology, University Hospital of Lausanne (CHUV) and University of Lausanne, 1011 Lausanne, Switzerland
Nadejda Beliakova-Bethell, Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
Author contributions: All authors contributed equally to this paper with conception and design of the study, literature review and interpretation, manuscript preparation and approval of the final version.
Supported by a grant from the National Institutes of Health, Martin Delaney Collaboratory of AIDS Researchers for Eradication (CARE, U19 AI 096113); the Swiss National Science Foundation (grant 31003A_146579); and the University of California, San Diego Fellowships for Graduate Researchers, Frontiers of Innovation Scholars Program.
Conflict-of-interest statement: No potential conflict of interest.
Correspondence to: Nadejda Beliakova-Bethell, PhD, Department of Medicine, University of California San Diego, Stein Clinical Research Building, Rm. 303, 9500 Gilman Drive, #0679, La Jolla, CA 92093, United States. email@example.com
Telephone: +1-858-5528585 Fax: +1-858-5527445
Received: September 30, 2015 Peer-review started: October 7, 2015 First decision: November 30, 2015 Revised: January 15, 2016 Accepted: March 7, 2016 Article in press: March 9, 2016 Published online: May 25, 2016
Eradication of human immunodeficiency virus (HIV) in infected individuals is currently not possible because of the presence of the persistent cellular reservoir of latent infection. The identification of HIV latency biomarkers and a better understanding of the molecular mechanisms contributing to regulation of HIV expression might provide essential tools to eliminate these latently infected cells. This review aims at summarizing gene expression profiling and systems biology applications to studies of HIV latency and eradication. Studies comparing gene expression in latently infected and uninfected cells identify candidate latency biomarkers and novel mechanisms of latency control. Studies that profiled gene expression changes induced by existing latency reversing agents (LRAs) highlight uniting themes driving HIV reactivation and novel mechanisms that contribute to regulation of HIV expression by different LRAs. Among the reviewed gene expression studies, the common approaches included identification of differentially expressed genes and gene functional category assessment. Integration of transcriptomic data with other biological data types is presently scarce, and the field would benefit from increased adoption of these methods in future studies. In addition, designing prospective studies that use the same methods of data acquisition and statistical analyses will facilitate a more reliable identification of latency biomarkers using different model systems and the comparison of the effects of different LRAs on host factors with a role in HIV reactivation. The results from such studies would have the potential to significantly impact the process by which candidate drugs are selected and combined for future evaluations and advancement to clinical trials.
Core tip: Gene expression profiling and systems biology methods are reviewed with respect to their possible application in the field of human immunodeficiency virus (HIV) research. Studies profiling gene expression in latently infected and uninfected cells are summarized to illustrate application of these methods to identification of latency biomarkers and the molecular mechanisms contributing to regulation of HIV expression. Studies that measure changes in host and HIV gene expression upon treatment with latency reversing agents (LRAs) highlight uniting themes driving HIV reactivation and identify novel mechanisms of action of LRAs. The field will further benefit from increased adoption of systems biology methods in future studies.
In the present era of combination anti-retroviral therapy (cART), the persistence of the cellular human immunodeficiency virus (HIV) reservoir is considered to be the major barrier to a cure. This cellular reservoir mainly consists of latently infected resting CD4+ T cells bearing integrated HIV provirus. It is highly stable[2-5] and inducible, necessitating life-long adherence to cART to prevent rebound of viremia. In a search for therapeutic strategies to eradicate this latent reservoir, mechanisms leading to latency have been extensively studied and include transcriptional and post-transcriptional blocks[1,6-14].
The main strategies directed toward a cure are reviewed elsewhere[6,7,9,12,15-17] and include the inactivation of replication-competent virus and the elimination of latently infected cells. An essential milestone to HIV reservoir eradication is the identification of biomarkers of latently infected cells[18,19], so that these cells can be specifically targeted by immunotoxins. Currently, the foremost strategy for elimination of latently infected cells is controlled virus reactivation in the presence of continuing cART (“shock and kill”)[21,22]. For this purpose, small-molecule latency reversing agents (LRAs) are currently being tested. The first LRAs used were histone deacetylase (HDAC) inhibitors (HDACi), which progressed to clinical trials[23-27] and demonstrated the ability to induce expression of HIV RNA. Unfortunately, none of the studies that followed the reservoir size post-treatment reported a significant reduction[23,25,27]. The multiplicity of molecular mechanisms involved in latency control suggests that a combination approach will likely be required to achieve the degree of reactivation necessary for the infected cell to be recognized by the immune system[28-30]. Indeed, some of the tested LRA combinations demonstrated synergy for HIV reactivation[31-35].
Gene expression profiling techniques and systems biology applications may be extremely useful in the identification of biomarkers of latency, further delineating mechanisms of regulation of HIV expression in a search for novel strategies of latency reversal, and for our understanding of the mechanisms of action of existing LRAs. Methods of analysis of gene expression data have been reviewed previously[36-40], including application of bioinformatics methods to HIV integration site analysis and the assessment of transcriptome and proteome changes induced in cells infected with HIV. The present review provides a broader perspective on the use of gene expression profiling and systems biology applications in the field of HIV latency and eradication. Specifically, the objectives of the present review are: (1) to review the existing gene expression profiling and systems biology methods and their potential in the field of HIV research. We focus on the transcriptomic methods, and progress from simple approaches of differential gene expression to more complex types of analyses that integrate transcriptomic data with other biological data types, including proteomic analyses, integration site distribution, epigenetic modifications and transcription factor databases; and (2) to systematically demonstrate how methods of gene expression profiling and systems biology have been applied to answer specific questions in the fields of HIV latency and eradication. In this section we summarize specific findings that were obtained using gene expression profiling and systems biology methods, as described in existing literature.
GENE EXPRESSION PROFILING AND SYSTEMS BIOLOGY APPROACHES APPLIED IN THE FIELD OF HIV LATENCY AND ERADICATION
In this section, we describe the major methods of gene expression analysis and systems biology approaches and outline specific questions that can be addressed in the fields of HIV latency research and eradication using LRAs by each major type of application (Table 1). Where applicable, we highlight advantages and disadvantages of using individual methods over other methods for HIV latency related studies.
Table 1 Methods of gene expression profiling and systems biology and their applications in the field of human immunodeficiency virus latency and eradication.
Method | Applications to discovery of latency biomarkers and mechanisms of regulation of HIV expression | Applications to studying the LRA mechanisms of action and evaluating combination therapies
Differential gene expression | Identification of latency biomarkers | Identification of genes responsive to LRA treatment
GO term/pathway enrichment | (1) Focusing study efforts upon gene groups of interest (e.g., membrane proteins as biomarkers); (2) Identification of the mechanisms behind gene expression alterations; (3) Delineating the molecular mechanisms contributing to latency control | (1) Elucidation of mechanisms of action of LRAs; (2) Selection of gene targets for combination therapy based on gene function in enriched pathway
Network-based analyses | Identification of major regulators involved in HIV latency control, which may be only slightly dysregulated but significantly affect downstream molecules and pathways | (1) Elucidation of mechanisms of action of LRAs; (2) Prioritization of targets for combination therapies based upon type of connectivity (include if it regulates HIV-related processes; exclude if it regulates general intracellular processes)
Consolidating gene expression with other biological data (proteome, integration sites, chromatin features, etc.) | (1) Identification of latency biomarkers with transient RNA, but stable protein expression; (2) Identification of mechanisms of latency control by correlating chromatin features to gene expression | (1) Identification of post-transcriptional mechanisms of action of LRAs; (2) Assessment of chromatin features of genes and HIV integration sites responsive to LRA treatment
HIV expression and transcript type | Potential biomarker of latency | Assessment of the effectiveness of LRAs for HIV reactivation
Differential gene expression analysis, the basic analysis common to all gene expression studies (Figure 1), aims at identifying genes that are expressed at different levels among the conditions tested. Gene expression can be compared between latently infected and uninfected cells to identify biomarkers of latency, and between cells treated with LRAs and untreated cells to identify genes that are responsive to LRA treatment.
Figure 1 Summary of methods used across gene expression profiling studies in the field of human immunodeficiency virus latency and eradication.
Identification of DEGs and functional analysis of GO terms and pathways enriched for DEGs are the methods that are most commonly used across studies. Network-based analyses are used in a subset of studies; while methods that consolidate host gene expression with other data types (e.g., proteomics or HIV expression data) are scarce. DEGs: Differentially expressed genes; GO: Gene ontology; HIV: Human immunodeficiency virus.
To obtain gene expression data, two primary technologies are available: Microarrays and RNA-Seq. The majority of the published studies in the HIV latency field utilized microarrays, a well-developed technology with a fully established data analysis pipeline. However, because microarrays use specific oligonucleotide probes, detection is limited to known genes. In addition, most microarray platforms are species-specific, which does not allow for simultaneous detection of host and pathogen RNAs present in a sample. With advances in RNA-Seq technology and the reduction in per-sample cost, gene expression profiling by RNA-Seq is increasingly used. RNA-Seq allows viral and cellular transcripts to be measured concomitantly in the same sample. Other benefits of using RNA-Seq include increased sensitivity towards rare transcripts (as may be the case for HIV transcripts in the latent state); detection of novel splice variants; and a wide dynamic range (reviewed in). Numerous methods exist to analyze microarray (reviewed in[36,37,44]) and RNA-Seq datasets (reviewed in[38,39]), including methods of data processing, normalization and identification of differentially expressed genes (DEGs).
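As a minimal sketch of what a per-gene comparison involves, the Python example below computes a log2 fold change and a Welch t statistic for one gene measured in two groups of replicates. It is a simplified stand-in, not the moderated statistics implemented in packages such as limma or DESeq; the function name and the expression values are hypothetical, and the input is assumed to be normalized, strictly positive expression on a linear scale.

```python
from math import log2, sqrt
from statistics import mean, variance
from typing import List, Tuple


def log2fc_and_welch_t(group_a: List[float], group_b: List[float]) -> Tuple[float, float]:
    """Return (log2 fold change of A over B, Welch t statistic) for one gene.

    Assumes normalized, strictly positive expression values and at least
    two replicates per group.
    """
    # Work on the log2 scale, where fold changes become differences.
    log_a = [log2(v) for v in group_a]
    log_b = [log2(v) for v in group_b]
    log2_fc = mean(log_a) - mean(log_b)
    # Welch standard error: the group variances are not pooled.
    std_err = sqrt(variance(log_a) / len(log_a) + variance(log_b) / len(log_b))
    return log2_fc, log2_fc / std_err


# Hypothetical expression of one gene in latently infected vs uninfected cells.
latent = [8.0, 16.0, 32.0]
uninfected = [2.0, 4.0, 8.0]
fold_change, t_stat = log2fc_and_welch_t(latent, uninfected)
```

In a real analysis this calculation is repeated for every gene, which is what creates the multiple comparisons problem.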
While methods of identification of DEGs are relatively straightforward, their application to mechanistic studies is limited. First, these methods usually generate far more DEGs than can be meaningfully discussed, because the role of most of these genes in regulation of HIV expression is unknown. The second major issue in such studies is multiple comparisons. As more genes are tested in either microarray or RNA-Seq studies, the chance of type 1 error increases, and the multiple testing correction needed to control it makes the significance threshold for any individual gene much harder to reach. Finally, a third issue arises with regard to ranking differentially expressed genes by importance. Genes can be ranked by fold change or by a system based upon prior knowledge of the gene. However, a gene product that is an important player in a pathway may be neither well characterized nor heavily dysregulated, yet may still cause large downstream changes.
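The multiple comparisons issue can be illustrated with the Benjamini-Hochberg procedure, one common way to control the false discovery rate. The sketch below is a plain Python implementation under the assumption that one p-value per gene is already available; the p-values are invented for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustrative numbers only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
q_values = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(q_values) if q < 0.05]
```

In this toy example five genes have an unadjusted p-value below 0.05, but only the first two remain below 0.05 after adjustment.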
Functional analyses to identify gene ontology terms and pathways enriched for DEGs
These frequently used methods (Figure 1) are designed to identify groups of genes sharing a common functional category or purpose that is significantly altered by gene dysregulation. Functional gene annotation may be useful for biomarker discovery to identify genes that encode membrane proteins. These proteins represent more feasible targets for antibody-bound immunotoxins as compared to intracellular proteins. Mainly, though, gene ontology (GO) term and pathway enrichment analysis is used to identify the mechanisms behind gene expression alterations in latency and during LRA treatment. Finally, specific pathways may be identified for targeting in combinatorial reactivation strategies, based on enrichment for DEGs.
There are numerous databases of annotated GO terms and pathways, and methods to analyze these functional categories, many of which are publicly available tools (reviewed in). Gene set enrichment analysis (GSEA) approaches are the most commonly used methods to identify GO terms and pathways that are enriched for DEGs[45-47]. Among these, ToppGene has several advantages, including a user-friendly interface, support for multiple types of gene identifiers as input, and the ability to perform both GO term and pathway enrichment analyses. Many similar functions are available in the DAVID Bioinformatics Resources tool. The GoSeq tool was developed specifically for RNA-Seq data and quantifies gene length bias present in the data. In cases when an intervention significantly alters the expression of an extremely large number of genes, as may be the case for some LRAs, GSEA approaches may not be informative, as most categories appear enriched. An alternative method, Functional Analysis of Individual Microarray Expression (FAIME), utilizes an exponentially decreasing weighted expression to generate a score for each GO category or pathway in both experimental and control conditions. A t-test or other statistical test can then be performed to determine whether the scores are significantly different. One drawback of this method is the importance placed upon highly expressed genes. However, lowly expressed genes may play other roles through post-translational modifications or hub roles which are not detected by this method or by differential expression methods in general. To address these issues, network analysis techniques are extremely useful.
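The idea of a category being "enriched for DEGs" can be sketched with a one-sided hypergeometric test, the basic over-representation test that underlies many enrichment tools. This is an illustrative simplification, not the exact statistic of ToppGene, DAVID, GoSeq or FAIME; the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe: int, n_category: int,
                       n_degs: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(X >= n_overlap).

    n_universe: number of genes measured in the experiment
    n_category: genes annotated to the GO term or pathway
    n_degs:     differentially expressed genes
    n_overlap:  DEGs that fall within the category
    """
    upper = min(n_category, n_degs)
    total = comb(n_universe, n_degs)
    # Sum the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes measured, 5 in the category,
# 4 DEGs, 3 of which belong to the category.
p_enriched = enrichment_p_value(20, 5, 4, 3)
```

When a treatment changes the expression of a very large fraction of genes, `n_degs` approaches `n_universe` and a test of this kind loses its ability to discriminate between categories.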
Network-based gene expression analyses
These tools, used in about half of the studies in the field of HIV latency (Figure 1), are designed to identify key functional regulators among DEGs, and to evaluate gene network differences among experimental conditions. In network-based analyses, the function of a single gene may be elucidated through a “guilt by association” approach. High connectivity between a gene of known function and a gene of unknown function may shed light upon the function of the latter. Additionally, a group of highly connected genes may indicate that a biologically relevant pathway is at work in the altered state. These pathways or networks of genes can be tested for differential expression without the high type 1 error rate that is common when testing many thousands of individual genes. Heavily connected genes whose importance may have been missed in a standard differential expression test would show up in a network method as hub (highly connected) genes. In this way, additional genes with a role in latency control or reactivation may be identified that would be missed in other types of analysis. Finally, genes may be selected as therapeutic targets based on the network analysis if they are connected to other factors with roles in HIV latency control. Conversely, if a gene is connected to genes that encode proteins with broad cellular functions, it may be selected against, as side effects from a therapeutic intervention would be expected.
One well-developed network analysis tool is Weighted Gene Co-expression Network Analysis (WGCNA). In this method, the connectivity between genes is determined by correlating the expression of these genes across samples, independent of known protein-protein and protein-DNA interactions. First, an adjacency matrix is constructed based on correlations between each gene pair, followed by creating a topological overlap matrix (TOM) that utilizes information not only from the direct interaction between two genes, but also from their neighboring nodes. Once this TOM is created, genes may be subdivided into highly connected groups or modules. The eigengene of a module represents the mathematically optimal summary of the expression profiles of all genes within the module, as determined by their expression variation across samples. This eigengene may then be correlated to any trait of interest, such as the expression of specific HIV transcripts, or the degree of HIV reactivation upon treatment with LRAs. Genes with unknown function may be explored through both the behavior of the module as a whole and their position within the module (peripheral gene or primary hub gene). Highly connected genes often represent key players in pathways and shed light upon the mechanistic differences between the two conditions being compared, such as uninfected CD4+ T cells vs HIV-infected CD4+ T cells. Another network-based method, the “active modules” algorithm, takes a different approach to network analysis by determining which portions of the network contain an unexpectedly high occurrence of genes with significant changes in expression. In contrast with WGCNA, the “active modules” algorithm utilizes protein interaction data from available databases, which allows information about host and HIV interactions to be incorporated.
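The first two steps of a co-expression analysis of this kind can be sketched in a few lines of Python. The example below builds an unsigned soft-threshold adjacency matrix and then a topological overlap matrix using the commonly cited formula TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij). It is a toy illustration rather than the WGCNA package itself: the soft-threshold power of 6, the function names and the expression values are assumptions made here, and module detection and eigengene calculation are omitted.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def adjacency(expr: Matrix, beta: int = 6) -> Matrix:
    """Unsigned adjacency a_ij = |cor(i, j)| ** beta, with a zero diagonal."""
    g = len(expr)
    return [[0.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta
             for j in range(g)] for i in range(g)]


def topological_overlap(adj: Matrix) -> Matrix:
    """Combine direct adjacency with the strength of shared neighbors."""
    g = len(adj)
    connectivity = [sum(row) for row in adj]
    tom = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(g))
                denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
                tom[i][j] = (shared + adj[i][j]) / denom
    return tom


# Hypothetical expression of four genes (rows) across five samples (columns).
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],    # gene A
    [2.0, 4.0, 6.0, 8.0, 10.0],   # gene B, rises with A
    [5.0, 4.0, 3.0, 2.0, 1.0],    # gene C, mirrors A
    [1.0, -1.0, 0.0, -1.0, 1.0],  # gene D, uncorrelated with A, B and C
]
adj = adjacency(expr)
tom = topological_overlap(adj)
```

Genes A, B and C form a tightly connected group with topological overlap close to 1, while gene D has no overlap with any of them. In a full analysis, hierarchical clustering of 1 - TOM would then define the modules.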
Available software packages for network analysis usually use literature curated protein-protein and protein-DNA interactions databases, but do not take into account enrichment of specific clusters for DEGs (e.g., Metacore, Ingenuity, iRefWeb). A major advantage of utilizing known interactions is independence from differential expression (i.e., all known protein-protein and protein-DNA interactions will be displayed for each DEG). A drawback of literature-based networks is the dependency on the accuracy of annotated sources and the robustness of the algorithms for network generation.
Integrating gene expression with other types of biological data
Methods of transcriptomics are well-developed and capture the majority of annotated genes. However, previous studies have shown that the transcriptome only partially correlates with the proteome[52-54]; therefore, assessment of gene expression at the functional (protein) level may be necessary to validate the role of specific genes in HIV latency control and reactivation. In addition, proteomics methods identify the effects that are not reflected or captured at the RNA level; for example, due to an increase of translation from existing messenger RNA, or because of the transient RNA expression. Thus, proteome profiling may be used to identify latency biomarkers that are stably expressed at the protein level. In addition, profiling of post-transcriptional effects of LRAs is beneficial to capture those effects that would be missed if only the transcriptome profiling were performed. Analysis of the proteome may thus shed light on the mechanisms by which LRAs regulate gene expression, including, possibly, transcriptional activation of HIV.
Other biological data types may be integrated with gene expression profiling data to further understand the mechanisms of HIV latency and reactivation. The activity of the HIV promoter may depend on the characteristics of the site of proviral integration. Chromatin features surrounding an integration site, including histone acetylation and methylation, and DNA methylation, may contribute to the levels of HIV transcription. For example, latent inducible proviruses have a tendency to be integrated into highly expressed genes, gene deserts, or alphoid repeats. The transcription level of nearby genes as well as viral genome orientation may influence transcription of viral genes by RNA interference mechanisms[59-61]. However, to date, no clear feature of integration sites could be identified when comparing 5 different models of HIV latency. Integration of HIV into specific genes, such as genes associated with the cell cycle, may provide an advantage to the maintenance of the latent reservoir through clonal expansion.
Depending on the type of data, different modeling methods may be used. The study described below was done with cancer cell lines; however, its method of integrating datasets would be applicable to many types of HIV latency related data. The aims of the study were to determine how DNA methylation in different genomic regions contributes to gene expression in cancer cell lines, and whether methylation of transcription factor binding sites impacts transcription factor recruitment and therefore gene expression. Gene expression was measured by Affymetrix microarrays, and DNA methylation by methyl-CpG binding domain-based capture (MbDCap)-Seq. Pearson correlation analysis and decision tree learning were used to determine the effect of methylation in various genomic regions (promoters, first and second exons, and first introns) on the breast cancer subtype differential gene expression. To determine the role of methylation in transcription factor binding, cell line-specific consensus sequences were generated by assembling reads that mapped to the significantly hypermethylated regions and then matching these sequences to candidate transcription factors using the TRANSFAC package. Similar approaches can be used to determine the role of chromatin features such as DNA methylation, as well as histone acetylation and methylation, in regulation of the expression levels of genes that control HIV latency, in the latent state and during reactivation using LRAs.
Evaluating the levels of HIV RNA using RNA-Seq datasets
HIV full length unspliced (US) genomic RNA can be spliced into different mRNA species, 47 identified in an early study, and 78 more recently. The major classes of transcripts constitute multiply spliced (MS) transcripts that encode regulatory and accessory proteins Tat, Rev, and Nef; and singly spliced (SS) transcripts that encode one-exon Tat, Vpr, Vif, Vpu, and Env. The US transcripts encode Gag and Gag-Pol polyproteins. In cell line models of latency (ACH-2 and U1), MS and SS transcripts were detected at early stages of the replication cycle, when little or no genomic (US) RNA was produced. Both MS and US transcripts were detected at low levels in resting CD4+ T cells from HIV-infected individuals, while the majority of detected transcripts represented abortive HIV transcripts lacking a polyA tail. As was suggested previously, HIV RNA itself may represent a biomarker of latency. While multiple assays have been developed to detect HIV RNA using PCR-based methods[72,73], they require design of specific primers to detect various forms of HIV RNA, and may fail to detect HIV RNA in a subset of patients due to virus mutations. RNA-Seq technology allows for concomitant detection and quantification of various HIV RNA species from the same samples as host transcripts, regardless of the viral sequence. Total HIV transcripts, including the abortive transcripts, can be measured by RNA-Seq using total RNA (ribo-depleted) libraries that capture non-polyadenylated RNAs.
RNA-Seq can also be used to evaluate induction of HIV expression using LRAs. In this case, libraries enriched for polyA (polyadenylated) RNAs would be a more appropriate choice, since induction of abortive transcripts or read-through transcripts from the neighboring genes is not relevant to the success of the “shock and kill” strategy, as no viral proteins will be produced. Specifically, induction of polyA US transcripts would need to be monitored, as it is indicative of productive infection (that will result in production of virions). Unfortunately, none of the existing RNA-Seq data analysis packages have reliable tools for precise splice variant measurement from standard RNA-Seq datasets (50-100 base pair reads), in particular for complex overlapping sequences as in the case of HIV. Precise measurement of splice variants requires longer read capacity (10 kb); otherwise, expression of the major splice variants, MS and SS, and the US genomic RNA can only be estimated. Mohammadi et al developed a method that allows the approximation of the proportions of different HIV transcripts in the RNA-Seq data. The method is based on determining the number of reads that pass through the splice junctions D1 [directly after the long terminal repeat (LTR) region] and D4 (splice junction between Tat-Rev and Vpu) that define MS, SS, and US transcripts. If a read passes through the junction D1, then it belongs to the US transcript. Reads which align to the left of the D1 junction but are broken at D1 and align to another segment of the HIV genome correspond to reads from either SS or MS transcripts (SS + MS). Reads overlapping the D4 junction correspond to reads from either US transcripts or SS transcripts (US + SS). Finally, reads which are broken at the D4 junction correspond to reads from MS transcripts. The SS read percentage is then estimated by subtracting the US and MS percentages from 100.
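The junction-based arithmetic described above can be sketched as follows. The text states which reads indicate which transcript classes and that the SS percentage is obtained by subtraction; how the US and MS percentages are normalized is an assumption made here (US as the unspliced fraction of reads covering D1, MS as the spliced fraction of reads covering D4), as are the function name, the read counts and the choice to floor a negative SS estimate at zero.

```python
from typing import Dict


def estimate_transcript_classes(d1_unspliced: int, d1_spliced: int,
                                d4_unspliced: int, d4_spliced: int) -> Dict[str, float]:
    """Approximate the percentages of US, SS and MS HIV transcripts.

    d1_unspliced: reads passing through D1 (US)
    d1_spliced:   reads broken at D1 (SS + MS)
    d4_unspliced: reads overlapping D4 (US + SS)
    d4_spliced:   reads broken at D4 (MS)
    """
    d1_total = d1_unspliced + d1_spliced
    d4_total = d4_unspliced + d4_spliced
    if d1_total == 0 or d4_total == 0:
        raise ValueError("no reads cover one of the junctions")
    us_pct = 100.0 * d1_unspliced / d1_total   # assumed normalization at D1
    ms_pct = 100.0 * d4_spliced / d4_total     # assumed normalization at D4
    # SS is what remains; sampling noise could push it below zero, so floor it.
    ss_pct = max(0.0, 100.0 - us_pct - ms_pct)
    return {"US": us_pct, "SS": ss_pct, "MS": ms_pct}


# Hypothetical junction read counts from one RNA-Seq sample.
estimate = estimate_transcript_classes(d1_unspliced=20, d1_spliced=80,
                                       d4_unspliced=50, d4_spliced=50)
```

With these counts the estimate is 20% US, 30% SS and 50% MS, which is consistent at both junctions: SS + MS accounts for 80% of reads at D1, and US + SS for 50% of reads at D4.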
USING TRANSCRIPTOME PROFILING TO IDENTIFY BIOMARKERS OF HIV LATENCY
A recent study provided a proof of principle that immunotoxins can be used to target cells expressing a specific surface molecule; however, the choice of CCR5 co-receptor resulted in killing of both HIV-infected and uninfected CCR5-expressing cells. This choice of target would not be optimal for therapeutic applications, since CD4+ T cells are usually already compromised in HIV-infected individuals. Therefore, identification of a unique biomarker signature of latently infected cells is warranted to target these cells for eradication with high specificity. These biomarkers may have additional applications; for example, reliable quantification of latently infected cells in vivo to follow the size of the latent reservoir in patients post-treatment, and enrichment for latently infected cells for further studies.
The proof of principle that latently infected cells may have a distinct gene expression signature was provided in an early study comparing gene expression in resting CD4+ T cells from aviremic HIV-infected individuals and HIV seronegative donors as controls using microarrays. Whilst less than 0.1% of cells from aviremic patients were latently HIV-infected (as determined by presence of HIV-1 proviral DNA), 165 genes showed differential expression between CD4+ T cells from aviremic patients as compared to HIV-seronegative donors. The limitations of this study were the low prevalence of latently infected cells and the confounding effect of antiretroviral therapy on gene expression. Later studies aimed at characterizing the gene expression profile of latently HIV-infected cells using chronically HIV-infected cell lines or in vitro infected primary resting CD4+ T cells and reporter viruses, allowing for strategies to enrich or select for latently HIV-infected cells.
Table 2 summarizes the four studies comparing gene expression in latently infected cells vs their uninfected counterparts. To estimate the proportions of latently infected cells present in each model, provirus expression is reactivated following establishment of latency, using strong agents that induce T cell activation, such as phorbol myristate acetate, anti-CD3/anti-CD28 + IL-2, or phytohemagglutinin and feeder peripheral blood mononuclear cells. The percentage of uninfected cells may be estimated by subtracting the percentage of latently infected cells from the total (100%), assuming that all latent proviruses were induced. The percentage of cells expressing HIV Gag protein (p24+) or GFP reporter is also measured before the stimulation, to determine whether there is background expression of HIV in each latency model. These p24+ or GFP+ cells may represent productively infected cells present due to the leakiness of a model, or be reflective of viral entry in the absence of de novo viral production. Of note, Krishnan and Zeichner provided these estimates only for one of the cell lines studied, ACH-2. The proportions of each cell type need to be taken into account when evaluating the results from differential expression analysis.
Table 2 Features of gene expression studies comparing latently infected vs uninfected cells.
Primary CD4+ T cells co-cultured with feeder H80 human brain tumor cell line
Primary resting CD4+ T cells co-cultured with dendritic cells
CXCR4 tropic HIV-1 LAV strain
CXCR4 tropic GFP reporter virus (GFP inserted in place of Nef)
CXCR4 tropic GFP reporter virus with mutations in Gag, Vif, Vpr, Vpu, Env and Nef
CCR5 tropic GFP reporter virus (GFP inserted into the Nef open reading frame)
Proportion of uninfected cells
Proportion of GFP+ or p24+ cells
0% (removed by sorting)
Proportion of latently infected cells
Time of culture
N/A (chronically infected)
Gene expression profiling platform
Microarrays (Hs. UniGem2)
Microarrays (Agilent-012391 Whole Human Genome Oligo Microarray G4112A)
RNA-Seq (polyA RNA library; Illumina HiSeq2000)
Microarrays (Illumina Human-Ref8)
Method to identify DEGs
Parametric one-sample random variance t-test (BRB-Array Tools, P < 0.001)
Linear modeling and using an empirical Bayes method with FDR correction (limma)
Generalized linear modeling (DESeq, FDR < 0.05)
Linear modeling and using an empirical Bayes method (limma, FDR < 0.05)
Databases used for functional analyses
Reactome pathways Ver.40;
Total number of DEGs
CXCR4: Chemokine (C-X-C motif) receptor 4; LAV: Lymphadenopathy-associated virus; CCR5: Chemokine (C-C motif) receptor 5 (gene/pseudogene); GFP: Green fluorescent protein; polyA: Polyadenylated; DEGs: Differentially expressed genes; BRB: Biometric Research Branch; FDR: False discovery rate; NIH: National Institutes of Health; mAdb: Mad Bee; GO: Gene ontology; MsigDb: Molecular Signature database; KEGG: Kyoto Encyclopedia of Genes and Genomes; IPA: Ingenuity Pathway Analysis; N/A: Not applicable.
Table 2 presents additional characteristics that differed among the studies, including cells that were used (proliferating cell lines, resting CD4+ T cells or total CD4+ T cells), the duration of time in culture and viruses used to infect the cells. Finally, gene expression profiling platforms and statistical approaches to analyze the data were also different.
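Several of the studies in Table 2 report FDR-corrected results. As a rough illustration of what such a correction involves, the sketch below applies the Benjamini-Hochberg step-up adjustment to a list of per-gene P values. Benjamini-Hochberg is assumed here as a representative FDR procedure; the text does not state which adjustment each study applied, and the function names, gene labels and P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the raw P values decrease.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(genes, pvalues, fdr=0.05):
    """Return the genes whose adjusted P value is below the FDR cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    return [gene for gene, q in zip(genes, adjusted) if q < fdr]


# Hypothetical input: four genes with made-up P values.
example_genes = ["geneA", "geneB", "geneC", "geneD"]
example_pvalues = [0.01, 0.04, 0.03, 0.005]
example_degs = call_degs(example_genes, example_pvalues, fdr=0.03)
```

Because the cutoff, the adjustment procedure and the underlying test all affect which genes are called, applying one such function uniformly across datasets is one way to make DEG lists from different models more comparable.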
In order to assess whether biomarkers of latency can be reliably identified using gene expression profiling, we compared the DEG lists, where available (all studies except for Evans et al). Krishnan and Zeichner reported 32 genes that were consistently changed in latency in all three cell lines that were tested, and this list of DEGs was used. The number of DEGs from each study that participated in this analysis is indicated in Table 2 (bottom row). If consistent changes across model systems could be detected, these genes would represent strong latency biomarker candidates.
Figure 2 depicts the results of the comparison of DEGs between latently infected and uninfected cells available from three published studies[18,19,42]. A total of 1094 DEGs were identified. Only one gene, LYN proto-oncogene, Src family tyrosine kinase (LYN), was dysregulated in latency in all three models. Not surprisingly, there were fewer similarities between the cell lines and each of the primary cell models. In addition to LYN, only four genes were in common between the Krishnan and Zeichner and Iglesias-Ussel et al studies. More similarities were found when comparing the two studies that performed gene expression profiling using primary CD4+ T cells (Iglesias-Ussel et al and Mohammadi et al): 34 genes were found in common, with the majority (29 of 34) consistently up- or downregulated in latency in both models. The remaining genes were unique for any given study (27 of 32, or 84% for Krishnan and Zeichner, 836 of 875, or 96% for Iglesias-Ussel et al, and 192 of 227, or 85% for Mohammadi et al).
Figure 2 Venn diagram depicting differentially expressed genes across three latency models.
The overlapping genes were identified using the online tool Venny (http://bioinfogp.cnb.csic.es/tools/venny/index.html). Shown are the total number of differentially expressed genes and percent of total identified across all models[18,19,42]. For each overlap, gene symbols are listed. For the overlap between Iglesias-Ussel et al and Mohammadi et al studies, the four genes with the highest average absolute fold change are listed.
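The overlap counts reported for Figure 2 can be reproduced with plain set operations. The sketch below is a minimal illustration, not the Venny implementation: `venn3_counts` is a hypothetical helper, and the gene symbols in the toy call (other than LYN) are placeholders. The second part only re-derives the totals and percentages from the counts quoted in the text, assuming that the 34 genes shared by the two primary cell studies do not include LYN (which follows from 875 - 836 = 1 + 4 + 34).

```python
def venn3_counts(a, b, c):
    """Count the genes in each region of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "all_three": len(a & b & c),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "union": len(a | b | c),
    }


# Toy call with placeholder symbols (G1-G3 are hypothetical).
toy = venn3_counts({"LYN", "G1"}, {"LYN", "G1", "G2"}, {"LYN", "G3"})

# Consistency check on the counts quoted in the text.
totals = {"Krishnan and Zeichner": 32,
          "Iglesias-Ussel et al": 875,
          "Mohammadi et al": 227}
unique = {"Krishnan and Zeichner": 27,
          "Iglesias-Ussel et al": 836,
          "Mohammadi et al": 192}
# Shared regions: LYN (all three), 4 genes (cell lines and
# Iglesias-Ussel et al), 34 genes (the two primary cell studies).
shared = 1 + 4 + 34
union_size = sum(unique.values()) + shared
percent_unique = {study: round(100 * unique[study] / totals[study])
                  for study in totals}
```

Under these assumptions `union_size` equals the 1094 DEGs stated above, and `percent_unique` gives 84, 96 and 85 for the three studies.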
This comparison indicated that, despite the small proportion of overlapping genes between models, genes whose products may be able to differentiate between latently infected and uninfected cells can be identified using gene expression profiling, especially when comparing models established in primary cells. However, these studies have several limitations that presently preclude reaching a consensus on which genes may represent suitable biomarkers of latency. These limitations and potential solutions that may advance this field are summarized in Table 3.
Table 3 Limitations of the present studies that identify differentially expressed genes between latently infected and uninfected cells and possible solutions that may enable identification of solid candidate biomarkers of latency.
Limitation: Small percentage of latently infected cells. Possible solution: Isolate latently infected cells using reporter system OR perform gene expression profiling on a single-cell level.
Limitation: Effect from the exposure to the virus without infection. Possible solution: Use aldrithiol-2 inactivated virus instead of mock-infection to compare to latently infected cell model.
Limitation: Identified differentially expressed genes are ubiquitously expressed on all CD4+ T cells. Possible solution: Identify a panel of biomarkers that best differentiates between latently infected and uninfected cells.
Limitation: Different models represent different aspects of latency establishment. Possible solution: Include additional models into analysis; use same statistical approaches to ensure differences in biomarkers are biological, not technical differences.
Limitation: Gene expression profiling can only identify candidate biomarkers. Possible solution: Perform experimental validation that latently infected cells can be detected using these biomarkers.
TRANSCRIPTOME PROFILING AND SYSTEMS BIOLOGY APPROACHES TO IDENTIFY MOLECULAR MECHANISMS OF REGULATION OF HIV EXPRESSION
Understanding the mechanisms of establishment and maintenance of HIV latency has greatly contributed to the development of strategies for eradication. It has become apparent that multiple cellular processes and pathways contribute to the control of HIV latency at both the transcriptional and post-transcriptional levels, suggesting that combination strategies will likely be needed to achieve eradication of the latent reservoir. Block of viral transcription from the LTR is the most studied mechanism, which occurs through several proposed routes: Inhibition of transcription through histone and DNA modifications[77-79]; absence of necessary transcriptional activators and presence of transcriptional repressors in resting CD4+ T cells[80,81]; integration into inactive transcription sites; or premature termination of viral transcripts in the absence of Tat and Tat-associated host factors. Another proposed mechanism is that latency is maintained by post-transcriptional blocks. HIV could be transcribed, but MS HIV transcripts could fail to be exported, contributing to non-productive infection in resting CD4+ T cells. Finally, discoveries in the field of inhibitory micro RNAs (miRNAs) suggest a possibility of transcriptional inhibition of HIV by miRNAs encoded in the HIV genome and translational inhibition by host miRNAs.
Gene expression profiling data can be used to identify gene categories that describe cellular processes and pathways, as well as key regulatory factors with a role in HIV latency control, thus contributing to our understanding of the mechanisms that regulate HIV expression. The same studies described in Table 2 performed functional category analysis by identifying pathways and GO terms enriched for DEGs. Though these four studies utilized different cell types and viruses (Table 2), some uniting themes were observed in the mechanisms contributing to HIV latency control. We utilized the lists of GO terms and pathways that were reported in each of the four studies, to compare the gene categories dysregulated in different latency models. The reported terms were assigned to two major categories: Transcriptional regulation, including signaling pathways that regulate activity and localization of transcription factors, and functional categories related to RNA synthesis; and post-transcriptional regulation, both at the RNA and protein levels (Figure 3); terms that could not be assigned to these categories are not shown. Not surprisingly, the specific GO terms and pathways in each category were different between the studies, which was at least in part attributable to the usage of different annotated databases to obtain these terms (Table 2). However, terms associated with both transcriptional and post-transcriptional control of HIV latency were reported in more than one study. These GO terms and pathways comprise both well-established (e.g., NFκB signaling and transcriptional regulation[86,87]) and novel mechanisms of regulation of HIV expression (e.g., proteasome).
Figure 3 Transcriptional and post-transcriptional mechanisms of regulation of human immunodeficiency virus expression.
Pathway and GO term categories related to transcriptional and post-transcriptional regulation of HIV expression, identified in gene expression studies that compared latently infected and uninfected cells, are shown. Dark blue, Iglesias-Ussel et al; Red, Mohammadi et al; Brown, Evans et al; Yellow, Krishnan and Zeichner. GO: Gene ontology; HIV: Human immunodeficiency virus; mTOR: Mammalian target of rapamycin.
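Functional category analysis of this kind typically asks whether a pathway or GO term contains more DEGs than expected by chance. The sketch below shows one common way to score this, an upper-tail hypergeometric (over-representation) test. It is an assumption made for illustration: the text does not state which statistical test each study or database tool applied, and the function name and the numbers in the example call are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_background, n_in_category, n_degs, n_degs_in_category):
    """Upper-tail hypergeometric P value.

    Probability of finding at least n_degs_in_category members of a
    category among n_degs genes drawn without replacement from a
    background of n_background genes, of which n_in_category belong
    to the category.
    """
    max_overlap = min(n_in_category, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_in_category, k)
        * comb(n_background - n_in_category, n_degs - k)
        for k in range(n_degs_in_category, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20000 background genes, a category of 100
# genes, 227 DEGs, 6 of which fall in the category.
p_example = enrichment_pvalue(20000, 100, 227, 6)
```

Because the P value depends on the background gene set and on how each database defines its categories, the same DEG list can yield different enriched terms across tools, which is consistent with the differences between studies noted above.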
Network-based approaches can also be utilized to identify genes that may have a role in regulation of HIV expression, despite not being detected as differentially expressed in latency. For example, tubulin alpha 3 (TUBA3) was a well-connected gene in a network constructed by Bandyopadhyay et al who utilized the Krishnan and Zeichner dataset. TUBA3 was connected to both Tat and Rev in the network, suggesting a possible yet unknown post-transcriptional role for this gene in regulation of HIV expression, one which would not have been detected in non-network-based approaches.
Taken together, functional studies using systems biology approaches to analyze host gene expression in the in vitro models of HIV latency suggest that maintenance of HIV quiescence in T cells involves basic cellular mechanisms beyond those traditionally implicated in transcriptional repression of the HIV-1 provirus.
TRANSCRIPTOME PROFILING AND SYSTEMS BIOLOGY APPROACHES TO IDENTIFY MOLECULAR MECHANISMS OF HIV REACTIVATION USING LRAS
HDACis have been the most studied LRAs, with a number of these compounds progressing to clinical trials[23-27]. The primary mechanism of action proposed for HIV reactivation using HDACis was histone acetylation and chromatin decondensation, which provide a transcriptionally favorable environment. However, the results from gene expression profiling studies following the discovery of anti-cancer properties of HDACis (reviewed in) strongly suggest the existence of secondary mechanisms of action of HDACis beyond chromatin remodeling. In particular, despite chromatin decondensation, as many genes were downregulated by HDACis as were upregulated. Over the years, studies using HDACis demonstrated that transformed cells responded to treatment differently from primary cells[90-93]. Therefore, gene expression profiling of HDACis using primary CD4+ T cells is more relevant for delineating the mechanisms driving HIV reactivation. Most of the gene expression studies using HDACis in primary cells to date have utilized the HDACi vorinostat/suberoylanilide hydroxamic acid (SAHA), which was the first of the FDA-approved HDACis for treatment of cutaneous T cell lymphoma. These studies are summarized in Table 4. In addition to SAHA, the effects on gene expression were profiled for another HDACi, valproic acid (VPA), in primary CD4+ T cells from HIV-infected individuals. Treatment with either SAHA or VPA resulted in downregulation of V-Myc avian myelocytomatosis viral oncogene homolog (MYC)[95,96]. Among other LRA classes, the effects of the alcohol dehydrogenase inhibitor Disulfiram and the protein kinase C (PKC) agonist Prostratin on host gene expression were assessed using primary CD4+ T cells[42,97], while the effects of a bromodomain inhibitor, JQ1, on gene expression were assessed in a cell line model of HIV latency (J-Lat 10.6 T cell line) (see Table 5 for the summary of the studies).
Table 4 Features of gene expression studies comparing suberoylanilide hydroxamic acid-treated and untreated primary cells.
Of all the compounds tested, Disulfiram appeared to induce minimal changes to host gene expression, while SAHA and Prostratin modulated thousands of genes[42,96,97,99,100]. Gene expression studies were able to identify novel mechanisms contributing to HIV reactivation out of latency by LRAs, besides their primary mechanisms of action. For example, in addition to chromatin decondensation, SAHA upregulated specific HIV transcriptional activators [e.g., immunity-related GTPase family, M (IRGM), heat shock protein 70 (HSP70, gene symbol HSPA2)[102,103] and lysine (K)-specific demethylase (KDM1A)], and downregulated repressors [amino-terminal enhancer of split and AT rich interactive domain 1B, SWI1-like (ARID1B, or BAF250)][25,99,100] (Figure 4A). Sung and Rice found that Prostratin upregulated an HIV activator, tumor necrosis factor (ligand) superfamily, member 4 (TNFSF4), and downregulated defensin alpha 1, which interferes with PKC signaling. Among genes with a role in regulation of HIV expression that were modulated by JQ1, Banerjee et al noted upregulation of the activators REST coreceptor 1 (RCOR1) and the class III deacetylase sirtuin 1 (SIRT1), and downregulation of the repressor methyltransferases protein arginine methyltransferase 6 (PRMT6) and SET domain, bifurcated 1 (SETDB1)[110,111].
Figure 4 Main findings from gene expression studies using latency reversing agents.
A: Novel mechanisms of HIV reactivation besides primary mechanisms of action of LRAs. These include upregulation (red arrow) of HIV activators (red oval) and downregulation (blue arrow) of repressors (blue oval). Examples for LRAs from 3 functional classes (HDACi, SAHA; PKC agonist, Prostratin; and bromodomain inhibitor, JQ1) are listed; B: Effects of LRAs on host genes that are inhibitory for HIV reactivation. These include upregulation (red arrow) of HIV repressors (blue oval) and downregulation (blue arrow) of activators (red oval). Examples for LRAs from 2 functional classes (HDACi, SAHA; and PKC agonist, Prostratin) are shown; C: LRAs of different classes act on components of p-TEFb complex via different mechanisms, contributing to HIV reactivation. SAHA induced dissociation of p-TEFb from the inactive 7SK RNA complex and facilitated its recruitment to the HIV LTR. Prostratin and JQ1 upregulated components of p-TEFb complex at the protein and RNA level, respectively (red arrows indicate upregulation).
LRA: Latency reversing agent; HDACi: Histone deacetylase inhibitor; PKC: Protein kinase C; SAHA: Suberoylanilide hydroxamic acid; IRGM: Immunity-related GTPase family, M; HSPA2: Heat shock 70 kDa protein 2; KDM1A: Lysine (K)-specific demethylase; TNFSF4: Tumor necrosis factor (ligand) superfamily, member 4; RCOR1: REST coreceptor 1; SIRT1: Sirtuin 1; AES: Amino-terminal enhancer of split; ARID1B: AT rich interactive domain 1B, SWI1-like; DEFA1: Defensin alpha 1; PRMT6: Protein arginine methyltransferase 6; SETDB1: SET domain, bifurcated 1; ETS1: V-Ets avian erythroblastosis virus E26 oncogene homolog 1; LEF1: Lymphoid enhancer-binding factor 1; HMGA1: High mobility group AT-hook 1; HIVEP3: HIV type I enhancer binding protein 3; EZH2: Enhancer of zeste 2 polycomb repressive complex 2 subunit; YY1: YY1 transcription factor; BRD2: Bromodomain protein containing 2; S100A8: S100 Calcium Binding Protein A8; S100A9: S100 Calcium Binding Protein A9; S100A12: S100 Calcium Binding Protein A12; CDK9: Cyclin-dependent kinase 9; P-TEFb: Positive transcription elongation factor; CycT1: Cyclin T1; Hexim-1: Hexamethylene Bis-Acetamide Inducible 1; LTR: Long terminal repeat; Tat: Transactivator of transcription.
In addition to the effects of LRAs on gene expression that may promote HIV reactivation, possible inhibitory effects were also observed in gene expression studies that used SAHA- and Prostratin-treated primary cells (Figure 4B). Genes encoding factors that activate HIV transcription, V-Ets avian erythroblastosis virus E26 oncogene homolog 1 (ETS1), CCAAT/enhancer binding protein, Beta (CEBPB), and lymphoid enhancer-binding factor 1 (LEF1)[112-114], were downregulated by SAHA in primary CD4+ T cells. Enhancer of zeste 2 polycomb repressive complex 2 subunit (EZH2), a methyltransferase implicated in HIV LTR silencing, was upregulated. Genes encoding HIV transcriptional repressors YY1 and bromodomain protein containing 2 (BRD2) were upregulated by SAHA in blood cells from HIV-infected individuals on cART. Downregulation of ETS1 and LEF1 and upregulation of BRD2 were confirmed at the protein level in primary CD4+ T cells. In addition, a network-based approach integrating transcriptomics and proteomics datasets highlighted upregulation of high mobility group AT-hook 1, which represses HIV transcription by competing with Tat for TAR binding and by recruiting inactive positive transcription elongation factor (p-TEFb) to the HIV LTR. Possible inhibitory effects of Prostratin with respect to HIV reactivation identified by Sung and Rice were upregulation of a repressor, HIV type I enhancer binding protein 3, and downregulation of the three genes encoding S100 calcium-binding proteins (S100A8, S100A9, and S100A12), shown to enhance HIV-1 transcription in an NFκB-dependent manner.
Finally, gene expression profiling studies using LRAs of different functional classes highlighted uniting themes driving HIV reactivation, such as importance of the components of p-TEFb complex (Figure 4C). Cyclin T1 (CycT1) was upregulated at the RNA level by JQ1; both CycT1 and cyclin-dependent kinase 9 were upregulated at the protein level by Prostratin, while SAHA induced dissociation of p-TEFb from the inactive 7SK RNA complex and facilitated its recruitment to the HIV LTR. Though through different mechanisms, p-TEFb function appears to be enhanced via action of several classes of LRAs.
CONCLUSION AND PERSPECTIVES
This review discusses how methods of gene expression profiling and systems biology can be applied to address specific questions in the field of HIV latency and eradication. It presents a systematic analysis of the application of these methods to discover biomarkers of latency and to identify molecular mechanisms of latency control and of reactivation using LRAs. Identification of DEGs and functional category assessment are the most common methods currently used in the field (Figure 1). Network-based approaches are utilized in a subset of more recent studies. Advances in RNA-Seq technologies allow for integration of HIV expression analysis with the changes in expression of host genes in a single experiment. Integration of transcriptomic data with other biological data types in the field of HIV latency is presently scarce, and the field would benefit from increased adoption of these methods in future studies.
Gene expression analysis of latently infected and uninfected cells has been used to identify candidate biomarkers of latency and to delineate the molecular mechanisms that contribute to regulation of HIV expression. Studies comparing gene expression in HIV latency models to uninfected cells have several limitations that presently preclude reaching a consensus on which genes may represent suitable biomarkers (Table 3). Improved bioinformatics approaches (e.g., using the same methods of data acquisition and statistical analyses across models) and experimental validation of candidate biomarkers would be extremely useful in future studies to more reliably identify biomarkers of latency. Studies profiling gene expression changes induced by LRAs identified novel mechanisms of action of the LRAs and their inhibitory effects with respect to HIV reactivation out of latency, and highlighted uniting themes driving HIV reactivation. Using similar statistical approaches in prospective studies using LRAs would facilitate prediction of whether the inhibitory effects of different LRAs on HIV reactivation could be cancelled out in a combination strategy. The results from such studies would have the potential to significantly impact the process by which candidate drugs are selected and combined for future evaluations and advancement to clinical trials.
This material is based upon work supported in part by the Department of Veterans Affairs (VA), Veterans Health Administration, Office of Research and Development. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of VA or the United States government. We thank Dr. Andrew Rice for sharing the list of DEGs that were identified in the study of the effects of Prostratin in primary CD4+ T cells.
Purcell DF, Martin MA. Alternative splicing of human immunodeficiency virus type 1 mRNA modulates viral protein expression, replication, and infectivity. J Virol. 1993;67:6365-6378.
Van Lint C, Emiliani S, Ott M, Verdin E. Transcriptional activation and chromatin remodeling of the HIV-1 promoter in response to histone acetylation. EMBO J. 1996;15:1112-1120.
Herrmann CH, Rice AP. Lentivirus Tat proteins specifically associate with a cellular protein kinase, TAK, that hyperphosphorylates the carboxyl-terminal domain of the large subunit of RNA polymerase II: candidate for a Tat cofactor. J Virol. 1995;69:1612-1620.
Margolis DM, Somasundaran M, Green MR. Human transcription factor YY1 represses human immunodeficiency virus type 1 transcription and virion production. J Virol. 1994;68:905-910.
Rossio JL, Esser MT, Suryanarayana K, Schneider DK, Bess JW, Vasquez GM, Wiltrout TA, Chertova E, Grimes MK, Sattentau Q. Inactivation of human immunodeficiency virus type 1 infectivity with preservation of conformational and functional integrity of virion surface proteins. J Virol. 1998;72:7992-8001.