Echeverría N, Moratorio G, Cristina J, Moreno P. Hepatitis C virus genetic variability and evolution. World J Hepatol 2015; 7(6): 831-845 [PMID: 25937861 DOI: 10.4254/wjh.v7.i6.831]
Corresponding Author of This Article
Pilar Moreno, PhD, Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la República, Mataojo 2055, 11400 Montevideo, Uruguay. firstname.lastname@example.org
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Hepatitis C virus genetic variability and evolution
Natalia Echeverría, Gonzalo Moratorio, Juan Cristina, Pilar Moreno
Natalia Echeverría, Gonzalo Moratorio, Juan Cristina, Pilar Moreno, Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la República, 11400 Montevideo, Uruguay
Gonzalo Moratorio, Viral Populations and Pathogenesis Laboratory, Institut Pasteur, 75724 Paris CEDEX 15, France
Pilar Moreno, Recombinant Proteins Unit, Institut Pasteur de Montevideo, 11400 Montevideo, Uruguay
Author contributions: Echeverría N contributed to bibliographical revision, figures and table design and article drafting; Moratorio G made contributions to bibliographical revision, table design, article drafting and revision; Cristina J contributed to article drafting and critically revised the manuscript for intellectual content; Moreno P made substantial contributions to conception and design, article drafting and revision of intellectual content; all authors contributed to final approval of the version to be published.
Supported by Agencia Nacional de Investigación e Innovación (ANII) through project FMV_2_2011_1_6971 and PEDECIBA, Uruguay.
Conflict-of-interest: The authors do not have any conflict of interest.
Correspondence to: Pilar Moreno, PhD, Laboratorio de Virología Molecular, Centro de Investigaciones Nucleares, Facultad de Ciencias, Universidad de la República, Mataojo 2055, 11400 Montevideo, Uruguay. email@example.com
Telephone: +598-2-5250800 Fax: +598-2-5250895
Received: August 13, 2014 Peer-review started: August 14, 2014 First decision: September 16, 2014 Revised: January 29, 2015 Accepted: February 9, 2015 Article in press: February 11, 2015 Published online: April 28, 2015
Hepatitis C virus (HCV) has infected over 170 million people worldwide and creates a huge disease burden due to chronic, progressive liver disease. HCV is a single-stranded, positive sense RNA virus and a member of the Flaviviridae family. The high error rate of the RNA-dependent RNA polymerase and the pressure exerted by the host immune system have driven the evolution of HCV into 7 different genotypes and more than 67 subtypes. HCV evolves by means of different mechanisms of genetic variation. On the one hand, its high mutation rate generates a large number of different but closely related viral variants during infection, usually referred to as a quasispecies. The great quasispecies variability of HCV also has therapeutic implications, since the continuous generation and selection of resistant or fitter variants within the quasispecies spectrum might allow viruses to escape control by antiviral drugs. On the other hand, HCV exploits recombination to ensure its survival. This enormous viral diversity, together with some host factors, has made it difficult to control viral dispersal. Current treatment options involve pegylated interferon-α and ribavirin as dual therapy or in combination with a direct-acting antiviral drug, depending on the country. Despite all the efforts put into antiviral therapy studies, neither eradication of the virus nor the development of a preventive vaccine has been achieved so far. This review focuses on currently available data on the genetic mechanisms driving the molecular evolution of HCV populations and their relation to the antiviral therapies designed to control HCV infection.
Core tip: Hepatitis C virus (HCV) is the major causative agent of parenterally-acquired hepatitis. To date there is no preventive vaccine, and though antiviral therapy has been improved in the past few years, not all patients eradicate the virus as a result of it. The main reason lies in the intrinsic genetic variability that characterises RNA viruses, such as HCV, whose RNA polymerase lacks proof-reading activity, leading to a high mutation rate and the generation of a wide range of genome variants better known as a quasispecies. Therefore this review summarises current data on HCV quasispecies dynamics, antiviral therapy and recombination events.
Hepatitis C virus (HCV) has infected over 170 million people worldwide and therefore creates a huge disease burden due to chronic, progressive liver disease. Infections with HCV have become a major cause of liver cancer and one of the most common indications for liver transplantation[2-4]. Because chronic infection with HCV can lead to cirrhosis and hepatocellular carcinoma, there is a need for drugs that effectively eradicate the infection and for a prophylactic vaccine that prevents its dissemination. Unfortunately, to date there is no effective vaccine available. Currently, the standard of care (SOC) therapy involves pegylated interferon α (IFN-α-peg) and ribavirin (RBV). In addition, the new SOC (NSOC) therapy, which combines the protease inhibitors boceprevir or telaprevir with IFN-α-peg and RBV, has been approved for the eradication of HCV genotype 1 in the United States, Europe and Japan[8-11]. Unfortunately, interferon is not widely available globally and is not always well tolerated, and some genotypes of HCV respond better than others, so that not all patients achieve a sustained virological response (SVR). Other adverse events such as rash have also been associated with the NSOC.
The main route of transmission is direct or indirect exposure to contaminated blood, whether through transfusion of blood or blood products, intravenous drug use, poorly sterilized surgical material, organ transplants, accidents in health centres or vertical transmission from mother to child, among others.
HCV is a member of the family Flaviviridae, although it differs from other members of this family in many details of its genome organization. HCV is a single-stranded, positive sense, RNA virus with a genome of approximately 9600 nucleotides in length. Most of the genome carries a single open reading frame that encodes three structural (core, E1, E2) and seven non-structural (p7, NS2, NS3, NS4A, NS4B, NS5A, NS5B) proteins (Figure 1)[14,15]. In addition, alternative translation products (F protein) have been detected from a reading frame overlapping the core gene (core + 1/ARFP)[16,17]. Possible roles in regulation of gene expression, cell signalling and apoptosis have been suggested[18-20]. Short untranslated regions at each end of the genome (5’-NCR and 3’-NCR) are required for its translation and replication[21,22]. The mechanism of translation initiation is dependent on an internal ribosomal entry site in the 5’-NCR, which interacts directly with the 40S ribosomal subunit.
Figure 1 Organisation of hepatitis C virus genome and hepatitis C virus polyprotein processing.
Schematic representation of the 9.6 kb positive-stranded RNA genome. Simplified RNA secondary structures in the 5’ and 3’ non-coding regions (NCRs) are shown. Internal ribosome entry site (IRES)-mediated translation produces a polyprotein precursor that is processed into the mature structural and non-structural proteins. Nucleotide positions are shown by numbers on the upper part of the scheme. Amino acid positions are shown by numbers in the lower part of the scheme. The coding region is depicted by rectangles showing the corresponding encoded proteins. Solid arrowheads denote cleavages by the endoplasmic reticulum signal peptidase. The open arrowhead indicates further C-terminal processing of the core protein by signal peptide peptidase. Red stars indicate cleavages by the hepatitis C virus NS2 and NS3-4A proteases.
The high error rate of the RNA-dependent RNA polymerase and the pressure exerted by the host immune system have driven the evolution of HCV towards a global diversity that comprises seven genetic lineages (genotypes 1 to 7) (Figure 2). On average, the complete genomes of different genotypes differ at 31%-33% of nucleotide sites. Genotypes 1 to 6 of HCV contain a series of more closely related subtypes (67 accepted subtypes and many more to be confirmed) that typically differ from each other by at least 15% in nucleotide positions within the coding region. Subtypes 1a, 1b and 3a are widely distributed and account for the vast majority of infections in Western countries.
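The divergence figures above can be turned into a simple pairwise comparison. The following is a minimal sketch, assuming two pre-aligned nucleotide sequences; the function names are hypothetical, the 15% subtype cut-off comes from the text, and the 30% genotype cut-off is an illustrative assumption placed just below the reported 31%-33% average. Formal HCV classification relies on phylogenetic analysis of coding-region or complete genome sequences, not on a single pairwise distance.

```python
SUBTYPE_CUTOFF = 0.15   # from the text: subtypes differ by at least 15%
GENOTYPE_CUTOFF = 0.30  # assumption: round value just below the 31%-33% average


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":  # skip alignment gaps
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def relationship(seq_a: str, seq_b: str) -> str:
    """Rough label for a sequence pair based on the illustrative cut-offs."""
    distance = p_distance(seq_a, seq_b)
    if distance >= GENOTYPE_CUTOFF:
        return "different genotypes"
    if distance >= SUBTYPE_CUTOFF:
        return "different subtypes"
    return "same subtype"
```

The toy thresholds only illustrate the hierarchy of distances; real sequences near a cut-off would need the tree-based approach shown in Figure 2.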
Figure 2 Evolutionary tree of the seven genotypes and all known subtypes of hepatitis C virus.
The tree was constructed by the maximum likelihood method from a 307-nucleotide sequence of the NS5B-coding region, with GTR + I + G (general time-reversible substitution model considering invariable sites and gamma distribution) as the nucleotide substitution model that best fitted the data. Sequences used for the construction of this phylogenetic tree were extracted from Yusim et al.
HCV genetic variability is not evenly distributed across the viral genome. The regions of the genome that correspond to essential viral functions (such as those involved in translation and replication) or those with major structural domains (5’-NCR and 3’-NCR) are the most conserved. The 5’-NCR is the most conserved region of the genome, with 90% sequence identity between distant strains[25,26]. The region encoding the viral capsid is also highly conserved, with 81%-88% sequence identity between different isolates. The most variable region of the genome is the one that codes for the membrane glycoproteins E1 and E2. The sequences belonging to hypervariable regions 1 and 2 (HVR1 and HVR2) of the E2 gene show the lowest sequence conservation, with only 50% identity between different isolates. Factors that may contribute to the high genetic variability of these viruses include large population sizes, short generation times and high replication rates.
An important breakthrough in the treatment of chronic HCV infection was undoubtedly the introduction of alpha interferon (IFN-α) plus RBV as combination therapy. However, the rate of sustained virological response is still unsatisfactory[30,31], particularly in patients infected with genotype 1, the most prevalent in many geographic regions of the world[4,33]. Although IFN-α is effective in reducing the viral load, complete eradication of the virus is achieved in less than 20% of patients treated with IFN-α alone. In those patients who initially respond to IFN-α, ribavirin helps increase the frequency of virus eradication, yet its effect on non-responder patients is still limited. Although viral genotype and viral load, as well as serum HCV RNA clearance during therapy, are definitely related to response, further insight into viral factors involved in therapeutic responsiveness is still necessary. The different genotypes and subtypes vary in their responses to treatment with IFN-α or IFN-α/RBV. As mentioned above, only 10%-20% of individuals chronically infected with HCV genotype 1 treated with IFN-α monotherapy and 40%-50% of those treated with combination therapy (IFN-α/RBV) exhibit a complete and permanent disappearance of the virus. These percentages are lower than the rates of 50% and 70%-80%, respectively, observed in the treatment of patients infected with HCV genotypes 2 or 3[3,34]. The use of IFN-α-peg has nevertheless been associated with a significant increase in these rates.
Two inhibitors of the NS3/4A serine protease, boceprevir (BOC) and telaprevir (TVR), have demonstrated potent inhibition of HCV genotype 1 replication and markedly improved SVR rates in treatment-naïve and treatment-experienced patients. The current NSOC therapy for chronic HCV genotype 1 infection is BOC or TVR in combination with IFN-α-peg and RBV[37-40]. Additionally, two other direct-acting antiviral drugs (DAAs), simeprevir (a protease inhibitor) and sofosbuvir (a nucleotide analogue inhibitor of the NS5B RNA polymerase), have recently been approved for triple therapy in the United States and Europe. Beyond these advances in triple therapy, it is worth noting that a wide range of different DAAs are currently in clinical trials aiming at all-oral IFN-free regimens[43-47].
MUTATIONS AND QUASISPECIES DYNAMICS
HCV evolution is a highly dynamic process. Like most RNA viruses, HCV exploits all possible mechanisms of genetic variation to ensure its survival. Mutation at the nucleotide level seems to be the main cause of genetic variation in RNA viruses, such as HCV. These mutations are primarily generated by an error-prone, non-proofreading RNA-dependent RNA polymerase which directs the replication of the virus genetic material[1,49]. The mutation rate of HCV, estimated at 10⁻⁴ substitutions per site and round of replication, is among the highest for RNA viruses including retroviruses, and would seem to be high enough to generate all the genetic variation found in this virus. Due to this feature and to the high replication rate of HCV, a large number of different but closely related viral variants are continuously produced during infection. These circulate in vivo as a complex population commonly referred to as a quasispecies[52-58].
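To give a sense of scale for this mutation rate, the short calculation below combines it with the approximate genome length of 9600 nucleotides given earlier. It is a sketch under simplifying assumptions that are not part of the original text: every site mutates independently and with the same probability, and the number of mutations per copy is approximated by a Poisson distribution. The function names are hypothetical.

```python
import math

GENOME_LENGTH = 9600   # nucleotides, approximate genome size given in the text
MUTATION_RATE = 1e-4   # substitutions per site per replication round, from the text


def expected_mutations(rate: float = MUTATION_RATE, length: int = GENOME_LENGTH) -> float:
    """Mean number of substitutions introduced in one genome copy."""
    return rate * length


def fraction_error_free(rate: float = MUTATION_RATE, length: int = GENOME_LENGTH) -> float:
    """Probability that a copy carries no substitution (independent sites assumed)."""
    return (1.0 - rate) ** length


def fraction_with_k_mutations(k: int, rate: float = MUTATION_RATE,
                              length: int = GENOME_LENGTH) -> float:
    """Poisson approximation to the probability of exactly k substitutions."""
    mean = expected_mutations(rate, length)
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    print(f"expected substitutions per copy: {expected_mutations():.2f}")
    print(f"fraction of error-free copies:   {fraction_error_free():.3f}")
```

Under these assumptions each new genome carries close to one substitution on average and fewer than half of the copies are identical to their template, which is consistent with the continuous production of closely related variants described above.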
The idea of quasispecies was first used by Eigen et al[59,60] to refer to the first self-replicating structures. Originally conceived as a mathematical framework formulated to explain the evolution of life in the “pre-cellular RNA world”, quasispecies theory is based on classical population genetics, but seeks to explore the consequences of error-prone replication and near-infinite population sizes for genome evolution. More recently, quasispecies theory has been used to describe the evolutionary dynamics of RNA viruses. These structured populations also possess a high mutation rate, which leads to a complex mixture of different but related genomes that behaves as a selection unit (Figure 3). At a particular point of infection, the HCV quasispecies distribution reflects the balance between the continued generation of new variants, the need to preserve essential viral functions and the positive selective pressure exerted by the environment. It is important to highlight that multiple viral quasispecies co-exist in infected individuals at different replicative sites and at successive times, which offers a rich environment for intra- and inter-quasispecies interactions.
Figure 3 Viral quasispecies.
A virus replicating with a high mutation rate will generate a diverse mutant repertoire over the course of a few generations. In this schematic representation, a “parental” viral genome (black filled circle) gives rise to different variants (coloured squares, prisms and stars), each linked to another one by a point mutation. The concentric circles represent replication cycles. The resulting distribution is often referred to as a quasispecies “cloud”.
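The growth of such a cloud can be illustrated with a toy simulation. The sketch below is not taken from the reviewed studies: it assumes that every genome leaves the same number of offspring (no selection), that only point substitutions occur, and it uses an inflated per-site mutation rate so that a short toy genome shows visible diversity. All names and parameter values are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def replicate(genome: str, mutation_rate: float, rng: random.Random) -> str:
    """Copy a genome, substituting each site with probability mutation_rate."""
    copy = []
    for base in genome:
        if rng.random() < mutation_rate:
            # pick one of the three other bases so every event is a real change
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def grow_cloud(parent: str, cycles: int, offspring_per_genome: int,
               mutation_rate: float, seed: int = 0) -> Counter:
    """Return variant counts after the given number of replication cycles."""
    rng = random.Random(seed)
    population = [parent]
    for _cycle in range(cycles):
        next_generation = []
        for genome in population:
            for _copy in range(offspring_per_genome):
                next_generation.append(replicate(genome, mutation_rate, rng))
        population = next_generation
    return Counter(population)


if __name__ == "__main__":
    cloud = grow_cloud("ACGT" * 25, cycles=4, offspring_per_genome=3,
                       mutation_rate=0.01, seed=1)
    print(f"{sum(cloud.values())} genomes, {len(cloud)} distinct variants")
```

Each cycle in the loop corresponds to one concentric circle of the figure; adding fitness differences between variants would be the natural next step towards the selection effects discussed in this section.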
The theory of quasispecies predicts that a particular mutant surrounded by a more favourable (more closely related) mutant spectrum can dominate another one with better fitness. This phenomenon has been called the suppressive effect. This suppressive or interfering effect of the mutant spectrum rises along with the mutation rate in viral quasispecies, as suggested by the strong suppressive effect on infectivity observed in viral populations close to extinction. It has been proposed that interference generated by defective genomes contributes to viral extinction caused by an increased mutagenesis rate. The opposite effect can also be seen: genome complementation between different components of the mutant spectrum, which demonstrates that the behaviour of a quasispecies cannot be understood solely as the sum of individual behaviours. Because of this, a quasispecies is defined as a dynamic distribution of genomes subject to variation, competition and selection, which acts as a selection unit. This means that the quasispecies as a whole, instead of a particular viral variant, is the target of the selection process[62,64,66].
As mentioned before, low fitness variants can be preserved at higher than expected frequencies just because they are coupled to a well represented higher fitness genotype in sequence space. One of the defining characteristics of a quasispecies is the phenomenon of mutational coupling, as it places individual mutants within a functional network of variants. The elevated mutation rates in RNA viruses mean that a fast replicator will generate genetically diverse progeny, many of which will be significantly less fit than the parent. As a result, quasispecies theory predicts that slower replicators will be favoured if they give rise to a fitter progeny.
One of the consequences of quasispecies dynamics is the existence of an error threshold for the preservation of genetic information. When the error rate exceeds a tolerable limit (related to genome size, fitness and population size of the quasispecies), the distribution collapses and the nucleotide sequence loses its information. This transition is known as the entry into error catastrophe, and its application to viral extinction through mutagenesis is called lethal mutagenesis. There is experimental evidence that RNA viruses replicate very close to this error threshold and that an increase in mutation rates can have a negative impact on the viability of viral populations.
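The dependence of the threshold on genome size and fitness can be made concrete with the classical single-peak approximation of quasispecies theory, in which a master sequence with replicative superiority σ over its mutant spectrum is maintained only while σ(1 − μ)^L > 1, where μ is the per-site error rate and L the genome length. The sketch below applies that approximation; it ignores population size and back mutation, and the superiority value used in the example is purely hypothetical, not an estimate for HCV.

```python
import math


def critical_error_rate(superiority: float, genome_length: int) -> float:
    """Per-site error rate above which the master sequence is lost.

    Solves superiority * (1 - mu) ** genome_length = 1 for mu; for long
    genomes this is close to ln(superiority) / genome_length.
    """
    if superiority <= 1.0:
        raise ValueError("the master sequence needs a superiority above 1")
    return 1.0 - superiority ** (-1.0 / genome_length)


def master_is_maintained(error_rate: float, superiority: float,
                         genome_length: int) -> bool:
    """True while error-free copies of the master outgrow the mutant spectrum."""
    return superiority * (1.0 - error_rate) ** genome_length > 1.0


if __name__ == "__main__":
    length = 9600   # approximate HCV genome length from the text
    sigma = 10.0    # hypothetical superiority, for illustration only
    print(f"critical error rate: {critical_error_rate(sigma, length):.2e}")
    print(master_is_maintained(1e-4, sigma, length))
```

With these illustrative numbers the critical rate is only a few times higher than the 10⁻⁴ estimate quoted for HCV, which is the kind of narrow margin that lethal mutagenesis aims to exploit.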
The theory of evolution predicts that, in a dynamic environment, high mutation rates are favoured, and therefore viral error rates may have been optimized by natural selection[68,69]. For RNA viruses, low replicative fidelity generates a diverse population of variants. Even though many of these variants are generally less fit, they may take over if an unexpected change in environment occurs, such as immune pressure, shifting the corresponding fitness landscape. On the contrary, a homogeneous population, generated by high replicative fidelity, would lack this flexibility and might be less successful in the dynamic host environment. Experimental support for this hypothesis has been provided by two different groups[70,71]. They isolated a poliovirus variant, resistant to ribavirin, which had a single amino acid substitution in the viral polymerase. This mutant exhibited a moderate resistance to lethal mutagenesis and assays for selectable markers indicated that this population had a lower mutation rate and it consequently displayed less genetic diversity. More recently, Vignuzzi et al (2008) proposed that this attenuated high fidelity variant could be employed for vaccine development.
High mutation rates and quasispecies dynamics confer great adaptability to RNA viruses and represent one of the major obstacles for the control and prevention of RNA viral diseases[15,73]. The great quasispecies variability of HCV also has therapeutic implications since, by generating and selecting fitter variants within the quasispecies cloud, viruses might escape control by antiviral drugs. The way HCV quasispecies evolve is highly dynamic and, for this reason, the complexity of the genetic information gathered from quasispecies populations cannot be accurately analysed with a single analytical tool. The three parameters most commonly used to determine the complexity of the quasispecies mutant spectrum are: mutation frequency (defined as the proportion of mutated nucleotides within a genome distribution relative to the consensus sequence), Shannon entropy (defined as the proportion of different genomes within a mutant distribution) and Hamming distance (defined as the number of mutations that differentiate two sequences within the mutant spectrum)[27,64]. For the Hamming distance, the average over all possible pairwise comparisons reflects the genetic complexity of the quasispecies. As stated previously, a fundamental feature of viral quasispecies, predicted from quasispecies theory, is that the target of selection is the mutant distribution as a whole rather than an individual genome. Selective transmission of predominant and minor HCV quasispecies has been shown in humans[76-80] and in experimentally infected chimpanzees[81,82]. Further understanding of quasispecies dynamics in infected individuals is necessary to gain knowledge on how to apply virus-specific drugs and to identify key parameters that are critical for the development of effective antiviral strategies. Genetic variability at the quasispecies level has frequently been used as a predictor of the response to antiviral therapy.
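The three complexity parameters can be computed directly from a set of aligned clone or read sequences. The following is a minimal sketch with hypothetical function names; the normalised form of Shannon entropy used here, Sn = −Σ pᵢ ln pᵢ / ln N (with pᵢ the frequency of each distinct sequence and N the number of sequences analysed), is an assumption about the variant applied, since several formulations are in use.

```python
import math
from collections import Counter
from itertools import combinations


def hamming(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def consensus(sequences: list) -> str:
    """Most frequent nucleotide at each alignment column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


def mutation_frequency(sequences: list) -> float:
    """Mutated nucleotides relative to the consensus, per nucleotide sequenced."""
    reference = consensus(sequences)
    mutated = sum(hamming(seq, reference) for seq in sequences)
    return mutated / (len(sequences) * len(reference))


def normalized_shannon_entropy(sequences: list) -> float:
    """0 when all sequences are identical, 1 when every sequence is distinct."""
    total = len(sequences)
    if total < 2:
        return 0.0
    counts = Counter(sequences)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(total)


def mean_hamming_distance(sequences: list) -> float:
    """Average Hamming distance over all possible sequence pairs."""
    pairs = list(combinations(sequences, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    clones = ["ACGT", "ACGT", "ACGA", "TCGT"]  # toy data
    print(mutation_frequency(clones))
    print(normalized_shannon_entropy(clones))
    print(mean_hamming_distance(clones))
```

The three measures capture different aspects of the mutant spectrum (how far it lies from the consensus, how evenly the variants are represented, and how far apart they are from one another), which is one reason why no single tool describes a quasispecies adequately.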
Several studies have reported that genetic diversity within different regions of the HCV genome before combined therapy with IFN/RBV was higher in non-responders than in responders[53,83-86]. Nevertheless, Cristina et al (2007) showed that the response to antiviral therapy is independent of genetic variability within quasispecies populations at the beginning of therapy. Ueda et al (2004) reported that during combined therapy with IFN/RBV, HCV quasispecies significantly decreased in non-responders. However, another study showed that populations fluctuate throughout therapy, both in treated patients who achieved a sustained virological response and in non-responders. This would indicate that these fluctuations are intrinsic to each patient and that HCV follows different evolutionary paths in different patients. The variations in the results obtained by the different research groups might be caused by the different methodological approaches used to study the real variability within circulating quasispecies populations. Another explanation might be that in viral populations with a high degree of genetic variability, the probability of finding variants that replicate effectively in the presence of antiviral drugs is high. Populations with a lower degree of variability might also contain such resistant variants, though with lower probability. Likewise, given the stochastic nature of this phenomenon, the opposite situation might occur in either case. These effects can probably mask the expected correlation between genetic variability and response to treatment, so that no conclusive results have been obtained on these matters.
In addition to the multiplicity of viral genetic factors reported so far, host factors have also been shown to be involved in the development of HCV infection. Since 2009, several single nucleotide polymorphisms (SNPs) that influence the response to dual antiviral therapy have been reported near the interleukin 28B gene (IL-28B)[89-94]. However, it was not until 2011 that the relationship between HCV quasispecies diversity and the host IL-28B genotype was investigated. The results of that study showed a clear association between the IL-28B risk allele (SNP rs8099917 - G allele) and a lower NS3 protease amino acid quasispecies diversity in infected patients, suggesting that IL-28B risk allele carriers exert less positive selection pressure on the NS3/4A protease. It would be interesting to address whether the lower amino acid quasispecies diversification in patients with an IL-28B risk allele is restricted to the NS3 coding region or whether it affects other viral genomic regions. Interestingly, unfavourable rs8099917 genotypes were also found to be linked to time-dependent changes in the core coding region, specifically the shift to residue 70Q associated with hepatocellular carcinoma. These results might indicate that the IL-28B genotype influences viral evolution and disease outcome in addition to the behaviour of the innate immune system. More recently, Yuan et al (2012) found that, despite exhibiting similar baseline viral loads, more chronically infected patients with the rs12979860-CC polymorphism had amino acid substitutions in NS5A compared to non-CC patients. This result suggests that patients with the CC genotype undergo early viral evolution, probably as a consequence of the selective pressure exerted by the use of interferon at the beginning of treatment.
These studies raise the question of whether host genetics shapes viral evolution in response to immunity, and if the differences observed in the evolution of HCV quasispecies are implicated in the mechanism by which the IL-28B genotype influences the outcome of acute HCV infection and treatment response.
As previously mentioned, RNA virus populations exist as a cloud of closely related sequence variants that are continuously generated by mutation. A better understanding of these populations within infected individuals is therefore needed in order to apply antiviral strategies and to define parameters critical to the development of new and more effective antiviral therapies. Nevertheless, our ability to characterise the quasispecies cloud in depth from the study of isolated clones is limited. Recent deep sequencing techniques allow us, for the first time, to overcome these obstacles and observe the quasispecies as a whole. These techniques are already being used on clinical samples to study different viral models[98-100].
NEXT GENERATION SEQUENCING IN HCV QUASISPECIES ANALYSIS
Since 2005, the development of high throughput, or so-called next generation sequencing technologies (NGS), has allowed a huge increase in capacity to sequence genomes at a reasonably low cost and in a short time frame. NGS comprises a set of high-throughput sequencing technologies, which make it possible to sequence several genomes from individual templates in a parallel fashion. The current NGS technologies are known as second generation technologies (Table 1), to distinguish them from the first generation (Sanger sequencing), and the third generation (based on single molecule sequencing).
Table 1 Representative next generation sequencing platforms and their characteristics.
Columns: Platform; Run time (h); Read length (bp); Throughput per run (Mb); Error type; Main biological applications
Roche 454 FLX+; read length: 700, up to 1000 bp; error type: insertions/deletions (indels) at homopolymer regions; main biological applications: microbial genome sequencing, human genome sequencing, transcriptomics, metagenomics
As mentioned before, RNA viruses have very high mutation frequencies. The error rate of viral RNA-dependent RNA polymerases is estimated to be between 10⁻³ and 10⁻⁶ per nucleotide copied, compared to 10⁻⁸ to 10⁻¹¹ for DNA polymerases. Consequently, an RNA virus population consists not of a single genotype but of an ensemble of closely related genotypes, termed a quasispecies, centred on a master sequence. This genetic diversity creates a cloud of potentially beneficial mutations, which is thought to allow rapid adaptation to a constantly changing environment. Quasispecies theory makes a number of predictions about the behaviour of viral populations and the consequences of altering diversity.
NGS technologies have redefined the modus operandi in virus genetics research, allowing the unprecedented generation of very large sequencing datasets on a short time scale and at affordable costs. Measuring viral diversity is a significant technical challenge that NGS technology could address. Ultra-deep sequencing has the sensitivity and quantitative nature required for this kind of investigation into viral genetic drift, natural selection and response to antiviral drugs. The analysis of viral quasispecies has been greatly enhanced by the recent emergence of these powerful technologies, which allow the simultaneous sequencing of 400000 to 10000000 individual target sequences.
As previously reported, drug-resistant variants may already exist before a particular antiviral treatment, embedded in a predominantly wild-type quasispecies population. Because they may be present at different low-level frequencies, the degree of viral response may vary, and the mutant genomes will become enriched upon treatment with HCV inhibitors. Hence, determining the natural levels of low-frequency resistant variants before starting a treatment might be relevant to better predict viral response to HCV inhibitors.
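The enrichment of a pre-existing resistant variant under drug pressure can be sketched with a deterministic two-variant selection model. This is an illustrative simplification rather than a model taken from the cited studies: it assumes a large population, constant relative fitness values, no new mutation and discrete generations, and all parameter values below are hypothetical.

```python
def resistant_fraction(initial_fraction: float, fitness_wild_type: float,
                       fitness_resistant: float, generations: int) -> list:
    """Fraction of resistant genomes at each generation, starting at generation 0."""
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must lie between 0 and 1")
    trajectory = [initial_fraction]
    fraction = initial_fraction
    for _generation in range(generations):
        resistant = fraction * fitness_resistant
        wild_type = (1.0 - fraction) * fitness_wild_type
        fraction = resistant / (resistant + wild_type)
        trajectory.append(fraction)
    return trajectory


if __name__ == "__main__":
    # Hypothetical case: resistant variant at 0.1% before therapy; the drug
    # cuts wild-type replication tenfold and leaves the mutant unaffected.
    for generation, value in enumerate(resistant_fraction(0.001, 0.1, 1.0, 5)):
        print(generation, f"{value:.4f}")
```

In this toy setting a variant that starts far below the detection limit of population sequencing dominates the population within a handful of generations, which illustrates why baseline frequencies of resistant variants may matter for predicting response.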
The classic method for detecting these drug-resistant variants in infected patients is population-based sequencing. This method provides a good idea of the major sequences present, but unfortunately it cannot detect minor variants that are present at a frequency below 20%-25%. With the development of deep sequencing technologies, detection of drug-resistant variants became more sensitive, allowing the identification of variants present at very low frequencies (about 0.1%-1%)[107-109]. For HCV, the deep sequencing method was first used to detect the emergence of NS3 mutants. In this way it was clearly demonstrated that de novo telaprevir-resistant NS3 mutants arose in mice injected with wild-type HCV only 2 wk after the beginning of treatment. Deep sequencing was also used to confirm results of naturally occurring drug-resistant HCV mutants detected by a novel mismatch amplification mutation assay polymerase chain reaction.
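The gain in sensitivity is largely a matter of sampling depth, as the following binomial calculation suggests. It is a sketch under idealised assumptions that are not part of the original text: every sequence is sampled independently and without bias from the viral population, and sequencing errors are ignored, whereas in practice a minimum number of supporting reads is required precisely because platform errors (such as the homopolymer indels listed in Table 1) can mimic rare variants. The function name and the example depths are hypothetical.

```python
import math


def detection_probability(variant_frequency: float, sequences_sampled: int,
                          min_observations: int = 1) -> float:
    """Probability of seeing a variant at least min_observations times.

    Uses the binomial distribution: one minus the probability of observing
    the variant fewer than min_observations times.
    """
    if not 0.0 <= variant_frequency <= 1.0:
        raise ValueError("variant_frequency must lie between 0 and 1")
    missed = sum(
        math.comb(sequences_sampled, k)
        * variant_frequency ** k
        * (1.0 - variant_frequency) ** (sequences_sampled - k)
        for k in range(min_observations)
    )
    return 1.0 - missed


if __name__ == "__main__":
    # A variant present at 1%: a few dozen sequences versus thousands of reads.
    print(f"20 sequences:  {detection_probability(0.01, 20):.3f}")
    print(f"10000 reads:   {detection_probability(0.01, 10000, min_observations=10):.3f}")
```

Requiring several supporting reads, as in the second call, is one simple way to trade a little sensitivity for protection against false positives.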
As can be noted, deep sequencing technology provided a comprehensive view of the viral population dynamics during monotherapy with NS3 protease inhibitors. It allowed the estimation of pre-treatment levels of NS3 drug-resistant mutants, suggesting that HCV suppression by NS3 protease inhibitor monotherapy is limited by the pre-existence of drug-resistant mutants. This finding strongly supports the need for a combined therapy to durably treat HCV infection.
In addition, NGS technology has already been used to study HCV transmission events among injection drug users. In this case NGS was used to determine intra-host viral genetic variation by deep sequencing the HCV hypervariable region, which allowed a detailed analysis of the structure of the viral quasispecies in the patient population under study.
NGS approaches are powerful methods that allow a rather comprehensive analysis of the intra-host viral genetic variation. Moreover, these technologies are becoming rapidly accessible all around the world which will likely revolutionise the field of molecular epidemiology.
It is worth mentioning that these technologies offer, as already discussed, several advantages over conventional methods, such as consensus sequencing, bacterial cloning, and endpoint limiting dilution. Furthermore, as the development of a variety of software and algorithms capable of handling the massive amount of data generated by NGS platforms is increasing in parallel with the advances in these technologies, it will likely expedite the implementation of such approaches in a variety of settings in the near future. Taking this into consideration, the use of NGS in HCV outbreak investigations will presumably improve molecular epidemiology studies as well as provide a vast amount of information that will need to be handled appropriately both for the benefit of infected patients and the management of public health systems.
In addition to mutation, it is widely accepted that recombination plays an important role in the evolution of RNA viruses by creating genetic variation through the exchange of nucleotide sequences between different genomic RNA molecules. It is therefore considered a key mechanism for the production of new genomes with a selective growth advantage. Homologous recombination occurs when the donor sequence neatly replaces a homologous region of the acceptor sequence, leaving its structure unchanged. In these cases, not only are the parental RNAs homologous, but the crossovers also occur at homologous sites[112,113]. Hybrid sequences may also originate from aberrant homologous recombination (when similar viruses exchange sequence without maintaining strict alignment) and from non-homologous recombination (recombination between unrelated RNA sequences)[112,113]. RNA recombination involves replication of genomic RNA as a necessary component of the process. If, in the middle of the replication process, the viral RNA-dependent RNA polymerase complex switches from one parental RNA strand to another, hybrid complementary RNA strands will be formed. If the replicase resumes copying the new strand at the same site where it left the parental one, this constitutes a homologous recombination event. Aberrant or non-homologous recombination will occur if the copying process is not as precise. This template-switching mechanism of recombination is known as “copy choice”. The exact mechanism of strand exchange is not known, but it could be promoted by pausing of the polymerase during chain elongation. Thus far, nearly all studies on the mechanisms of recombination in RNA viruses support a copy-choice model, originally proposed for poliovirus.
It is of note that this template-switching mechanism differs greatly from the enzyme-driven breakage-rejoining mechanism of homologous recombination in DNA, mainly because it requires replication as an essential step of the process.
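The distinction between a precise and an imprecise template switch can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions: sequences are treated as plain strings, the function `copy_choice` and its parameters are hypothetical names of ours, and no biochemical detail of the replicase is represented.

```python
def copy_choice(donor, acceptor, switch_at, resume_at=None):
    """Copy `donor` up to `switch_at`, then continue on `acceptor`.

    If `resume_at` is omitted, copying resumes at the same position where
    the first template was left (precise switch, homologous recombination).
    Any other value models an imprecise switch (aberrant recombination).
    """
    if resume_at is None:
        resume_at = switch_at
    return donor[:switch_at] + acceptor[resume_at:]


# Hypothetical aligned parental templates of equal length.
parent_a = "AAAAAAAAAA"
parent_b = "CCCCCCCCCC"

homologous = copy_choice(parent_a, parent_b, switch_at=4)
aberrant = copy_choice(parent_a, parent_b, switch_at=4, resume_at=6)

print(homologous, len(homologous))
print(aberrant, len(aberrant))
```

With a precise switch the hybrid keeps the parental length, so the genome structure is preserved; with an imprecise switch the product carries a deletion (or, if copying resumes upstream, a duplication) and may disrupt the reading frame.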
Recombination in RNA viruses was first discovered in poliovirus-infected cells, where recombinant polioviruses are frequently recovered and have the potential to produce “escape mutants” in nature as well as in experiments. Subsequently, recombination was found to occur in other RNA viruses, both positive- and negative-stranded, and more recently recombination between unrelated groups of RNA and DNA viruses was discovered in a novel virus genome isolated from an extreme environment. The presence of recombination in several genera of the family Flaviviridae, such as Pestivirus, Flavivirus and Hepacivirus, has been demonstrated[117-122].
Regarding HCV, both inter- and intragenotypic recombination have been reported in populations from different geographic locations (Table 2). Some earlier reports described HCV strains from Honduras in which partial sequences from different regions of the viral genome yielded discordant genotypes, providing the first evidence for the possible existence of HCV recombination. However, it was not until 2002 that the first convincing report of an intergenotypic HCV recombinant strain was published by Kalinina et al (2002) in Saint Petersburg (Russia)[49,124]. These authors described six natural HCV strains in which different genome regions belonged to different subtypes, 2k and 1b. They found that the 5’ untranslated region and the core coding region belonged to subtype 2k, whereas the NS5B region corresponded to subtype 1b. By sequencing the E2-p7-NS2 region, they were able to map the crossover point within the NS2 region, estimating it most likely between positions 3175 and 3176 (according to the numbering system for strain pj6CF). The reported recombinant was cautiously designated RF1_2k/1b, in agreement with the nomenclature used for human immunodeficiency virus (HIV) recombinants. This same recombinant strain has since been isolated in other countries, such as Ireland, Uzbekistan, Cyprus, France and Estonia, which suggests that, although its generation might not be favoured by natural selection, it is not selected against either. Additionally, at least ten other intergenotypic recombinant forms (RFs) of HCV have been described and are totally or partially characterised (Table 2). Within this group there are recombinant forms between genotypes 2 and 6 described in Vietnam and Taiwan, between genotypes 2 and 5 described in France, between genotypes 3 and 1 in Taiwan and China, and between genotypes 2 and 1 reported in Japan, the United States and the Philippines[134-137].
It is interesting to note that the recombinant forms found so far have a wide geographic distribution. Moreover, all HCV genotypes except genotypes 4 and 7 are represented among them. Another interesting feature is that all these recombinants but one (RF_3a/1b) originated from the combination of a 5’-end of genotype 2 and a 3’-end of a different genotype. The 3’-end of subtype 1b seems to be the only one appearing in more than one recombinant form. Genotype 2 is present in the majority of the recombinants found to date, which might suggest that it plays a critical role in the recombination process or in the stability and functionality of the resulting recombinant genome. The fact that some recombinants involving genotype 2 and subtype 1b have frequently been found in older patients, and in cases not usually related to the epidemic spread associated with increased intravenous drug use, makes it difficult to assess whether this pattern derives from adaptive selection or is simply due to chance[1,49]. As shown in Table 2, another characteristic feature of intergenotypic recombination in HCV is that the crossover points appear to be located within either the NS2 or the NS3 gene. An apparent hotspot has been identified between amino acids 1022 and 1042 (corresponding to the vicinity of the NS2/NS3 junction). Considering how short this region is, and despite the existence of only a few reports inquiring into the RNA secondary structures involved in the recombination process[130,139], everything seems to indicate that a copy-choice mechanism might be responsible for the generation of these recombinant forms.
Table 2 Main features of intergenotype, intersubtype and intrapatient recombination in hepatitis C virus published cases.
Russia, Ireland, Uzbekistan, Georgia, France, Cyprus, Estonia
NS2/NS3 junction, position 3429 Undetermined Undetermined
NS2/NS3 junction, positions 3405-3416
NS5B, position 8321
Core, at position 387
2 sites in E1-E2, at positions 1407 and 2050
5 sites, from core to NS3, at positions 801, 1261, 2181, 3041 and 3781
NS5B, between positions 8345-9073
NS5B, between positions 8358-8977
NS5B, between positions 8372-9033
NS5B, between positions 8356-9019
NS5B, at residue 286
1a, 1b, 3a
1 or 2 sites within E1-E2 or NS5A
E2 glycoprotein, HVR1 region
Modified from ref. .
With respect to intragenotypic recombination, nine recombinant forms have been described. As mentioned before, each of the six major genotypes of HCV (genotype 7 being excluded because only one complete-genome sequence is available for it) can be subdivided into closely related subtypes that differ from each other by at least 15% in nucleotide sequence. Therefore, the same methodological procedures based on phylogenetic incongruence that are used to detect intergenotypic recombinants are also applicable to the detection of intragenotype/intersubtype RFs. Only one of the intragenotype recombinants described involved genotype 4, four involved different subtypes of genotype 1 and the remaining four involved different subtypes of genotype 6. Examples of 1a/1b recombinants have been identified in Peru as well as in Uruguay[111,141], and 1a/1c recombinants in Japan and in India[142,143]. Interestingly, although the recombinants reported in Uruguay and Peru are both 1a/1b, their recombination breakpoints were found in different regions of the genome, as shown in Table 2. The same holds for the recombinant forms described in India and in Japan. Therefore, unlike what is observed in intergenotype recombination, where all recombinants share a common genome region in which recombination occurs, the location of the breakpoints is highly variable in this case. Only two of the intragenotype 1 RFs have been fully sequenced[142,143] and both involve the same subtypes (1a and 1c). Interestingly, they revealed the existence of more than one crossover point, resulting in mosaic recombinant forms. In addition, the recombinant segments were very dissimilar in size, with relatively short segments of one subtype embedded within a genome of the other subtype. The most recent description of intersubtype recombinant forms involves genotype 6, and these were identified thanks to full-genome sequence analysis.
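The idea behind detecting a mosaic genome can be sketched with a sliding-window comparison against reference sequences. This is a deliberately simplified stand-in for the phylogenetic incongruence methods mentioned above, not a reproduction of any published procedure: it uses uncorrected pairwise distance (the proportion of differing sites) on pre-aligned sequences, and the toy sequences, window size and function names are all assumptions made for illustration.

```python
def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def closest_reference_by_window(query, references, window, step):
    """For each window, report the reference with the smallest p-distance."""
    calls = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        best = min(
            references,
            key=lambda name: p_distance(segment, references[name][start:start + window]),
        )
        calls.append((start, best))
    return calls


# Hypothetical, artificially simple aligned references and a mosaic query.
references = {"1a": "A" * 40, "1b": "G" * 40}
query = "A" * 20 + "G" * 20

calls = closest_reference_by_window(query, references, window=10, step=10)
print(calls)
```

A change in the closest reference between consecutive windows marks a candidate breakpoint, here between positions 20 and 21 of the toy alignment. Real data would require a proper alignment, a substitution model and statistical support before any recombination event could be claimed.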
The remaining three cases of intersubtypic RFs reported have only been partially characterised at the genome level. One of the cases was detected by discordant phylogenetic analysis of the regions coding for E1 and NS5B, isolated from an intravenous drug user from Portugal. This case should be considered a putative example, since no recombination breakpoint has been mapped for this RF. The two remaining cases are different, as they have been described by sequencing only a single portion of the HCV genome. Unlike the previous example, the breakpoints for these two were identified, within the core and NS5B genes respectively[111,141], genes that are relatively conserved and are therefore suitable for phylogenetic typing and subtyping of HCV isolates.
Any discussion of recombination in HCV must also mention intrapatient, or intra-quasispecies, recombination. Three reports on this matter have been published thus far. Two of them involve intrapatient recombination in individual patients undergoing therapy[145,146]. The most recent one involves quasispecies evolution in a chronically infected, treatment-naïve individual. In the first report of intra-quasispecies recombination, the analysed sequences were obtained from the NS5A gene of HCV quasispecies populations from six patients being treated with IFN + RBV combined therapy. Only one recombinant strain was detected among all the patient quasispecies populations studied, and its recombination crossover point was found within the protein kinase R (PKR)-binding region of NS5A. This region is of particular importance, since previous work by Enomoto et al (1995) suggested that the genetic heterogeneity of the IFN sensitivity-determining region of HCV NS5A is linked to response to therapy in Japanese patients. Although there seems to be no consensus on this issue, the published information supports the hypothesis that an association exists between NS5A and response to therapy[149-151]. Some reports suggest that the HCV NS5A protein can repress PKR function in vivo, presumably allowing HCV to escape the antiviral effects of interferon[1,3,152,153]. Analysis of the NS5A protein sequences of both the recombinant and the putative parental-like viruses provided evidence that the recombinant isolate might have acquired amino acids already known to be present in HCV strains resistant to interferon treatment. The results of these studies indicate that recombination cannot be ruled out as an evolutionary mechanism for generating diversity in HCV in vivo in patients undergoing antiviral therapy.
In spite of this, recombination does not seem to play a major role in the evolution of HCV quasispecies populations; at least this is what can be extrapolated from the study of NS5A genes, since only one recombinant isolate was found among all the HCV quasispecies populations studied. Contrary to this finding, Sentandreu et al (2008) identified a high frequency of intra-patient recombination events (18.01% of the 111 analysed patients) when analysing a large data set of HCV sequences (around 17700) from intra-patient viral populations. They retrospectively studied the NS5A and E1-E2 coding regions from samples isolated from two groups: HCV mono-infected patients, both treatment-naïve and non-responders to antiviral treatment; and HCV/HIV co-infected patients, both treatment-naïve and under HAART. These authors found recombination within the E1-E2 region (9.1%) and within the NS5A region (9.6%), with specific areas being proposed as the crossover points. Although no structural analyses were performed in this study, these results are consistent with a role for RNA secondary structure in favouring the hotspots or zones where recombination can occur within the HCV genome. According to these results, in which intra-patient recombination was found in 18% of the HCV-infected patients studied, intra-quasispecies recombination events seem relatively frequent. Moreover, this might underestimate the real frequency of HCV recombination, because recombination events are difficult to detect when they occur between genetically very similar variants, as is the case for variants within a quasispecies[49,146]. More recently, Palmer et al (2012) detected putative intra-subtype recombinants, as well as the likely ancestral parental donors. By retrospective clonal analysis they explored quasispecies evolution at the HVR1 region in serum samples collected over 9.6 years from the same chronically infected, treatment-naïve individual.
Their detailed analysis documents the emergence, maintenance and final removal of HCV variants in a patient who did not undergo antiviral therapy, highlighting the importance of HCV quasispecies dynamics even in the absence of a clear selective pressure.
Even though the regions apparently involved in the crossover events seem to be different when comparing intergenotype vs intraquasispecies recombination, they seem to be quite conserved within each of the different categories. This might represent another indication of the importance of RNA secondary structure for recombination events to take place and might as well hint at a possible factor determining their occurrence.
The detection of so many recombinant strains (Table 2) shows that HCV is capable of successfully completing all the steps leading to this event: simultaneous infection of the same cell by different viral strains, simultaneous replication of both viral genomes, template switching by the viral RNA polymerase without disturbing the correct reading frame, and encapsidation and release of the recombinant genomes as viable viral particles. The resulting products will then be subjected to the same population processes governing the maintenance, growth or disappearance of new variants in a heterogeneous viral population[49,146].
A closer analysis of these recombination events suggests that recombination in HCV may be underestimated. Three different factors might account for this. Firstly, in recombination events between strains of the same subtype, there is a trade-off between the likelihood that a homologous recombination event occurs and the intra-patient viral diversity, since homologous recombination requires a minimum length of sequence identity. Secondly, another trade-off exists between the intra-patient viral diversity and the power of the available methods to detect recombination. Finally, although recombination events between different genotypes/subtypes co-infecting the same patient are probably easier to detect, they are less likely to occur, since strains of different subtypes differ more from each other than strains of the same subtype; this implies a lower probability of template switching and, moreover, if a recombination event does happen, it will likely generate recombinant sequences that are less viable than the parental ones. Some or even all of these factors acting in concert might explain why the frequency of recombinant HCV sequences reported to date is so low[49,111,124-137,140-147].
How relevant recombination is for the long-term evolution of HCV, and how frequently it occurs during HCV infection, has not yet been thoroughly investigated, but these findings support a potentially significant role for recombination in creating genetic variation through the reshuffling of independent variants. Recombination may serve two opposite purposes: simply to explore a new genomic combination, or to rescue viable genomes from debilitated parental ones[29,154]. Considering that recombination may influence vaccine development, virus control programs, patient management and antiviral therapies, it is clearly important to determine the extent to which this mechanism plays a role in HCV evolution.
RNA viruses exist as complex mutant distributions commonly known as viral quasispecies. This is the result of high mutation rates due to the lack of proofreading activity in their RNA-dependent RNA polymerases. The evolution of the HCV quasispecies is a highly dynamic process that has therapeutic implications, because the continuous generation and selection of fitter variants within the quasispecies spectrum might allow the virus to escape control by antiviral drugs and treatment[15,74]. Further studies on HCV quasispecies are needed in order to develop appropriate strategies for effective antiviral control. HCV exploits the known mechanisms of genetic variation, including mutation and recombination. NGS technologies may represent an important improvement for identifying key parameters in our understanding of HCV evolution in relation to current and new therapies against HCV.
P- Reviewer: Bare P, Sayiner AA S- Editor: Ji FF L- Editor: A E- Editor: Liu SQ
Rosen HR, Gretch DR. Hepatitis C virus: current understanding and prospects for future therapies. Mol Med Today. 1999;5:393-399.
Keyvani H, Fazlalipour M, Monavari SH, Mollaie HR. Hepatitis C virus--proteins, diagnosis, treatment and new approaches for vaccine development. Asian Pac J Cancer Prev. 2012;13:5931-5949.
Hu KQ, Vierling JM, Redeker AG. Viral, host and interferon-related factors modulating the effect of interferon therapy for hepatitis C virus infection. J Viral Hepat. 2001;8:1-18.
Friebe P, Bartenschlager R. Genetic analysis of sequences in the 3’ nontranslated region of hepatitis C virus that are important for RNA replication. J Virol. 2002;76:5326-5338.
Kieft JS, Zhou K, Jubin R, Doudna JA. Mechanism of ribosome recruitment by hepatitis C IRES RNA. RNA. 2001;7:194-206.
Pestova TV, Shatsky IN, Fletcher SP, Jackson RJ, Hellen CU. A prokaryotic-like mode of cytoplasmic eukaryotic ribosome binding to the initiation codon during internal translation initiation of hepatitis C and classical swine fever virus RNAs. Genes Dev. 1998;12:67-83.
Le Guillou-Guillemette H, Vallet S, Gaudy-Graffin C, Payan C, Pivert A, Goudeau A, Lunel-Fabiani F. Genetic diversity of the hepatitis C virus: impact and issues in the antiviral therapy. World J Gastroenterol. 2007;13:2416-2426.
Poynard T, Marcellin P, Lee SS, Niederau C, Minuk GS, Ideo G, Bain V, Heathcote J, Zeuzem S, Trepo C. Randomised trial of interferon alpha2b plus ribavirin for 48 weeks or for 24 weeks versus interferon alpha2b plus placebo for 48 weeks for treatment of chronic infection with hepatitis C virus. International Hepatitis Interventional Therapy Group (IHIT). Lancet. 1998;352:1426-1432.
Ogata N, Alter HJ, Miller RH, Purcell RH. Nucleotide sequence and mutation rate of the H strain of hepatitis C virus. Proc Natl Acad Sci USA. 1991;88:3392-3396.
Zeuzem S. Heterogeneous virologic response rates to interferon-based therapy in patients with chronic hepatitis C: who responds less well? Ann Intern Med. 2004;140:370-381.
Laskus T, Wilkinson J, Gallegos-Orozco JF, Radkowski M, Adair DM, Nowicki M, Operskalski E, Buskell Z, Seeff LB, Vargas H. Analysis of hepatitis C virus quasispecies transmission and evolution in patients infected through blood transfusion. Gastroenterology. 2004;127:764-776.
Martell M, Esteban JI, Quer J, Genescà J, Weiner A, Esteban R, Guardia J, Gómez J. Hepatitis C virus (HCV) circulates as a population of different but closely related genomes: quasispecies nature of HCV genome distribution. J Virol. 1992;66:3225-3229.
Eigen M, Gardiner W, Schuster P, Winkler-Oswatitsch R. The origin of genetic information. Sci Am. 1981;244:88-92, 96, et passim.
Eigen M, Schuster P. The hypercycle. A principle of natural self-organization. Part A: Emergence of the hypercycle. Naturwissenschaften. 1977;64:541-565.
Domingo E, Martin V, Perales C, Grande-Pérez A, García-Arriaza J, Arias A. Viruses as quasispecies: biological implications. Curr Top Microbiol Immunol. 2006;299:51-82.
Pellerin M, Lopez-Aguirre Y, Penin F, Dhumeaux D, Pawlotsky JM. Hepatitis C virus quasispecies variability modulates nonstructural protein 5A transcriptional activation, pointing to cellular compartmentalization of virus-host interactions. J Virol. 2004;78:4617-4627.
Baccam P, Thompson RJ, Fedrigo O, Carpenter S, Cornette JL. PAQ: Partition Analysis of Quasispecies. Bioinformatics. 2001;17:16-22.
Cody SH, Nainan OV, Garfein RS, Meyers H, Bell BP, Shapiro CN, Meeks EL, Pitt H, Mouzin E, Alter MJ. Hepatitis C virus transmission from an anesthesiologist to a patient. Arch Intern Med. 2002;162:345-350.
Gretch DR, Polyak SJ, Wilson JJ, Carithers RL, Perkins JD, Corey L. Tracking hepatitis C virus quasispecies major and minor variants in symptomatic and asymptomatic liver transplant recipients. J Virol. 1996;70:7622-7631.
Manzin A, Solforosi L, Debiaggi M, Zara F, Tanzi E, Romanò L, Zanetti AR, Clementi M. Dominant role of host selective pressure in driving hepatitis C virus evolution in perinatal infection. J Virol. 2000;74:4327-4334.
Weiner AJ, Thaler MM, Crawford K, Ching K, Kansopon J, Chien DY, Hall JE, Hu F, Houghton M. A unique, predominant hepatitis C virus variant found in an infant born to a mother with multiple variants. J Virol. 1993;67:4365-4368.
Hijikata M, Mizuno K, Rikihisa T, Shimizu YK, Iwamoto A, Nakajima N, Yoshikura H. Selective transmission of hepatitis C virus in vivo and in vitro. Arch Virol. 1995;140:1623-1628.
Holmes EC, Worobey M, Rambaut A. Phylogenetic evidence for recombination in dengue virus. Mol Biol Evol. 1999;16:405-409.
Tolou HJ, Couissinier-Paris P, Durand JP, Mercier V, de Pina JJ, de Micco P, Billoir F, Charrel RN, de Lamballerie X. Evidence for recombination in natural populations of dengue virus type 1 based on the analysis of complete genome sequences. J Gen Virol. 2001;82:1283-1290.
Twiddy SS, Holmes EC. The extent of homologous recombination in members of the genus Flavivirus. J Gen Virol. 2003;84:429-440.
Uzcategui NY, Camacho D, Comach G, Cuello de Uzcategui R, Holmes EC, Gould EA. Molecular epidemiology of dengue type 2 virus in Venezuela: evidence for in situ virus evolution and recombination. J Gen Virol. 2001;82:2945-2953.
Worobey M, Holmes EC. Homologous recombination in GB virus C/hepatitis G virus. Mol Biol Evol. 2001;18:254-261.
Colina R, Casane D, Vasquez S, García-Aguirre L, Chunga A, Romero H, Khan B, Cristina J. Evidence of intratypic recombination in natural populations of hepatitis C virus. J Gen Virol. 2004;85:31-37.
Ross RS, Verbeeck J, Viazov S, Lemey P, Van Ranst M, Roggendorf M. Evidence for a complex mosaic genome pattern in a full-length hepatitis C virus sequence. Evol Bioinform Online. 2008;4:249-254.
Witherell GW, Beineke P. Statistical analysis of combined substitutions in nonstructural 5A region of hepatitis C virus and interferon response. J Med Virol. 2001;63:8-16.
Gale M, Blakely CM, Kwieciszewski B, Tan SL, Dossett M, Tang NM, Korth MJ, Polyak SJ, Gretch DR, Katze MG. Control of PKR protein kinase by hepatitis C virus nonstructural 5A protein: molecular mechanisms of kinase regulation. Mol Cell Biol. 1998;18:5208-5218.
Pawlotsky JM, Germanidis G, Neumann AU, Pellerin M, Frainais PO, Dhumeaux D. Interferon resistance of hepatitis C virus genotype 1b: relationship to nonstructural 5A gene quasispecies mutations. J Virol. 1998;72:2795-2805.
Costa-Mattioli M, Ferré V, Casane D, Perez-Bercoff R, Coste-Burel M, Imbert-Marcille BM, Andre EC, Bressollette-Bodin C, Billaudel S, Cristina J. Evidence of recombination in natural populations of hepatitis A virus. Virology. 2003;311:51-59.
Yusim K, Richardson R, Tao N, Dalwani A, Agrawal A, Szinger J, Funkhouser R, Korber B, Kuiken C. Los Alamos hepatitis C immunology database. Appl Bioinformatics. 2005;4:217-225.