A hallmark of hepatitis B virus (HBV) replication is protein-primed reverse transcription, related to, but mechanistically distinct from, retroviral replication. This review focuses on the various genome transformations that occur during the replication cycle, with emphasis on the cis-elements on the one hand, and the trans-acting factors known or thought to be involved on the other. A general outline of the chain of events during hepadnaviral replication has been established, mainly using transfection of cloned HBV DNA into a few suitable cell lines, and has been the subject of several reviews[1-6]. What happens is thus reasonably clear; much less clear, however, is how these various steps are achieved and regulated.
Several features of HBV make resolving such mechanistic questions exquisitely difficult: Foremost, until recently, no in vitro systems were available to reconstitute individual replication steps under controlled conditions; conversely, there are no simple infection systems to follow the consequences of mutation-induced in vitro phenotypes in the context of authentic virus replication. Secondly, for various of the viral components and mechanisms no precedents exist such that drawing conclusions in silico, or by analogy to experimentally more tractable systems, is of limited value. Finally, there is a general lack of structural information on the involved viral factors, in particular on P protein, the viral reverse transcriptase. Fortunately, duck HBV (DHBV), the type member of the avihepadnaviridae, provides a valuable model system. Although it differs from HBV in some details, the general features of genome replication are highly conserved; in fact, that hepadnaviruses replicate through reverse transcription has been established with DHBV. Beyond allowing feasible in vitro and in vivo infection studies, the crucial initiation step of DHBV reverse transcription has recently been reconstituted in the test tube. On many occasions, we will therefore refer to data obtained with this model virus, and add what is available for human HBV.
OVERVIEW OF THE HEPADNAVIRAL GENOME REPLICATION CYCLE
Replication of the hepadnaviral genome can broadly be divided into three phases (Figure 1): (1) Infectious virions contain in their inner icosahedral core the genome as a partially double-stranded, circular but not covalently closed DNA of about 3.2 kb in length (relaxed circular, or RC-DNA); (2) upon infection, the RC-DNA is converted, inside the host cell nucleus, into a plasmid-like covalently closed circular DNA (cccDNA); (3) from the cccDNA, several genomic and subgenomic RNAs are transcribed by cellular RNA polymerase II; of these, the pregenomic RNA (pgRNA) is selectively packaged into progeny capsids and is reverse transcribed by the co-packaged P protein into new RC-DNA genomes. Mature RC-DNA-containing nucleocapsids, but not immature RNA-containing ones, can be used for intracellular cccDNA amplification, or be enveloped and released from the cell as progeny virions. Below we discuss these genome conversions, with emphasis on the reverse transcription step, and particularly its unique initiation mechanism.
Figure 1 Replication cycle of the hepadnaviral genome.
Enveloped virions infect the cell, releasing RC-DNA containing nucleocapsids into the cytoplasm. RC-DNA is transported to the nucleus, and repaired to form cccDNA (1). Transcription of cccDNA by RNA polymerase II (2) produces, amongst other transcripts (not shown), pgRNA. pgRNA is encapsidated, together with P protein, and reverse transcribed inside the nucleocapsid (3). (+)-DNA synthesis from the (-)-DNA template generates new RC-DNA. New cycles lead to intracellular cccDNA amplification; alternatively, the RC-DNA containing nucleocapsids are enveloped and released as virions. PM, plasma membrane.
RC-DNA TO cccDNA CONVERSION
Persistent viral infections require that the viral genome be present in the infected cell in a stable form that is not lost during cell division, and which therefore can be used for the continuous production of progeny genomes. Many DNA virus genomes harbor replication origins allowing them to directly exploit the cellular replication machinery for amplification; retroviruses integrate a terminally duplicated linear version of their DNA genome into the host genome, such that it is replicated along with the chromosomes. For hepadnaviruses, the genome persists, instead, as a nuclear, episomal covalently closed circle, i.e. the cccDNA. The circular form obviates the need for terminal redundancy in that, on the circle, the core promoter/enhancer II is placed in front of the start sites for the genomic RNAs; conversely, in typical HBV expression vectors the cloned hepadnaviral DNA is interrupted by plasmid sequences, such that the control regions need to be duplicated.
Distinct features of the RC-DNA (Figure 2) are: (1) only the (-)-DNA strand (with opposite polarity to the mRNAs) is complete, whereas the (+)-strands comprise a cohort of less than full-length molecules; (2) the 5´ end of the (-)-DNA is covalently linked to P protein; (3) the 5´ end of the (+)-strand consists of an RNA oligonucleotide, derived from the pgRNA, which served as the primer for (+)-strand synthesis. For cccDNA formation, all these modifications need to be removed, and both strands need to be covalently ligated.
Figure 2 HBV genome organization.
The partially double-stranded, circular RC-DNA is indicated by thick black lines, with P covalently linked to the 5´ end of the (-)-DNA, and the RNA primer (zigzag line) at the 5´ end of (+)-DNA. The dashed part symbolizes the heterogeneous lengths of the (+)-strands. DR1 and DR2 are the direct repeats. The outer circle symbolizes the terminally redundant pgRNA with ε close to the 5´ end, and the poly-A tail at the 3´ end. The precore mRNA is nearly identical, except it starts slightly upstream. The relative positions of the open reading frames for core (C), P, preS/S, and X are shown inside. TP, Terminal protein domain of P.
How RC-DNA is converted into cccDNA is not well understood, in particular because unambiguous cccDNA detection in the presence of excess RC-DNA is technically not trivial, even by PCR approaches. Also, whenever overlength HBV constructs are involved, e.g. upon transduction of cells with HBV-carrying adeno- or baculoviruses, caution is indicated because homologous recombination could provide a virus replication-independent mechanism of cccDNA formation.
Direct infection avoids this problem, but as yet only limited data are available. Earlier evidence obtained with the DHBV-primary duck hepatocyte (PDH) infection system suggested that the activity of the viral P protein is not required for cccDNA generation; however, more recent data using infection with HBV of primary tupaia hepatocytes[14,15] indicate that reverse transcriptase inhibitors can strongly reduce, though not completely block, cccDNA formation. This would be in line with a role for P protein in the process, probably in the completion of the (+)-strands. The other steps towards cccDNA, however, are likely to require cellular activities, as suggested by the apparent lack of cccDNA, despite formation of infectious virions, in non-authentic host cells such as hepatocytes from HBV transgenic mice. Nonetheless, low-level cccDNA generation in mice has been reported in certain experimental settings[18,19].
Naturally hepadnavirus infected hepatocytes contain up to 50 or more copies of cccDNA, probably in the form of histone-containing minichromosomes. This amplification occurs intracellularly[22,23] in that progeny RC-DNA genomes, like the initially infecting genome, undergo nuclear import and cccDNA conversion (Figure 1). Together with the long half-life (30 to 60 d for DHBV[20,24]), this ensures that cccDNA is not lost during cell division, and even persists during effective antiviral therapy. Nuclear transport, and enveloped virion formation appear to be competing events, such that late in infection, when sufficient amounts of envelope proteins become available, further cccDNA amplification ceases. A recent analysis on the single cell level revealed that the cccDNA copy number in DHBV infected PDH is not uniform and fluctuates over time; about 90% of the cells contained between 1 and 17 copies, the rest more than 17 copies[27,28]. Cells with only one copy may allow segregation of daughter cells containing no cccDNA. This could explain the apparent, though probably less than complete, disappearance of cccDNA during spontaneous clearance of HBV infection. However, therapeutic cccDNA elimination from the chronically infected liver remains a major issue even with the latest generation antivirals[30,31]; further investigations into the mechanism of cccDNA formation, and possibly break-down, are clearly warranted.
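The two quantitative arguments made here (that single-copy cells can segregate cccDNA-free daughters, and that a long half-life slows loss) can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that each cccDNA copy is partitioned independently and with equal probability to either daughter cell, and that decay is first-order. Neither assumption is established by the data discussed above, and the function names are hypothetical.

```python
def prob_daughter_without_cccdna(copies: int) -> float:
    """Chance that one given daughter cell inherits no cccDNA.

    Assumption: every copy goes to either daughter independently with
    probability 1/2 (simple binomial partitioning).
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    return 0.5 ** copies


def fraction_remaining(days: float, half_life_days: float) -> float:
    """Fraction of cccDNA left after `days`, assuming first-order decay."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 0.5 ** (days / half_life_days)


if __name__ == "__main__":
    # Copy numbers spanning the range reported for DHBV-infected PDH.
    for n in (1, 5, 17):
        print(n, prob_daughter_without_cccdna(n))
    # The two ends of the 30 to 60 d half-life range quoted for DHBV.
    for t_half in (30, 60):
        print(t_half, fraction_remaining(90, t_half))
```

Under these assumptions, a given daughter of a single-copy cell is cccDNA-free with probability 0.5, whereas for a cell with 17 copies the corresponding probability is 0.5^17, i.e. less than 1 in 100,000 per division.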
FROM cccDNA TO PREGENOMIC RNA
All known hepadnaviral RNAs, i.e. the subgenomic RNAs as well as the greater-than-genome length pgRNA and precore RNA, are transcribed by cellular RNA polymerase II (the enzyme responsible for cellular mRNA synthesis) using cccDNA as the template. All contain 5´ cap structures, all are 3´ terminally poly-adenylated at a common site, and all serve as mRNAs for viral gene products. Spliced transcripts do exist, and can even be packaged into progeny virions[32,33], yet their functional role is still obscure, although DHBV splice site mutants appear to have defects. The extent of splicing appears to be controlled in DHBV by a long-range RNA secondary structure, and in HBV by the post-transcriptional regulatory element (PRE).
The transcript relevant for virus replication is the pregenomic RNA (pgRNA), encompassing the entire genome length plus a terminal redundancy of, in HBV, about 120 nt that contains a second copy each of the direct repeat 1 (DR1) and the ε signal, plus the poly-A tail (Figure 3A). The pgRNA starts immediately after the precore initiator codon. Its first essential role is that of mRNA for the core protein and the reverse transcriptase; unlike retroviral Gag-Pol proteins, P is expressed as a separate polypeptide by an unconventional mechanism. Secondly, pgRNA is the template for generation of new DNA genomes by reverse transcription. The 5´ terminally extended precore RNA contains the initiator codon of the preC region and gives rise to the 25 kDa precore precursor protein of secreted 17 kDa HBeAg. It is unsuited as a pregenome, and is excluded from participating in replication at the level of encapsidation.
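How a greater-than-genome-length transcript arises from a circular template, and why elements near its 5´ end are then present twice, can be shown with a toy model. The following is a minimal sketch, not a model of actual HBV coordinates: the genome string, start position, and redundancy length are arbitrary placeholders, and the function name is hypothetical.

```python
def transcribe_circular(genome: str, start: int, redundancy: int) -> str:
    """Return a terminally redundant transcript of a circular template.

    Transcription begins at `start`, runs once around the circle and then
    continues for `redundancy` further positions, so the first `redundancy`
    positions of the transcript are repeated at its 3' end.
    The poly-A tail is omitted.
    """
    if not 0 <= start < len(genome):
        raise ValueError("start must lie within the genome")
    if not 0 <= redundancy <= len(genome):
        raise ValueError("redundancy must be between 0 and genome length")
    one_round = genome[start:] + genome[:start]
    return one_round + one_round[:redundancy]


if __name__ == "__main__":
    # Letters stand in for genome segments; they are not nucleotides.
    toy_genome = "ABCDEFGHIJ"
    transcript = transcribe_circular(toy_genome, start=3, redundancy=4)
    print(transcript)
    # Any element lying inside the redundant region occurs twice.
    print(transcript.count("EF"))
```

In the toy transcript, the segment pair standing in for DR1 and ε ("EF") appears once near the 5´ end and once near the 3´ end, which mirrors the 5´ and 3´ copies of these elements on pgRNA.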
Figure 3 The pgRNA as substrate for P protein.
A: pgRNA organization. The pgRNA is shown with some major cis-elements, i.e. ε (hairpin structure), DR1, DR2, and DR1*; DHBV, but not HBV, requires for encapsidation an additional cis-element (Dε region II). Binding of P protein to 5´ ε, but not 3´ ε, initiates pgRNA encapsidation and replication; B: In vitro priming assay. P protein with its Terminal protein (TP), reverse transcriptase (RT), and RNase H (RH) domains can be activated by reticulocyte lysate (RL), or individual factors (X, Y, Z), to bind ε; ε may be supplied as small RNA covering just the hairpin structure. Upon addition of α32P-dNTPs, P uses ε as template to copy 3 to 4 nt from the ε bulge region; by the covalent linkage of the 5´ nt to a tyrosine residue in TP, P protein becomes radioactively labeled, providing a sensitive assay for activity. In vitro priming does thus far not work with human HBV P protein and ε RNA.
Of note, RNA polymerase II transcription could contribute, in addition to reverse transcription, to hepadnaviral genome variability. Its extent is, however, unclear because the error rate of RNA polymerase II is not firmly established and strongly affected by some of the subunits in the holo-enzyme complex[39,40].
CAPSID-ASSISTED REVERSE TRANSCRIPTION OF pgRNA
The next crucial step in hepadnaviral replication is the specific packaging of pgRNA, plus the reverse transcriptase, into newly forming capsids. Key actors are cis-elements on the pgRNA, most notably the encapsidation signal ε, and P protein, which binds to ε. This interaction, in a still poorly understood fashion, mediates recruitment of core protein dimers and thus leads to packaging of the pgRNA-P complex. Remarkably, the precore RNA is not packaged although it contains all of the sequence comprising the pgRNA. Likely, active translation from the upstream precore ATG through the ε sequence prevents the P-ε interaction. This implies, in turn, that P binding to ε on the pgRNA interferes with translation of the core ORF, and evidence supporting this view has been put forward for DHBV.
Once pgRNA and P protein are being encapsidated a second key function of the P-ε interaction is brought to bear, namely the initiation of reverse transcription. At this stage the first DNA nucleotide (nt) is covalently linked to P protein, extended into a complete (-) strand DNA, and (+) strand DNA synthesis ensues, giving rise to a new molecule of RC-DNA; the various immature DNA forms in statu nascendi are termed replicative intermediates, visible as a broad multiple band pattern in Southern blots from intracellular DNA.
Completion of reverse transcription before leaving the cell marks another fundamental difference to retroviruses which, except for foamyviruses, are secreted as virions containing two copies of RNA; an evolutionary rationale may be that DNA synthesis in the infected cell allows hepadnaviruses the intracellular genome amplification cycle as an alternative to proviral integration for stable virus genome propagation. Yet another difference is that hepadnaviral reverse transcription occurs largely, if not exclusively, in intact nucleocapsids rather than in retrovirus-like reverse transcription complexes which typically lack a continuous core shell[43,44].
Although compartmentalization of the genome amplification process is emerging as a common theme of several classes of RNA and reverse transcribing viruses, the hepadnaviral strategy appears as an extreme variation, considering the space restrictions imposed on the replicating complex inside the geometrically defined capsid lumen. Either the RT must slide along the entire 3 kb template, or the template must be pulled through the RT´s active site; at the same time the nucleic acid is most likely in contact with the Arg-rich C termini of the core protein subunits. In fact, capsids from core protein variants lacking part of this region still package pgRNA but appear unable to produce full-length RC-DNA; instead they preferentially reverse transcribe the fraction of spliced genomic RNAs[46,47]. Phosphorylation/dephosphorylation events at the S and T residues in the nucleic acid binding domain clearly accompany genome maturation, and mutations affecting the core phosphorylation status[46,49] can influence DNA synthesis in various ways. Outside capsids, or in the absence of core protein, apparently no full-length DNA can be formed.
Together these observations support the view of the capsid as a dynamic replication machine. A recent cryo electron microscopic comparison between recombinant, bacterial RNA containing capsids and authentic genome harboring nucleocapsids indeed showed some structural differences; a complementary study on mutant recombinant cores revealed an enormous flexibility of the capsid structure. Switching between different structural states could well be involved in supporting the progress of DNA synthesis.
The heterogeneous lengths of the (+)-strand DNAs generated by capsid-assisted reverse transcription may result from a non-identical supply of dNTPs inside individual nucleocapsids at the moment of their enclosure by the dNTP-impermeable envelope. This predicts that intracellular cores produced in the absence of envelopment should contain further extended (+)-DNAs. Alternatively, space restrictions in the capsid lumen could prevent (+)-strand DNA completion; in this view, further (+)-strand elongation after infection of a new cell might destabilize the nucleocapsid and thus be involved in genome uncoating.
CIS-ELEMENTS AND TRANS-ACTING FACTORS ESSENTIAL FOR HEPADNAVIRAL REPLICATION
The absolute requirements for replication are a template nucleic acid, plus an enzyme that is able to read the template information and use it for synthesis of a complementary nucleic acid. Clearly, these basic components are the pgRNA (and later the (-)-strand DNA) and P protein. However, generation of a functional genome also depends critically on precise start and end points, provided by cis-elements on the template (Figure 3A); a further specialty of P protein is its strict dependence, for activity, on cellular factors, namely heat shock proteins (Hsp´s) or chaperones (see below).
Most initial knowledge on the cis- and trans-factors involved in hepadnaviral replication was derived by transient transfection of mutant viral genomes into stable hepatoma cell lines. This system faithfully mimics several of the authentic replication steps; however, its complexity precluded elucidation of many mechanistic details. The recent establishment of in vitro systems, culminating in the complete in vitro reconstitution of active DHBV replication initiation complexes from purified components[52-54], overcame these restrictions (see below). It should be noted, however, that such minimal systems have their own limitations; for instance, the crucial role of the proper capsid environment for RC-DNA formation has not yet been modeled in vitro, and HBV P protein has thus far proven refractory to in vitro reconstitution of DNA synthesis activity. Hence which of the two approaches is more useful depends on the question addressed, and wherever possible they should be combined.
STRUCTURE OF THE RNA ENCAPSIDATION SIGNAL ε
The best understood cis-element on the hepadnaviral pgRNA is ε (Figure 4), a stem-loop structure initially defined as the sequence from the 5´ end of HBV pgRNA that P-dependently mediated encapsidation of pgRNA, and also of heterologous transcripts to which it was fused. Later, the P-ε interaction was found to constitute the first step in initiation of reverse transcription[56-58]; hence ε also acts as the replication origin (Figure 3A).
Figure 4 Secondary structures of hepadnaviral ε signals.
A: HBV ε. The entire hairpin, including the upper stem region, is stably base-paired; formation of a stable tri-loop, as indicated, is confirmed by direct NMR analysis. The conventional annotation for the loop sequence is highlighted by grey shading. The bulge-templated DNA oligonucleotide and the priming Y63 are indicated; B: DHBV ε. The overall 2D structure is similar to that of HBV ε, including a largely base-paired upper stem, as confirmed by preliminary NMR data. The boxed positions were randomized in a SELEX approach, and P binding individuals were selected from the corresponding RNA library; C: HHBV ε (Hε). Hε shows substantial sequence variation relative to Dε (encircled nt), leading to a largely open upper stem structure; D: Generalized secondary structure of an avihepadnaviral ε signal. The scheme summarizes major determinants for productive interaction with P protein. Grey background indicates nt positions that are probably in contact with protein. The bulge and its immediate vicinity, particularly the tip of the lower stem, are essential for P binding whereas the loop appears critically involved in the transition to a productive initiation complex. U2590 > C or U2604 > G mutations abrogate, or strongly reduce, P binding whereas G2586 and C2594 do not affect binding but are important for priming. The major role of the nt termed N may be to provide a proper spacing to the bulge element; their sequence is not important as long as formation of highly stable base-pairing is prevented.
The hairpin structure of ε (Figure 4A) was confirmed by secondary structure analyses[59,60], and its importance was established by following the effects of site-directed mutants on the packaging efficacy in transfected cells[59-63]. Furthermore, the ε sequence is highly conserved in other mammalian hepadnaviruses, as well as between different HBV isolates[64,65]. An illustrative example is provided by the HBV precore variants in which HBeAg synthesis is prevented by stop mutations in the ε overlapping precore region. The only mutations causing this phenotype found in nature are those which maintain a stable ε secondary structure[66,67].
It should be noted that RNA secondary structure (2D) analysis provides a mere description of the base-pairing pattern. Hence a true mechanistic understanding requires knowledge of the three-dimensional (3D) structure. The structure of the HBV ε upper stem, recently solved by nuclear magnetic resonance (NMR) techniques, revealed that the apical loop consists actually of only 3 nt (as in Figure 4A). This analysis is currently extended to determining the 3D structure of the entire stem-loop, which seems to form a nearly contiguous double-helix (S. Flodell, M. Petersen, F. Girard, J. Zdunek, K. Kidd-Ljunggren, J. Schleucher and S. Wijmenga; submitted); of particular interest will be the structure of the bulge region which is the template for the first few nt of (-)-DNA (see below). Thermodynamic calculations as well as experimental melting curves indicate that the entire HBV ε structure, including the upper stem, is highly stable. Notably, however, in DHBV rearranging this structure is necessary for the RNA to act as a template (see below). Determining the ε RNA structure in the complex with P protein is therefore the ultimate, yet demanding, goal.
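What a 2D description does and does not capture can be illustrated with dot-bracket notation, a common text encoding of base-pairing patterns in which matching parentheses denote paired positions and dots denote unpaired ones. The sketch below parses such a string and reports the base pairs and the apical loop size. The toy hairpin (lower stem, 5´ bulge, upper stem, tri-loop, one unpaired position opposite the bulge) is a hypothetical layout chosen for illustration; it is not the experimentally determined ε structure, and all names are assumptions.

```python
from typing import List, Tuple


def base_pairs(dot_bracket: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) index pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


def apical_loop_size(dot_bracket: str) -> int:
    """Number of unpaired positions closed by the innermost base pair.

    Only defined here for a structure with exactly one hairpin loop.
    """
    loops = [
        j - i - 1
        for i, j in base_pairs(dot_bracket)
        if set(dot_bracket[i + 1:j]) <= {"."}
    ]
    if len(loops) != 1:
        raise ValueError("structure must contain exactly one hairpin loop")
    return loops[0]


# Hypothetical toy hairpin: 3 bp lower stem, 4-nt bulge, 4 bp upper stem,
# 3-nt apical loop, and a single unpaired position on the 3' side.
TOY_HAIRPIN = "(((" + "...." + "((((" + "..." + "))))" + "." + ")))"

if __name__ == "__main__":
    print(TOY_HAIRPIN)
    print(base_pairs(TOY_HAIRPIN))
    print(apical_loop_size(TOY_HAIRPIN))
```

Such a representation records only which positions pair; it carries no information on helix geometry or on how the bulge is presented in space, which is why the 3D structure is needed for mechanistic conclusions.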
Despite limited sequence homology to HBV ε, DHBV ε (Dε) has a similar secondary structure with a bulge and an apical loop (Figure 4B), which suggested that this structure is a common trait of all hepadnaviruses. Surprisingly, the corresponding ε signal (Hε) from the related heron HBV (HHBV) has much reduced base-pairing in the upper stem region (Figure 4C), yet it functionally interacts with DHBV P protein in vitro whereas HBV ε does not.
Selection, from a library of RNAs with partially randomized upper stems for individuals able to bind to in vitro translated DHBV P protein (see Cell-free reconstitution of hepadnaviral replication initiation) revealed the absence of base-pairing in the upper stem region as a common theme (Figure 4D). Some of the selected P-binding RNAs supported in vitro priming while others did not, confirming that a productive interaction, leading to DNA synthesis, requires more than mere physical binding (see below). Hence for avihepadnaviral ε signals an open upper stem structure is beneficial for both physical and productive binding to P; in fact, deliberate Dε stabilization strongly reduces P binding. This is one line of evidence that structural reshaping of the upper stem is a crucial event for initiation of DNA synthesis. Independent support comes from preliminary NMR and melting curve data for wt Dε RNA, according to which the upper stem is the least stable region (F. Girard, O. Ottink, M. Tessari and S. Wijmenga, to be submitted). This marked difference to the highly stable HBV ε structure may be related to the in vitro inactivity of the HBV P protein.
The functional consequences of mutations affecting ε and other recognized cis-elements are discussed in more detail below. Of note is, however, whereas the HBV ε stem-loop alone is sufficient to mediate encapsidation of heterologous RNAs[62,73], DHBV pgRNA encapsidation requires additional elements. “Region II”, several hundred nt downstream of 5´ ε, and the intervening sequence may be operative via long-range RNA interactions; why a similar element is dispensable for HBV pgRNA packaging is unclear.
Even HBV ε, however, does not act as a completely autonomous encapsidation element. The 3´ copy of ε cannot substitute for 5´ ε in the context of otherwise authentic pgRNA (although its DHBV counterpart is usable for DHBV P interaction in in vitro translation). Furthermore, 5´ ε mediates encapsidation of heterologous RNAs only up to a limited distance from the 5´ end (about 65 nt), and seems to require the 5´ cap structure. Hence the 5´ cap and factors bound to it appear to have a role in the process, possibly in concert with the 3´ poly-A tail and its associated cellular proteins; this may explain why attempts to reconstitute encapsidation by simultaneous in vitro translation of P and core protein from uncapped, non-poly-adenylated RNAs in reticulocyte lysate (RL) were thus far unsuccessful.
An unexplored issue is whether the ε hairpin, or other base-paired regions on the pgRNA, would be substrates for cellular dsRNA recognizing systems such as Toll-like receptor 3 (TLR3) or retinoic acid inducible gene I (RIG-I) which play important roles in innate immune responses against viral infection[81,82], or for enzymes involved in processing of cellular hairpin RNAs such as the microRNA precursors. Direct screens identified a 65 kDa nuclear protein of unknown sequence and, more recently, a novel large RNA-binding ubiquitin ligase, hRUL138, as potential cellular ε RNA interaction partners. However, their roles in the viral life-cycle are not known. Also, any cellular ε-binding factor would have to cope with the physical sequestration of pgRNA into nucleocapsids.
P PROTEIN STRUCTURE
The P ORF, covering nearly 80% of the hepadnaviral genome (Figure 2), has a coding capacity of about 830 aa for HBV, and about 790 aa for DHBV (Figure 5). There is no indication for downstream processing of the primary translation products. In further contrast to retroviruses, hepadnaviral virions contain probably just one P protein molecule per particle, in line with the covalent linkage to the genome. In transfected cells, P appears to be produced in excess, such that most molecules are not capsid associated[87-89]; they have a short half-life, and are bound to large cytoplasmic structures; their function, if any, is unclear.
Figure 5 Domain structure of P protein.
A: Authentic DHBV P protein. Numbers are aa positions for DHBV P protein. The priming Tyr residue Y96 is indicated; B: Typical recombinant P protein construct. For solubility, a heterologous solubility enhancing domain (SED) such as NusA, GrpE, or GST is required, and a short stretch of C terminal aa must be removed. Deletion of the spacer has no negative effects on in vitro activity; C: Mini-RT2. This heavily truncated recombinant DHBV P protein requires mild detergent, but no chaperones for priming activity.
Bioinformatic and genetic analyses showed the presence in all P proteins, of two conserved domains, namely the polymerase/reverse transcriptase (RT) domain, and the C-terminal RNase H (RH) domain. Both are necessary as structural components for pgRNA encapsidation. An absolutely hepadnavirus-specific feature is, however, the Terminal Protein (TP) domain at the N terminus, separated from the RT domain by a highly variable, and dispensable, spacer. TP was first identified as the (-)-DNA linked protein and later was shown to provide a specific Y residue to which the first nt of the (-)-DNA becomes covalently linked (Y96 in DHBV TP;[93,94]; Y63 in HBV TP).
At present, no direct structural information on any hepadnaviral P protein is available although homology-based models for the RT and RH domain have been proposed. The RT model is in accord with drug resistance data, and it is supported by mutational analysis of the putative dNTP pocket of DHBV P protein where a single aromatic residue (F451) was shown to have a homologous role in dNTP versus rNTP discrimination as Y115 in HIV-1 RT; replacement of the bulky F451 by smaller residues conferred to the protein a low but clearly detectable RNA polymerase activity. However, outside the active site the accuracy of the modeled structure is unknown. Hence direct structure determination of the RT and RH domains remains a major objective.
This holds even more for the TP domain, which shares no significant sequence similarity to any other protein in the data base, not even to the few other terminal proteins involved in viral genome replication, such as the VPg in picornaviruses, or the terminal proteins of adenovirus and the bacteriophage Φ29; moreover, those TP proteins are not covalently linked to their polymerases.
Structure determination requires a source for sufficient amounts of pure, soluble protein which proved to be immensely difficult for P protein. Eventually, this problem was partly overcome by slight modifications in the primary sequence of DHBV P and particularly by adding solubility mediating fusion partners such as GrpE, NusA, or GST[53,102,103]. However, although such fusions (Figure 5B) display activity (see below) they appear to be present as “soluble aggregates” which are unsuited for crystallization.
Particularly TP, when expressed in E. coli, is completely insoluble on its own, mostly due to a hydrophobic region in the C terminal part. By selection from a pool of TP variants with random mutations we could isolate several TP variants, harboring fewer hydrophobic residues in this region, as monodispersely soluble proteins (J. Beck and M. Nassal, unpublished data). Interestingly, the same region was recently implicated to contain a molecular contact site, possibly for the RT domain, as suggested by the ability of separate TP and RT/RH domains to trans-complement each other[53,95,105]. Since replacing several conserved hydrophobic regions at a time may affect TP function the challenge will be to find, for crystallization, mutants that combine solubility with functional activity. However, being at the heart of hepadnaviral replication, solving the structure of TP is one of the big current challenges.
CELL-FREE RECONSTITUTION OF HEPADNAVIRAL REPLICATION INITIATION
Cell-free systems are inherently much more manipulatable than intact cells. The first such system to investigate the mechanism of hepadnaviral reverse transcription was based on the observation that DHBV P protein, in vitro translated in rabbit RL, became radioactively labeled when the translation reaction was supplied with α32P-dNTPs, as expected if the initial step of reverse transcription, i.e. covalent attachment of the first nucleotide to P protein, had occurred (“in vitro priming”; Figure 3B). In fact, the 3´ copy of Dε present on the P protein mRNA was shown to be the template for limited elongation; the role of ε, though only the 5´ copy, as authentic genome replication origin was confirmed for DHBV[56,57], and finally also HBV.
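The template relationship in the priming reaction, in which P copies a few nt from the ε bulge into a short DNA oligonucleotide, follows ordinary antiparallel base complementarity. The sketch below derives the DNA product from an RNA template segment. The 4-nt input is a placeholder chosen for illustration and is not taken from the text; the function name is hypothetical, and the protein linkage of the first nucleotide is not modeled.

```python
# Watson-Crick partners when an RNA template is copied into DNA.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def templated_dna_oligo(rna_template: str) -> str:
    """Return the DNA copied from `rna_template`, written 5'->3'.

    The template is given 5'->3'. Because synthesis is antiparallel, the
    template is read from its 3' end, so the product is the reverse
    complement of the template.
    """
    try:
        return "".join(
            RNA_TO_DNA_COMPLEMENT[nt] for nt in reversed(rna_template.upper())
        )
    except KeyError as exc:
        raise ValueError("not an RNA nucleotide: %s" % exc) from exc


if __name__ == "__main__":
    # Placeholder 4-nt template region (illustrative only).
    toy_template = "UUAC"
    print(templated_dna_oligo(toy_template))
```

The 5´ nucleotide of the returned oligonucleotide corresponds to the one that, in the actual reaction, becomes covalently attached to the priming tyrosine of TP.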
The in vitro translation system has been used extensively to functionally analyze P protein as well as Dε mutants, particularly because Dε can be added as a separate short RNA covering just the hairpin structure ("trans-priming"). Human HBV P protein and ε, however, show no enzymatic activity in this setting.
THE P-ε COMPLEX: DETERMINANTS FOR BINDING, PRIMING AND ENCAPSIDATION
A first, though not the only, requirement for a productive P-ε interaction is specific binding. In many RNA-protein interactions, structural diversity and hence specificity is achieved by deviations from a fully base-paired double-helical structure, e.g. by interspersed single-stranded bulges and loops. Indeed, the Dε bulge structure (but not its actual sequence, unless it affects structure) is absolutely necessary for P binding. Mutants in the upper stem (see Figure 4B) which favor stable non-bulged structures do not bind to P. Similarly critical are the sequence and structure at the junction between the lower stem and the bulge: mutation of the unpaired U2604 opposite the bulge to G substantially reduced binding (and nearly abolished priming). At the tip base-pair of the lower stem, base identity of G2605, but not base-pairing itself, is important; in addition, most ribose residues in Dε could be replaced by deoxyribose residues, except in the two top base pairs of the lower stem and in the bulge residue templating the first nt of (-)-DNA. Hence this small region contains an essential base- and backbone-dependent determinant for P interaction (Figure 4D), likely because it forms, together with the bulge, a distinct three-dimensional recognition surface for P protein.
In the apical loop, various mutations had no drastic effects on P binding, suggesting the loop is not principally required for complex formation (though probably involved in forming a priming-competent structure; see below). An exception is U2590, replacement of which by C abrogated binding[70,108]. However, deletion of this residue did not affect binding, and the negative effect of the U2590C mutation was partially rescued by an additional G to A mutation at the neighboring position 2589. A direct comparison of the 3D structure of wild-type Dε, whose determination is currently underway (S.S. Wijmenga, personal comm.), with that of the U2590C mutant RNA may help to explain this complex phenotype.
The existence of Dε RNA variants which bind P but have a much reduced or no template activity indicated that a productive P-ε interaction requires more than binding[61,70,71]. 2D structure comparisons of free versus P bound RNAs provide compelling evidence that the ability of an RNA to undergo a specific conformational shift is such a decisive additional feature. Several base-paired nt in the upper stem of Dε become highly accessible to nucleases once bound to P; similar changes are not seen with priming-deficient variants. Hence it is likely that the RNA, after an initial binding step, must experience an induced-fit alteration into a new structure, and that only this is usable as a template. Non-productive complexes, by contrast, appear to be trapped at the initial binding stage. Foot-printing analysis further revealed that in productive, but not non-productive, complexes the 3' half of the loop plus 3´ adjacent nt are protein bound, as are the unpaired U opposite the bulge and the nt at the tip of the lower stem (Figure 4D). Hence although the loop nucleotides may not be strictly required for initial binding, they probably provide a protein binding site that becomes crucial in the transition to a priming-active complex.
P protein also undergoes structural alterations in this process. Proteolysis of in vitro translated DHBV P protein yielded a distinct proteolytic fragment only in the presence of priming-competent but not priming-inactive RNA variants[108,110]. Hence RNA and P protein mutually alter each other's conformation, likely to properly arrange the ε template region and the priming Y residue of TP in the active site of the RT domain.
Notably, of the various P binding RNAs only those that are priming-active also support pgRNA encapsidation. Hence the abilities to initiate reverse transcription, to package pgRNA, and to adopt a distinct RNA-P protein complex conformation appear strictly coupled. In effect, this represents a quality control mechanism ensuring that only RNAs suitable as templates for reverse transcription are packaged. The current methods to analyze these replication-relevant structural changes have a very limited resolution; ultimately, biophysical examination of a priming-active complex will be required, and its comparison with non-productive complexes should be most revealing. However, several significant obstacles will have to be overcome for this approach, not the least being the strict dependence of DHBV P protein activity on additional cellular factors (see below).
This holds even more for active initiation complexes of human HBV. Insect cell expressed HBV P exerted a low but clearly detectable polymerase activity, with part of the DNA products covalently linked to TP as expected from authentic initiation. The system revealed some important features, such as the priming role of Y63 in HBV TP, trans-complementation between TP and RT/RH domains, and even some nucleocapsid formation by co-expression of core protein. Disturbingly, however, several of these activities were not strictly ε dependent, and they could not be reconstituted after (partial) purification of the P protein. Whether insect cells contain some RNA that can substitute for ε has not been clarified. In vitro translated HBV P protein shows no priming activity in reticulocyte lysate and not even in lysates from Huh7 cells (J. Beck and M. Nassal, unpublished data), which support replication when transfected with HBV. Recently, however, Hu et al reported partial but important progress in that they were able to demonstrate specific binding, though not priming, for HBV P and ε in an in vitro reconstitution system (see next paragraph).
IN VITRO RECONSTITUTION OF REPLICATION INITIATION FROM PURIFIED COMPONENTS
One reason for using in vitro translation of P protein was, at the time, the lack of alternative ways to produce the protein recombinantly. Surprisingly, DHBV P protein translated in wheat germ extract had a much lower priming activity than protein translated in rabbit RL, suggesting the mammalian RL provided additional essential factors. These turned out to be cellular chaperones, which are abundantly present in RL. Apart from Hsp60 chaperonins (GroEL in bacteria), Hsp70 (DnaK in bacteria) and Hsp90 constitute the major chaperone systems. Hsp70 assists folding of many newly synthesized polypeptides, refolding of misfolded proteins resulting, e.g., from heat-shock, and protein translocation through membranes[115-117]. Hsp90 also has broad but more specialized folding functions, usually if not exclusively in concert with Hsp70; the two systems are linked via the Hsp70/Hsp90 organizing protein Hop. By analogy to the activation of nuclear hormone receptors it was proposed that Hsp90 plus its small co-chaperone p23 are the essential factors for P protein activation[113,120]. However, the complex overall composition and the high chaperone content of RL compared to the minute amounts of P protein precluded a clear-cut distinction.
This was overcome when DHBV P protein became accessible in larger quantities by expressing, in E. coli, slightly modified variants[53,102], in particular as fusions with solubility enhancing heterologous domains (Figure 5B). Such recombinant P proteins could now be added to the RL system and showed activity. Finally, because the translation function of RL was not required any longer, it became possible to systematically analyze the chaperone requirements for P protein activation with purified individual components.
In this way we could demonstrate that DHBV P can efficiently be activated in vitro by Hsp40 and Hsp70 plus ATP as an energy source, without the need for Hsp90 or other cofactors. The primary reaction product is an RNA binding-competent form of P protein (P*) that decays quickly in the absence of ε RNA but, in its presence, accumulates in an initiation-competent form (Figure 6).
Figure 6 Model for Hsp40/Hsp70 mediated in vitro activation of P protein.
The low ATPase activity of Hsp70 is stimulated by Hsp40, yielding high substrate affinity Hsp70/ADP. A new cycle of substrate release and folding requires exchange of the ADP for new ATP which is stimulated by nucleotide exchange factors such as BAG-1. This ATP-dependent Hsp70 cycling applies to the chaperone´s global folding activities but likely also to P activation: in the inactive state of P (1), the ε RNA binding pocket is inaccessible; Hsp40/Hsp70 activation creates active P* (2) which is able to bind ε RNA (3); P* is metastable, and decays to the inactive state (1) within minutes. Maintaining a steady-state level of P* requires constant re-activation by Hsp40/Hsp70 and thus a continuous supply of fresh ATP. Complexes containing priming-competent ε RNAs (3) undergo induced-fit rearrangements in the RNA and the protein, enabling them to initiate DNA synthesis upon dNTP addition (4). Several Dε variants bind P but do not act as template; most likely they are trapped at stage (3). The same may hold for human HBV P protein complexes with HBV ε RNA.
Maintaining P in its activated P* form requires a constant supply of ATP, and the same holds for the general folding activity of Hsp70. This suggested that P activation represents a special form of Hsp70 mediated folding in which the chaperone, rather than helping P from an unfolded into a stable folded state, affects the equilibrium between the inactive P ground state and the metastable activated P* form. Of note, bacterial DnaK has been shown to interact, physiologically, with a few folded, as opposed to misfolded, proteins such as the transcription factor σ32[121,122].
Hsp70 chaperoning is a cyclic ATP-driven process. Hsp70 binds ATP and in this form has low affinity for folding substrates (Figure 6, left). ATP hydrolysis then generates the high affinity Hsp70/ADP form. In the presence of substrate, the weak Hsp70 ATPase activity is stimulated up to 1000-fold by Hsp40 and related J-domain proteins (named after the prototypic E. coli Hsp40, DnaJ), explaining the important role of Hsp40. Initiating a new folding cycle requires reconversion of Hsp70/ADP into Hsp70/ATP, i.e. replacement of the bound ADP by fresh ATP; otherwise Hsp70 would be trapped and not be available for acting on new substrate molecules. Spontaneous nucleotide exchange is slow but strongly enhanced by nucleotide exchange factors (NEFs). This predicted that addition of BAG-1, an established NEF of Hsp70, to the minimal in vitro reconstitution system should enhance the formation of P* molecules. Indeed, we observed a strong BAG-1 dependent increase in priming-active P molecules, but not with a BAG-1 mutant unable to interact with Hsp70. Hence ADP-ATP exchange on Hsp70 is the rate limiting step in the in vitro priming reaction (M. Stahl, M. Retzlaff, M. Nassal and J. Beck, submitted). A working model for the Hsp40/Hsp70 activation of DHBV P protein is shown in Figure 6.
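The kinetic logic of this working model can be illustrated with a minimal numerical sketch. It assumes a simple three-state scheme (inactive P, metastable P*, and the P*-ε complex) with first-order rates; the function name and all rate constants are hypothetical and chosen for illustration only, they are not measured values.

```python
def simulate_p_activation(k_act, k_decay, k_bind, t_end=60.0, dt=0.001):
    """Euler integration of the assumed scheme  P <-> P* -> P*:eps.

    k_act   : chaperone/ATP-driven activation P -> P* (per min, hypothetical)
    k_decay : spontaneous decay P* -> P (per min, hypothetical)
    k_bind  : pseudo-first-order capture of P* by eps RNA (per min; 0 = no RNA)
    Returns the final fractions (inactive P, free P*, P*:eps complex).
    """
    p, p_star, complex_ = 1.0, 0.0, 0.0  # all P protein starts inactive
    for _ in range(int(t_end / dt)):
        activation = k_act * p      # Hsp40/Hsp70-dependent step
        decay = k_decay * p_star    # P* is metastable
        binding = k_bind * p_star   # eps RNA traps the activated form
        p += (decay - activation) * dt
        p_star += (activation - decay - binding) * dt
        complex_ += binding * dt
    return p, p_star, complex_


if __name__ == "__main__":
    # Without RNA, P* settles at a low steady-state level.
    print(simulate_p_activation(k_act=0.1, k_decay=0.5, k_bind=0.0))
    # With RNA, activated P is drained into the complex and accumulates there.
    print(simulate_p_activation(k_act=0.1, k_decay=0.5, k_bind=1.0))
```

In this toy scheme, raising `k_act` stands in for faster chaperone cycling (for example through stimulated nucleotide exchange), and it speeds up accumulation of the complex whenever activation is the slow step.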
Though these data are clear-cut, Hu and colleagues, using a similar experimental set-up though a different (i.e. GST) DHBV P fusion protein, reported that in their in vitro system P activation was strictly dependent on the additional presence of Hsp90 and the Hsp70/Hsp90 adaptor protein Hop, with the Hsp90 co-chaperone p23 enhancing the reaction rate.
In our system, Hop and Hsp90 do have a stimulatory, but not an essential, effect (in a similar range as BAG-1), particularly at low P protein concentrations. In addition, the specific nature of the Hsp40 used can affect activation efficiency. Eukaryotic cells contain numerous Hsp40-like (or J-domain) proteins; all contain the approximately 70 aa long J-domain which mediates interaction with Hsp70. Hsp70 activation of P protein appears to proceed selectively with the Hdj1 variant of Hsp40 but not with Hdj2 or its yeast homolog Ydj1. With Ydj1, Hop and Hsp90 may become essential for detectable activation. Why different Hsp40s have different effects is currently unclear because all Hsp40s can stimulate the Hsp70 ATPase activity; however, a substrate can enter the folding complex in different ways: either it is bound directly by Hsp70, or it is presented to Hsp70 by the Hsp40-like protein. Therefore, the various additional domains present in different J-domain proteins could bring about the formation of P* via different pathways. In summary, P activation in vitro is fundamentally dependent on Hsp70/Hsp40 but can be enhanced by additional factors, including Hop/Hsp90.
Interestingly, the strict chaperone dependence of DHBV P protein activity was relieved by an extensive C terminal truncation that removed the entire RH domain and some 75 aa from the RT domain; the only requirement for priming activity (though restricted to the very first DNA primer nt) of this truncated Mini-RT 2 protein (Figure 5C) was the presence of mild detergent. This suggests that in full-length P protein C terminal parts somehow block the ε RNA binding site, and that this occlusion is removed by the chaperone action[102,127].
Where exactly the chaperones bind to P and which conformational rearrangements they induce is unclear. Using in vitro translated DHBV P protein in RL Tavis et al noted the ε-dependent generation of a papain- and trypsin-resistant fragment covering mainly the RT domain. Taking advantage of the simple, defined composition of the in vitro reconstitution system we could directly investigate the effects of Hsp40 and Hsp70 on P protein conformation. Limited V8 protease digestion revealed specific chaperone- and ATP-dependent cleavages in the C terminal part of TP (between aa 164 and 199). Hence this TP region is inaccessible in non-activated P protein but becomes exposed in P* (M. Stahl, J. Beck and M. Nassal, unpublished data). The functional relevance of this conformational alteration is supported by its correlation with the ability of P to bind ε RNA, and the shielding of the same region from protease attack as long as the Dε RNA is bound; furthermore, the larger fragment encompasses two residues, K182 and R183, which are essential for RNA binding. These data support the model shown in Figure 6. Inclusion of Hop and Hsp90 in this assay should now allow one to monitor whether these, or additional factors, have differential effects on P conformation, or whether they mainly stabilize the changes already established by Hsp40/Hsp70.
Extending the scope of the in vitro reconstitution system, Hu and colleagues recently demonstrated that specific binding to ε by the human HBV P protein also appears to be controlled by cellular chaperones[72,128]. On the ε RNA side, the region immediately surrounding the bulge was essential but the apical loop was not. This was surprising given that the loop is essential for pgRNA encapsidation and initiation of replication in intact cells. However, in light of the above described DHBV data, it may just be a drastic manifestation that mere physical binding is not sufficient for a productive interaction; this is further supported in that only about half the P protein sequence was required for RNA binding, with even the catalytic YMDD motif being dispensable. Apparently, the crucial second step, which in Dε involves rearranging the upper stem and the loop, does not occur in the reconstituted HBV P protein-ε complexes, possibly due to the high stability of this region in HBV ε. Finding conditions under which HBV P protein exerts authentic ε-dependent in vitro priming-activity remains therefore another major challenge in HBV biology.
IMPORTANCE OF CHAPERONES FOR HEPADNAVIRUS REPLICATION IN INTACT CELLS
Both the Hsp70 and the Hsp90 chaperoning activities are subject to regulation by a multitude of co-chaperones[117,129]. Hence which of the in vitro reconstitutable processes is the relevant one inside cells is not trivial to address, in particular because the chaperones are crucial for very many fundamental and regulatory cellular processes. For instance, geldanamycin (GA), an inhibitor of Hsp90, strongly interfered with DHBV replication in transfected cells at a concentration of 10 ng/mL whereas significant reduction of the in vitro priming activity required much higher concentrations. Therefore, the antiviral effect in cells might be indirect, perhaps via one of the cellular kinases that are regulated by Hsp90. Such effects on the cell make it also difficult to imagine that chaperone inhibitors could therapeutically be used against HBV infection without causing severe adverse effects. That GA analogs, nonetheless, exert tumor-specific therapeutic value is probably due to the selective presence, in cancer cells, of Hsp90 in a high GA affinity state. Whether this also holds for HBV infected cells is not known.
Various additional chaperones have, in part indirectly, been implicated in affecting P protein, for instance Hsp60[133-135] and the Hsp90 family member GRP94; however, GRP94 is an endoplasmic reticulum (ER) resident protein for which a role in P protein activation is difficult to imagine. Overexpression of p50/cdc37, a co-chaperone of Hsp90 involved in activation of several signal transduction kinases, stimulated DHBV replication in transfected cells and a dominant negative mutant of p50/cdc37 inhibited DHBV P protein priming in vitro. Again, the physiological relevance of these observations remains to be confirmed.
One possible solution would be to map the contact sites of the various chaperones on P protein using biochemical and biophysical methods, and then to generate P mutants with specific interaction defects. Monitoring their replication phenotypes in transfected, or better in infected, cells should then allow one to narrow down which of the various reported interactions are truly significant for virus propagation in vivo.
Though we have stressed the uniquely strict chaperone-dependence of hepadnaviral P protein activation, there is accumulating evidence that other polymerases are also affected by chaperones. Probably the closest analogy exists to telomerase, the cellular reverse transcriptase that maintains chromosome end integrity[138,139]. However, also the DNA polymerase of Herpes simplex virus, and the RNA polymerases of flock house virus and influenza virus appear to require chaperone assistance[142,143]. Closely watching progress in those areas might also provide clues as to the mechanism of chaperone-assisted hepadnavirus replication; however, for polymerases without a similarly sophisticated protein-primed initiation mode the chaperones could act at rather different levels.
DNA PRIMER TRANSLOCATION AND (-)-DNA COMPLETION
In contrast to initiation, the subsequent steps for RC-DNA formation are currently not amenable to in vitro analysis, and they appear intimately related to the proper environment of intact nucleocapsids. Hence reverse genetics is still the most rewarding approach to address these equally puzzling and complex events.
Figure 7 DNA primer translocation (first template switch).
P copies 3 to 4 nt from the 5´ ε bulge, yielding the TP-linked DNA oligonucleotide which is translocated to the complementary motif in the 3´ proximal DR1*. A: Linear representation. DR1* is nearly 3 kb apart from 5´ ε in primary sequence, and numerous other UUCA motifs are not used as acceptors. Φ denotes a newly identified cis-element with partial sequence complementarity to the 5´ half of ε. 3´ ε (light grey) is dispensable. The HBV specific sequences in the ε bulge and in DR1* are shown below in capitals, flanking sequences in lower case; B: Models for juxtaposition of 5´ ε and DR1*. A general mechanism would be closed loop formation of the pgRNA by cap-binding and poly-A binding factors (ovals), e.g. via eukaryotic initiation factor 4G (eIF-4G; large oval). More specifically, ε might base-pair with Φ[146,147], as indicated by the grey arrows. In such an arrangement, a small movement, rather than a big jump, of TP with the bound DNA primer (dashed outline) would suffice for specific translocation to DR1*.
The initial model of hepadnaviral (-)-DNA formation assumed that synthesis would start inside the 3´ copy of DR1 (DR1*), for HBV at the motif 5´ UUCA. As discussed above, the complementary sequence 3´ AAGT at the 5´ end of (-)-DNA is instead copied from the UUCA motif in the ε bulge (Figure 7A). Hence the oligonucleotide bound to TP must specifically be translocated to the 3´ DR1*, nearly 3 kb apart from 5´ ε. Given that there are about 20 further UUCA motifs on the pgRNA, and that even fewer than 4 nt of identity between the template region in ε and the target site in DR1* are sufficient for specific transfer, additional elements ensuring proper translocation must be operating. One model is that DR1* and 5´ ε are brought into close proximity (Figure 7B). A general mechanism would be closed-loop formation of the pgRNA via cellular proteins such as eukaryotic initiation factor 4G (eIF-4G), which links 5´ cap and 3´ poly-A binding proteins. A recent, more specific model is a long-range RNA interaction between ε and a new cis-element ("Φ" or "β5") slightly upstream of DR1* that is involved in proper (-)-DNA synthesis[144-146]; it contains a sequence that is partially complementary to the 5´ half of ε. Mutations affecting the base-pairing potential between the two sequences can, indeed, influence the efficiency of (-)-DNA synthesis. Exactly how an ε-Φ interaction would aid in primer translocation is unclear because base-pairing would simultaneously affect the proper ε structure, implying some temporal regulation. Similar Φ elements have been proposed to be present in the other hepadnaviruses; for DHBV we noted, however, that Dε mutants with reduced potential base-pairing to the supposed Φ element showed no obvious (-)-DNA synthesis defects (K. Dallmeier, B. Schmid and M. Nassal, unpublished data).
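The acceptor-site ambiguity described here, one short motif occurring many times on a long RNA, can be made concrete with a small sketch. It only scans a sequence for every occurrence of a motif and derives the DNA strand that copying a short RNA template would give by Watson-Crick pairing. The function names and the toy sequence are hypothetical; the sequence is not taken from any HBV genome.

```python
def find_motif(rna, motif="UUCA"):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    positions = []
    start = rna.find(motif)
    while start != -1:
        positions.append(start)
        start = rna.find(motif, start + 1)
    return positions


def dna_copy(rna_template):
    """Return the DNA strand, written 5'->3', complementary to an RNA template."""
    pairing = {"A": "T", "U": "A", "G": "C", "C": "G"}
    # The copy is antiparallel to the template, hence the reversal.
    return "".join(pairing[base] for base in reversed(rna_template))


# Hypothetical toy RNA with three copies of the motif (illustration only).
toy_rna = "GGUUCAUUCAAUUCA"

if __name__ == "__main__":
    print(find_motif(toy_rna))  # candidate acceptor positions
    print(dna_copy("UUCA"))     # primer sequence read 5'->3'
```

Applied to a full-length pgRNA sequence, such a scan would list all sequence-identical candidate sites. It cannot say which one is used, which is the point made in the text: sequence identity alone does not explain the specificity of the transfer.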
The primer translocation process must also involve remodeling of the P-ε complex; the priming Y residue of TP must give way to the growing DNA oligonucleotide, then ε must be replaced by DR1* as the template. Hence P, like other protein-priming polymerases, must have distinct initiation and elongation modes. The recently solved structure of the bacteriophage Φ29 DNA-dependent DNA polymerase with its (separately expressed) terminal protein gives an impression of the dynamic changes that have to occur. Φ29 uses for protein priming an S residue in one of three distinct TP domains. In the initiation mode, the priming domain mimics the primer-template nucleic acid that occupies the same site during elongation. The initiation reaction continues until 6 to 10 nt have been attached; at this time the priming domain is pushed out of the polymerase´s active site, marking the transition to elongation mode. Similar events must happen in P protein. Notably, initiation of Φ29 replication does not start at the very terminal template nt; rather the (identical) penultimate nt is copied. Next, the copied nt slips back to the terminal template position, and the penultimate nt is copied again; adenovirus uses a similar mechanism. Interestingly, such slip-back and re-copying can also occur with HBV P protein on mutant ε templates. A full understanding, however, will obligatorily require high resolution structural data on the hepadnaviral replication complex.
Notably, the single-stranded (-)-DNA is an intermediate that, as in retroviral replication, could be a target for cytidine deamination, and consequently inhibition of replication, by APOBEC enzymes. APOBEC3G, one of several family members, has indeed been reported to interfere with HBV replication; however, the mechanism does not involve editing. Also, expression of the corresponding APOBEC mRNA is low in hepatic cells, although it might be inducible. Further research is needed to clarify whether any of the APOBEC enzymes is genuinely involved in the innate response against HBV infection. At any rate, sequestration into intact nucleocapsids of this potentially vulnerable single-stranded DNA replication intermediate may be an efficient counter-defense of HBV.
(+)-STRAND DNA SYNTHESIS AND CIRCULARIZATION
The end product of (-)-DNA synthesis is a unit length DNA copy of the pgRNA from its 5´ end to, in HBV, the UUCA motif in the 3´ DR1* (Figure 8A); hence it contains a small, about 10 nt, terminal redundancy (“r”). Most of the pgRNA template is degraded concomitantly to (-)-DNA synthesis by the RNase H domain of P. The fate of the non-copied 3´ end of the pgRNA from DR1* to the polyA tail is not exactly known. It seems to be underrepresented in the packaged RNA, both in DHBV, and in HBV with mutations in the nucleic acid binding domain of the core protein. Whether some of these 3´ ends are never completely encapsidated, or whether they dissociate out of the capsid, has not yet conclusively been shown.
Figure 8 RC-DNA formation.
A: (-)-DNA completion. The DNA primer, still linked to TP, is extended from DR1* to the 5´ end of pgRNA. The RNA is simultaneously degraded by the RH domain, except for its capped 5´ terminal region including 5´ DR1; the fate of the poly-adenylated 3´ end is unclear; B: RNA primer translocation (second template switch). The RNA primer translocates to DR2, and is extended to the 5´ end of (-)-DNA. 3´ r and 5´ r denote an about 10 nt redundancy on the (-)-DNA. As above, several cis-elements appear to promote close proximity of the DR1 donor and the DR2 acceptor, as schematically indicated in the right hand figure; C: Circularization (third template switch). Having copied 5´ r, the growing 3´ end of the (+)-DNA switches to 3´ r on the (-)-DNA, enabling further elongation. This reaction must involve juxtaposition of 5´ r and 3´ r. For easier comprehension, the switch is also depicted on the basis of the representation shown on the right of Figure 8B; both are topologically equivalent; D: RC-DNA. Extension on the (-)-DNA template creates a set of (+)-DNA strands of various length; E: Double-stranded linear (dsL) DNA. This minor DNA form originates when the RNA primer, having failed to translocate to DR2, is extended from its original position ("in situ priming").
It is well established, however, that the 5´ terminal 15 to 18 nt or so of the pgRNA, including the 5´ DR1 sequence, are spared from degradation, probably because the active site of RH is halted at this distance when the RT domain reaches the template 5´ end. This capped 5´ RNA oligo is essential as primer for (+)-DNA synthesis. Extension of the RNA from its original position (“in situ priming”) gives rise to a double-stranded linear (dsL) DNA (Figure 8E) which occurs at a low percentage in all hepadnaviruses. Lacking the core promoter/enhancer for pgRNA transcription upstream of the pgRNA start site, it is unsuited for virus propagation but may be of pathogenic potential. For RC-DNA formation, the RNA primer must be transferred to the 3´ proximal DR2 (Figure 8B); pgRNAs with improper 5´ ends, not containing the DR1 sequence in the RNA primer, fail to undergo this essential second template switch. Why the RNA primer predominantly jumps to DR2 although its complementarity to the initial site is larger (more than 15 versus 11 or 12 nt) is not obvious. An interesting explanation has been proposed for DHBV, according to which burying part of the 5´ DR1 sequence in a competing intramolecular hairpin structure effectively shortens the sequence available for hybridization with the RNA primer.
From its new location on DR2 the RNA primer is extended towards the P bound 5´ end of the (-)-DNA, including the 5´ r redundancy. Further elongation requires a third template switch, i.e. circularization (Figure 8C). In effect, the growing (+)-DNA end is transferred from 5´ r to 3´ r on the (-)-DNA template from where it can further be extended to yield RC-DNA (Figure 8D).
Though sequence identity between 5´ r and 3´ r is important, additional cis-elements are again required to ensure efficient RNA primer translocation and circularization. Using quantitative genetic techniques, the Loeb laboratory has defined, mostly in DHBV, several such cis-elements on the (-)-DNA[144,156-161], e.g. 3E, M, and 5E, which are located at both termini and in the middle of pgRNA. Collectively, these data provide evidence that these cis-elements, via long range base-pairing, allow for a close juxtaposition of the corresponding donor and acceptor sites, and thus facilitate the proper template switches (Figure 8B, right panel). How this is achieved inside the replicating nucleocapsid is not easily envisaged. Also, we noted that chimeric heron-duck HBVs are less sensitive than DHBV to reduced base-pairing in some of these elements (K. Dallmeier and M. Nassal, unpublished). There is evidence, however, that similar potentially base-pairing cis-elements are present in human HBV[144,162] and assist in circularization (E. Lewellyn and D.D. Loeb, pers. comm.).
Hence intramolecular base-pairing is probably an important mechanism that ensures proper shaping of the viral genome for the various template switches that eventually yield RC-DNA. More details than is possible to show here can be found in informative schematic representations in references[144,160].
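As a rough illustration of what "base-pairing potential" between two cis-elements means computationally, the following sketch reports the longest contiguous antiparallel Watson-Crick duplex that two DNA segments could form. It is a deliberately simplified measure, assuming no mismatches, bulges or non-canonical pairs, and it is not the quantitative genetic analysis used in the cited work. The function names and example sequences are hypothetical.

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairing = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in reversed(dna))


def longest_pairing_stretch(element_a, element_b):
    """Length of the longest perfect antiparallel duplex between two elements.

    Implemented as the longest common substring of element_a and the
    reverse complement of element_b (dynamic programming, row by row).
    """
    target = reverse_complement(element_b)
    best = 0
    previous = [0] * (len(target) + 1)
    for base_a in element_a:
        current = [0] * (len(target) + 1)
        for j, base_t in enumerate(target, start=1):
            if base_a == base_t:
                # Extend the match that ended one position earlier in both.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to exercise the function.
    print(longest_pairing_stretch("CCACGGTCTT", "GGGACCGTGG"))
```

Comparing this score for a wild-type element and a mutated one gives a crude estimate of how much pairing potential a mutation removes. Real predictions of long-range interactions would need thermodynamic folding models rather than a contiguous-match count.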
CONCLUSIONS AND PERSPECTIVES
Since the original discovery that hepadnaviruses replicate through reverse transcription, numerous novel and unique aspects of the replication process have been unraveled by using reverse genetics in transfected cells. Indisputably, this system will continue to be an important tool in future HBV research; in some cases because even a cancer-derived stable cell line mimics many aspects of the complex natural environment for viral replication, allowing one to look into the interplay between virus replication and cellular networks by genomics and/or proteomics; in others because we currently have no alternative due to the complexity of the interactions between the viral components themselves. An example is the control of the various template switches during RC-DNA generation by long-range nucleic acid interactions which apparently can only occur, and be monitored, in the context of assembled nucleocapsids.
For a mechanistic understanding, however, cell-free systems are indispensable. Two striking examples of their power are the discovery of the replication origin function of ε and the chaperone-dependence of P protein activation; both would have gone unnoticed for a long time without the in vitro translation system. To disentangle these multifactorial processes on the molecular level, in vitro reconstitution from purified components is the approach of choice. Despite the bewildering complexity of the chaperoning systems, it will now be possible, starting from systems of relatively simple composition, to systematically add further factors and monitor their contributions to P protein activation; this should, inter alia, help answer the pressing question why human HBV P protein shows no enzymatic activity under conditions where its DHBV counterpart is active. An in vitro activity assay for HBV P protein would also be an important screening tool for better HBV antivirals. Similarly rewarding should be the development of in vitro systems that address other steps of the viral replication cycle, foremost perhaps inclusion of the core protein into the now available replication initiation systems.
Ultimately, structural biophysics needs to enter the field. At this time the HBV core protein, without its nucleic acid binding domain, is the only hepadnavirus component for which a high resolution structure is available. Fortunately, 3D structural analyses of the ε element are well underway, and at least individual domains of the P protein may become amenable to direct structure investigations. The most exciting, though also most challenging aim given its multifactorial composition, will be to obtain high resolution data for the P protein-ε RNA complex, caught in the act of DNA synthesis. However, the relevance of any such biophysical and biochemical data will have to be corroborated in the context of the complete virus replication cycle, whenever possible in an in vivo setting.
Finally, although the complex interactions between viral, and viral and cellular, components may seem of foremost interest for basic HBV biology, they also hold the keys for novel and more efficacious therapies of chronic hepatitis B. The few currently approved anti-HBV drugs are all nucleos(t)idic inhibitors of the reverse transcriptase. Their long-term efficacy is limited by the virtually unavoidable emergence of resistant HBV variants; any new compounds with the same mechanism of action will also face this problem, which will be aggravated by cross-resistance, particularly against different analogs of the same natural nucleotide. Each of the steps in the HBV replication cycle that is now being elucidated in molecular detail provides new, unconventional targets for interference. Replication initiation alone depends on many specific interactions, including the TP and RT/RH domains of P protein, ε RNA, and different chaperones. Blocking any of these interactions, e.g. by small molecules binding to, and altering the structure of, ε or preventing chaperone binding to P, would abolish reverse transcription by mechanisms entirely different from that of nucleoside analogs. Knowledge of the 3D structure of P would allow the design of better nucleosidic as well as non-nucleoside inhibitors, and additional opportunities will certainly arise once the process of cccDNA formation is better understood. Especially in combination with conventional antivirals, such potential new drugs should greatly increase the chances for curing, rather than just controlling, chronic hepatitis B.