Yll M, Cortese MF, Guerrero-Murillo M, Orriols G, Gregori J, Casillas R, González C, Sopena S, Godoy C, Vila M, Tabernero D, Quer J, Rando A, Lopez-Martinez R, Esteban R, Riveiro-Barciela M, Buti M, Rodríguez-Frías F. Conservation and variability of hepatitis B core at different chronic hepatitis stages. World J Gastroenterol 2020; 26(20): 2584-2598 [PMID: 32523313 DOI: 10.3748/wjg.v26.i20.2584]
Corresponding Author of This Article
Maria Francesca Cortese, PhD, Research Scientist, Liver Pathology Unit, Departments of Biochemistry and Microbiology, Hospital Universitari Vall d’Hebron, Universitat Autònoma de Barcelona, Passeig Vall d’Hebron 119-129, Barcelona 08035, Spain. email@example.com
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Conservation and variability of hepatitis B core at different chronic hepatitis stages
Marçal Yll, Maria Francesca Cortese, Mercedes Guerrero-Murillo, Gerard Orriols, Josep Gregori, Rosario Casillas, Carolina González, Sara Sopena, Cristina Godoy, Marta Vila, David Tabernero, Josep Quer, Ariadna Rando, Rosa Lopez-Martinez, Rafael Esteban, Mar Riveiro-Barciela, Maria Buti, Francisco Rodríguez-Frías
Marçal Yll, Maria Francesca Cortese, Gerard Orriols, Rosario Casillas, Carolina González, Sara Sopena, Cristina Godoy, Marta Vila, David Tabernero, Ariadna Rando, Rosa Lopez-Martinez, Francisco Rodríguez-Frías, Liver Pathology Unit, Departments of Biochemistry and Microbiology, Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona 08035, Spain
Marçal Yll, Maria Francesca Cortese, Mercedes Guerrero-Murillo, Josep Gregori, Rosario Casillas, Sara Sopena, Marta Vila, Josep Quer, Francisco Rodríguez-Frías, Liver Unit, Liver Disease Laboratory-Viral Hepatitis, Vall d'Hebron Institut Recerca-Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona 08035, Spain
Mercedes Guerrero-Murillo, Department of Microbiology, Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona 08035, Spain
Josep Gregori, Cristina Godoy, David Tabernero, Josep Quer, Rafael Esteban, Mar Riveiro-Barciela, Maria Buti, Francisco Rodríguez-Frías, Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas, Instituto de Salud Carlos III, Madrid 28029, Spain
Rafael Esteban, Mar Riveiro-Barciela, Maria Buti, Liver Unit, Department of Internal Medicine, Hospital Universitari Vall d'Hebron, Universitat Autònoma de Barcelona, Barcelona 08035, Spain
Author contributions: Rodríguez-Frías F designed the research; Cortese MF coordinated the research; Yll M and Cortese MF equally contributed to design the experiments; Yll M, Orriols G, Godoy C, Sopena S, Casillas R, González C, Vila M and Rando A performed the experiments; Yll M, Cortese MF, Gregori J and Guerrero-Murillo M analyzed data acquired during the experiments and interpreted the results; Yll M and Cortese MF drafted the manuscript; Cortese MF, Tabernero D, Lopez-Martinez R, Riveiro-Barciela M, Buti M, Quer J, Esteban R and Rodríguez-Frías F critically reviewed the manuscript.
Supported by the Instituto de Salud Carlos III, Spain, the European Regional Development Fund, No. PI18/01436.
Institutional review board statement: The study was reviewed and approved by the Clinical Research Ethics Committee of Hospital Universitari Vall d’Hebron.
Conflict-of-interest statement: Josep Gregori is an employee of Roche Diagnostics, SL.
Data sharing statement: Next-generation sequencing data were submitted to the GenBank SRA database (BioProject accession number PRJNA625435).
ARRIVE guidelines statement: The authors have read the ARRIVE guidelines, and the manuscript was prepared and revised according to the ARRIVE guidelines.
Received: February 28, 2020 Peer-review started: February 28, 2020 First decision: April 9, 2020 Revised: May 8, 2020 Accepted: May 19, 2020 Article in press: May 19, 2020 Published online: May 28, 2020
Since it is currently not possible to eradicate hepatitis B virus (HBV) infection with existing treatments, research continues to uncover new therapeutic strategies. HBV core protein, encoded by the HBV core gene (HBC), intervenes in both structural and functional processes, and is a key protein in the HBV life cycle. For this reason, both the protein and the gene could be valuable targets for new therapeutic and diagnostic strategies. Moreover, alterations in the protein sequence could serve as potential markers of disease progression.
To detect, by next-generation sequencing, HBC hyper-conserved regions that could potentially be prognostic factors and targets for new therapies.
Thirty-eight of 45 patients with chronic HBV initially selected were included and grouped according to liver disease stage [chronic hepatitis B infection without liver damage (CHB, n = 16), liver cirrhosis (LC, n = 5), and hepatocellular carcinoma (HCC, n = 17)]. HBV DNA was extracted from patients’ plasma. A region between nucleotide (nt) 1863 and 2483, which includes HBC, was amplified and analyzed by next-generation sequencing (Illumina MiSeq platform). Sequences were genotyped by distance-based discriminant analysis. General and intergroup nt and amino acid (aa) conservation was determined by sliding window analysis. The presence of nt insertions and deletions and/or aa substitutions in the different groups was determined by aligning the sequences with genotype-specific consensus sequences.
Three nt (nt 1900-1929, 2249-2284, 2364-2398) and 2 aa (aa 117-120, 159-167) hyper-conserved regions were shared by all the clinical groups. All groups showed a similar pattern of conservation, except for five nt regions (nt 1946-1992, 2060-2095, 2145-2175, 2230-2250, 2270-2293) and one aa region (aa 140-160), where CHB and LC, respectively, were less conserved (P < 0.05). Some group-specific conserved regions were also observed at both nt (2306-2334 in CHB and 1935-1976 and 2402-2435 in LC) and aa (between aa 98-103 in CHB and 28-30 and 51-54 in LC) levels. No differences in insertion and deletion frequencies were observed. An aa substitution (P79Q) was observed in the HCC group with a median (interquartile range) frequency of 15.82 (0-78.88) vs 0 (0-0) in the other groups (P < 0.05 vs CHB group).
The differentially conserved HBC and HBV core protein regions and the P79Q substitution could be involved in disease progression. The hyper-conserved regions detected could be targets for future therapeutic and diagnostic strategies.
Core tip: New tools for hepatitis B virus infection treatment and follow-up are needed. Hepatitis B virus core protein has a key role in viral replication and persistence. Analysis of viral quasispecies by next-generation sequencing can identify conserved regions in viral genes or proteins that may serve as targets for new therapeutic and diagnostic strategies. Moreover, it may help identify prognostic markers of liver disease progression. Here, we detected hyper-conserved nucleotide and amino acid regions regardless of the clinical stage. Moreover, we observed several group-specific conserved and variable regions and an amino acid substitution that could be indicative of different disease progression.
INTRODUCTION
Hepatitis B virus (HBV) is a small virus with a specific tropism for the liver. It belongs to the Hepadnaviridae family. Despite the existence of effective preventive vaccines, an estimated 257 million people worldwide live with chronic HBV infection and more than 880000 people die every year of HBV-related complications such as liver cirrhosis (LC) and hepatocellular carcinoma (HCC).
HBV is an enveloped virus equipped with 3.2 kb of partially double-stranded circular DNA produced by the reverse transcription of an RNA intermediate known as pregenomic RNA. This ribonucleic intermediate is produced from a viral DNA molecule that interacts with cellular (histone and non-histone) and viral proteins, forming a “mini-chromosome” known as covalently closed circular DNA (cccDNA) that remains in hepatocyte nuclei for the rest of the cell’s life. Although current antiviral therapy can control viral replication, it is not capable of interfering with the formation or persistence of cccDNA, rendering HBV infection eradication impossible. This mini-chromosome could even be a source of HBV reactivation after clinical resolution and HBsAg seroclearance. Due to persistent infection, up to 1% of Caucasian patients with noncirrhotic chronic HBV infection have been found to develop HCC.
Gene therapy has emerged as one of the most promising strategies for blocking disease progression, and results from studies investigating the potential of small interfering RNA (siRNA) systems as adjuvant therapy are encouraging. SiRNA is a double-stranded noncoding RNA [with an optimal length of 21 nucleotides (nt)] that interacts with target messenger RNA, promoting its degradation and silencing of the gene.
HBV reverse transcriptase lacks 3' to 5' proofreading activity, which leads to viral genome variability comparable to that observed in an RNA virus. This genetic variability is further increased by inter- and intra-genotype recombination events. In short, HBV circulates as a complex mixture of closely related genetic variants (haplotypes) known as quasispecies.
The HBV core protein (HBc) [encoded by the HBV core gene (HBC) from the PreCore/Core open reading frame (ORF)] is essential for viral replication. It is a structural 21-kDa protein that self-assembles to create dimers that assemble into hexamers forming the icosahedral viral capsid[11,12]. It has 183 amino acids (aa) (185 for genotype A) with an N-terminal domain and a C-terminal domain (CTD) connected through a linker region. The N-terminal domain ranges from aa position 1 to 149 (including the linker region aa 140 to 149) and constitutes the α helix-rich assembly domain. The CTD is shorter (aa 150 to 183, or 185 for genotype A) and constitutes the functional domain. The CTD allows HBc to intervene in a multitude of processes such as subcellular traffic, viral genome release, capsid assembly and transport, RNA metabolism, and viral pregenomic RNA reverse transcription. Considering just how essential this protein is for viral replication, it could be an optimal target for gene therapy. Moreover, mutations in HBc may have different roles in liver disease progression, positioning them as potentially useful prognostic genetic markers.
Next-generation sequencing (NGS) is a highly sensitive technique for studying viral quasispecies; it is capable of detecting highly conserved regions of the HBV genome, regardless of genotype or clinical stage. Moreover, it supports the identification and quantitative determination of specific variants that could be used as markers to predict prognosis and treatment response in patients with HBV infection.
The aim of this study was to apply NGS to analyse HBc conservation and variability at the nt and aa levels in patients with different stages of chronic HBV infection in order to identify hyper-conserved regions of the HBC gene that could be a target for gene therapy and to determine possible prognostic factors of disease progression.
MATERIALS AND METHODS
Patients and samples
The study was reviewed and approved by the Clinical Research Ethics Committee of Hospital Universitari Vall d’Hebron (PR(AG)146/2020). No animals were used.
Forty-five patients with chronic HBV infection were recruited from members of the general population seen at the outpatient clinic at Vall d’Hebron University Hospital in Barcelona, Spain. They tested negative for hepatitis D virus, hepatitis C virus, and human immunodeficiency virus, and had a viral load > 3 log IU/mL, which is the limit of polymerase chain reaction (PCR) amplification sensitivity. HBV serological markers such as the surface antigen (HBsAg), the e antigen (HBeAg), and anti-HBe antibodies were tested using commercial chemiluminescent assays on a COBAS 8000 analyzer (Roche Diagnostics, Rotkreuz, Switzerland). HBV DNA was quantified by real-time PCR with a detection limit of 10 IU/mL (COBAS 6800, Roche Diagnostics). Patients were divided into 3 clinical groups according to liver disease stage determined by biopsy or diagnostic imaging in line with the EASL guidelines: Chronic HBV infection without liver damage (CHB group), chronic HBV infection with liver cirrhosis (LC group), and chronic HBV infection with hepatocellular carcinoma (HCC group).
HBC gene amplification and NGS
HBV DNA was extracted from 200 µL of serum using the QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer’s instructions. The region of interest was amplified through a 3-step nested PCR protocol (Figure 1). The first step (PCR1) covered a large region between nt 1774-2930 that includes the HBC gene (nt 1901-2464 for genotype A and 1901-2458 for other genotypes). As the Illumina MiSeq platform (Illumina, San Diego, CA, United States) allows read lengths of up to 600 bp, the following amplification steps were performed by dividing HBC into 2 amplicons (amplicon 1 = nt 1863-2317 and amplicon 2 = nt 2205-2483), which overlapped in a 112 nt-long portion (PCR2). The M13-tail, added in step 2, was used for the last step (PCR3), which introduced a 10 nt-long sample-specific multiplex identifier. All the PCR steps were performed using high-fidelity Pfu Ultra II DNA polymerase (Stratagene, Agilent Technologies, Santa Clara, CA, United States). The primers and protocols are reported in Table 1.
Table 1 Primer design and polymerase chain reaction protocols for each amplified region.
Bold nucleotides indicate the M13 sequence. Forward primers in PCR2-A2 were multiplexed at the same concentration to cover all HBV genotypes. The protocols of amplification are reported. A.1: Amplicon 1; A.2: Amplicon 2; PCR: Polymerase chain reaction; MID: Multiplex identifier.
Figure 1 Schematic summary of the 3 amplification steps.
In the first amplification step (PCR1), a large region was amplified. In the following step (PCR2), the region was divided into 2 amplicons that overlapped in a 112 nucleotide-long portion. In the third step (PCR 3) a sample identifier (MID) was added. PCR: Polymerase chain reaction; MID: Multiplex identifier.
The final PCR products were purified with Agencourt AMPure XP magnetic beads (Beckman Coulter, Beverly, MA, United States) and their quality verified using the Agilent 2200 TapeStation System and D1000 ScreenTape kit (Agilent Technologies, Waldbronn, Germany).
Purified amplicons were quantified using the Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific-Life Technologies, Austin, TX, United States) and pooled to guarantee that the 2 amplicons for each patient were adequately represented in the analysis (2.5x for amplicon 1 and 1x for amplicon 2, due to their different lengths). The amplicon pool was sequenced by NGS on the Illumina MiSeq platform.
The reads obtained underwent an in-house bioinformatics filtering procedure based on R scripts, as previously described by our group. For each amplicon, a group of unique sequences (haplotypes) forming the viral quasispecies was obtained. All sequences that did not match in the overlapping 112-nt region between amplicon 1 and 2 were discarded.
The bioinformatics methods used in this study were reviewed by Mercedes Guerrero-Murillo from the Microbiology Department at Vall d’Hebron Hospital (Barcelona, Spain) and by Dr. Josep Gregori from the Liver Disease Viral Hepatitis Laboratory at Vall d’Hebron Hospital (Barcelona, Spain), CIBERehd research group, and Roche Diagnostics SL.
Genotyping of the haplotypes
The amplicons from each patient were aligned with the same region of the respective amplicons extracted from 106 full-length HBV genome sequences representative of genotypes A to J obtained from the NCBI GenBank (Supplementary Table 1). Genotyping was conducted by applying distance-based discriminant analysis (DB rule)[19,20], which considers the inter- and intra-class variability of all genotypes. Genetic distances were computed according to the Kimura-80 model.
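As an illustration of the distance computation only, the Kimura-80 (two-parameter) distance between two aligned sequences depends on the proportion of transitions (P) and transversions (Q): d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The Python sketch below is not the authors' R pipeline and does not implement the DB rule; the function name `kimura80_distance` and the choice to skip gapped or ambiguous positions are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura80_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter (K80) distance between two aligned nt sequences.

    Hypothetical helper for illustration. Positions where either sequence
    has a gap or an ambiguous base are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not comparable
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable positions")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the model (argument <= 0); that case is left to the caller.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For example, `kimura80_distance("AAAAAAAAAA", "GAAAAAAAAA")` counts one transition in ten compared positions (P = 0.1, Q = 0). In a DB-rule setting, such pairwise distances between a haplotype and the reference sequences of each genotype would be the input to the classifier.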
Conservation and mutation analysis
Sequence conservation at nt and aa levels was determined by calculating the information content (IC) of each position in a multiple alignment of all haplotypes detected with a frequency > 0.25.
This analysis calculates the mean IC for windows of 25 nt (or 10 aa), starting from the first position in the multiple alignment and moving forward in steps of 1. The hyper-conserved regions were detected by aligning all haplotypes, regardless of clinical stage. Differences in sequence conservation between the groups were determined by comparing IC values.
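The sliding window calculation can be sketched as follows. This is a minimal Python illustration, not the in-house R code: it assumes that the IC of a position is log2(alphabet size) minus the Shannon entropy of the symbols observed at that position, with no small-sample correction, which is consistent with the maxima of 2 bits (nt) and 4.32 bits (aa) reported in this study. The function names, the optional `weights` argument (for the analysis by haplotype frequency), and the treatment of gap characters as ordinary symbols are assumptions.

```python
import math
from collections import Counter


def information_content(column, alphabet_size, weights=None):
    """IC (bits) of one alignment column: log2(alphabet size) - Shannon entropy.

    `weights` optionally gives one relative frequency per haplotype;
    when omitted, every haplotype counts once.
    """
    if weights is None:
        weights = [1.0] * len(column)
    totals = Counter()
    for symbol, weight in zip(column, weights):
        totals[symbol] += weight
    total = sum(totals.values())
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in totals.values()
        if count > 0
    )
    return math.log2(alphabet_size) - entropy


def sliding_window_ic(haplotypes, alphabet_size=4, window=25, step=1, weights=None):
    """Mean IC per window over a multiple alignment.

    Returns a list of (start, mean_ic) tuples with 0-based window starts.
    Use alphabet_size=4, window=25 for nt and alphabet_size=20, window=10 for aa.
    """
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")
    per_position = [
        information_content([h[i] for h in haplotypes], alphabet_size, weights)
        for i in range(length)
    ]
    return [
        (start, sum(per_position[start:start + window]) / window)
        for start in range(0, length - window + 1, step)
    ]
```

With the toy alignment `["AAAA", "AAAC"]` and `window=2`, the first two windows are fully conserved (mean IC of 2 bits) and the last window averages a conserved position with a position split evenly between two bases.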
To identify specific nt insertions and deletions (indels) and aa substitutions that could discriminate between the groups, haplotype sequences were aligned with their genotype-specific consensus sequence. Consensus was obtained by aligning the sequences of the subgenotypes of interest extracted from the 106 full-length HBV genome sequences. Polymorphisms were identified by aligning haplotype sequences with a population consensus sequence and discarded.
Sequence conservation differences between the groups in the sliding windows were analysed using the Wilcoxon–Mann–Whitney test. Frequencies of aa changes detected were compared with the Kruskal-Wallis test and described as median and interquartile range (IQR). All analyses were performed in R version 3.2.3. P < 0.05 was considered significant.
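The "median (IQR)" summaries reported for variant frequencies can be reproduced with a few lines of code. The sketch below uses Python rather than R and assumes the quartiles are obtained by linear interpolation between order statistics (the `inclusive` method of `statistics.quantiles`); whether this matches the quantile definition used in the original analysis is an assumption, and the helper name `median_iqr` is hypothetical.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) for a list of per-patient frequencies.

    Quartiles use the 'inclusive' interpolation method, which treats
    the sample minimum and maximum as the 0th and 100th percentiles.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


def format_median_iqr(values, digits=2):
    """Format as 'median (Q1-Q3)', the style used in the text."""
    med, q1, q3 = median_iqr(values)
    return f"{round(med, digits)} ({round(q1, digits)}-{round(q3, digits)})"
```

A group in which most patients do not carry a variant yields a median of 0 even when a few patients carry it at high frequency, which is why the IQR is reported alongside the median.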
RESULTS
Patient characteristics and NGS results
Of the 45 patients with chronic hepatitis initially included in the study, 38 passed the sequencing quality filters and had correctly overlapping amplicons 1 and 2. After application of the quality filters, a median (IQR) of 133156.5 (85961.25-605212) and 66571 (25958.5-2301225) sequences per patient were obtained respectively for amplicon 1 and amplicon 2. NGS data were submitted to the GenBank SRA database (BioProject accession number PRJNA625435; BioSample accession numbers are reported in Supplementary Table 2). In the clinical groups, there were 16 patients with CHB, 5 with LC, and 17 with HCC. The clinical and viral characteristics (including genotypes) are reported in Table 2.
Table 2 Main clinical and viral characteristics of hepatitis B virus-infected patients enrolled in the study.
Columns: CHB (n = 16); HCC (n = 17); LC (n = 5). Row labels: Viral load (log IU/mL); Genotype, % (n).
D/E and D/A indicate mixtures of the 2 genotypes. The frequency of each genotype within the clinical groups is reported as percentage (%) and number of patients (n). CHB: Chronic hepatitis B infection without liver damage; HCC: Hepatocellular carcinoma; LC: Liver cirrhosis; ALT: Alanine aminotransferase (normal value < 40 IU/mL); AST: Aspartate aminotransferase (normal value < 40 IU/mL); IQR: Interquartile range; NS: No-statistical P value.
Sequence conservation at the nt level
Sequence conservation was studied by applying a sliding window analysis to the entire HBC sequence overlapping the 2 amplicons at the common 112 nt-long portion. No differences in IC were observed on analyzing the sequences by haplotype considering or not their relative frequency (Figure 2A). Considering the IC of all the nt-sequence haplotypes obtained (regardless of clinical group), we identified 3 hyper-conserved regions (nt 1900-1929, 2249-2284, and 2364-2398, Figure 2B). Most of the nt positions within these regions yielded the maximum IC value of 2 bits (100% conservation).
Figure 2 Information content analysis at nucleotide level.
A: Sliding window analysis of the Hepatitis B core gene performed by aligning the quasispecies haplotypes for all 38 patients with and without considering their relative frequency. Each point on the graph represents the mean information content (in bits) of a 25-nucleotide window, with a forward displacement of 1 nucleotide between windows. The purple line shows the analysis by haplotype (By hpl), which is the mean information content obtained from the multiple alignments of all quasispecies haplotypes. The orange line represents the analysis by haplotype frequency (By hpl freq), which is the mean information content from the multiple alignments of all the patients’ quasispecies haplotypes considering their relative frequency. The dashed lines indicate the 3 common hyper-conserved regions observed, with reporting of their positions. B: Representation of detected hyper-conserved regions as sequence logos (with reporting of nucleotide positions). The relative sizes of the letters in each stack indicate their relative frequencies at each position within the multiple alignments of nucleotide haplotypes. The total height of each stack of letters depicts the information content of each nucleotide position, measured in bits (Y-axis): from minimum (0) to maximum conservation (2). By hpl: Analysis by haplotype; By hpl freq: Analysis by haplotype frequency; nt: Nucleotide.
On comparing the IC of each clinical group by haplotype, the HCC and LC groups showed similar conservation patterns; CHB was notably associated with the lowest level of conservation, mainly evident in 5 regions: nt 1946-1992, 2060-2095, 2145-2175, 2230-2250, and 2270-2293 (P < 0.05, Figure 3A). Three group-specific conserved regions were detected: 1 in the CHB group (nt 2306-2334) and 2 in the LC group (nt 1935-1976 and 2402-2435; Figure 3B). Most of the nt positions within these regions yielded the maximum IC value of 2 bits (100% conservation).
Figure 3 Information content analysis at nucleotide level by clinical stage group.
A: By-haplotype sliding window analysis of the Hepatitis B core gene according to different clinical groups (HCC in blue, CHB in red, and LC in green). The portions and positions where CHB showed lower levels of conservation than the others (P < 0.05) are shown in red. B: Representation of the information content of CHB- and LC-specific conserved nucleotide regions as sequence logos. Positions are reported at the top of each logo. CHB: Chronic hepatitis B infection without liver damage; HCC: Hepatocellular carcinoma; LC: Liver cirrhosis; nt: Nucleotide; P: P value.
Sequence conservation at the aa level
The aa sequences of the haplotypes were translated from their respective nt sequences using the HBC reading frame.
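A minimal sketch of this translation step is shown below. It is written in Python for illustration and is not the authors' R code; it assumes the standard genetic code, that each haplotype sequence has already been trimmed so that it begins at the first base of an HBC codon, and that alignment gap characters are removed before translation. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt_seq: str, to_stop: bool = True) -> str:
    """Translate an in-frame nt haplotype into its aa sequence.

    Gap characters are dropped first, so a haplotype carrying a single-nt
    deletion is read in the shifted frame. Codons with ambiguous bases
    become 'X'; trailing bases that do not fill a codon are ignored.
    """
    seq = nt_seq.upper().replace("-", "")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop codon: the protein ends here
        protein.append(aa)
    return "".join(protein)
```

In this toy setting, `translate("ATGTTGACC")` reads three codons, whereas deleting the fourth base gives `ATGTGACC`, in which the second codon becomes the stop codon TGA. This is the kind of frameshift that leads to a truncated protein when a single nt is inserted or deleted.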
Sliding window analysis of the aa haplotypes of the 38 patients by haplotype and haplotype frequency (Figure 4A) showed that the HBc protein was highly conserved throughout its sequence except for the central region (between aa 50 and 100), where conservation was slightly decreased. Two common hyper-conserved regions were detected: 1 between aa 117-120 and 1 between aa 159-167 (Figure 4B). All the aa in these regions had a conservation of around 100% (4.32 bits).
Figure 4 Information content analysis at amino acid level.
A: Sliding window analysis of the Hepatitis B core protein sequence for all 38 patients with and without consideration of relative frequency. Each point on the graph represents the mean information content (in bits) of a 10-amino acid window, with a forward displacement of 1 amino acid between windows. The purple line represents the information content of all the quasispecies haplotypes (By hpl) whereas the orange line indicates the information content considering haplotype frequency (By hpl freq). The dashed lines show the 2 common amino acid hyper-conserved regions observed, with reporting of their positions. B: Representation of amino acid hyper-conserved regions detected as sequence logos (with reporting of amino acid positions). The relative sizes of the letters in each stack indicate their relative frequencies at each position within the multiple alignments of amino acid haplotypes. The total height of each stack depicts the information content of each amino acid position, measured in bits (Y-axis); range: 0 bits (0% conservation) to 4.32 bits (100% conservation). By hpl: Analysis by haplotype; By hpl freq: Analysis by haplotype frequency; aa: Amino acid.
On analyzing aa conservation by haplotype in relation to clinical stage, the 3 groups showed a similar pattern, except for a region between aa 140 and 160, which was less conserved in the LC group compared with the CHB and HCC groups (P < 0.05, Figure 5A). Again, 3 group-specific conserved aa regions were detected: 1 in the CHB group (aa 98-103) and 2 in the LC group (aa 28-30 and 51-54, Figure 5B). All the aa in these regions had a conservation of around 100% (4.32 bits).
Figure 5 Information content analysis at amino acid level by clinical group.
A: Sliding window analysis of the Hepatitis B core protein by haplotype between the different clinical groups (HCC in blue, CHB in red, and LC in green). The green horizontal line corresponds to the region where LC group is less conserved compared to the CHB and HCC groups (P < 0.05). B: Representation of CHB- and LC-specific conserved amino acid regions as sequence logos. Positions are reported at the top of each logo. CHB: Chronic hepatitis B infection without liver damage; HCC: Hepatocellular carcinoma; LC: Liver cirrhosis; aa: Amino acid; P: P value.
nt indels and aa changes
nt indels and aa changes were identified by aligning the patients’ haplotypes with their genotype-specific consensus sequence.
In the CHB group, 8/16 patients had indels in HBC, vs 2/17 in the HCC group and 1/5 in the LC group. The indels consisted of the insertion or deletion of one nt at positions 1951 or 2085 (a thymine in 1951 and a guanine in 2085; Table 3). In all cases, a truncated HBc protein was produced. However, due to the limited number of patients, no statistical differences were observed on comparing the frequencies between the groups.
Table 3 Relative frequencies of nucleotide insertions/deletions detected.
Columns: Clinical stage (n/total); Relative frequency (% of mutated haplotypes) at nt 1951 (1 nt: T) and at nt 2085 (1 nt: G).
The table shows the relative frequency of insertions/deletions, together with the percentage (%) of mutated haplotypes per patient. Only patients carrying these mutations were included in the table. CHB: Chronic hepatitis B infection without liver damage; HCC: Hepatocellular carcinoma; LC: Liver cirrhosis; T: Thymine; G: Guanine; nt: Nucleotide.
On analysing the presence of aa changes, we identified the aa substitution P79Q (proline to glutamine) in the HCC group with a median (IQR) frequency of 15.82 (0-78.9) vs 0 (0-0) in the CHB group (P < 0.05) and 0 (0-0) in the LC group (Figure 6).
Figure 6 Relative frequency of P79Q substitution in the 3 clinical groups.
Each dot represents a patient. The Bonferroni-corrected P value was calculated by Kruskal-Wallis test with posthoc Dunn multiple comparison test. (aP < 0.05). CHB: Chronic hepatitis B infection without liver damage; HCC: Hepatocellular carcinoma; LC: Liver cirrhosis; P: P value; P79Q: Proline to glutamine in position 79.
DISCUSSION
The HBc protein, encoded by the HBC gene, is a key element in viral replication and disease progression and is involved in both structural and functional processes. Studying gene and protein sequences in patients with different clinical stages of HBV infection could provide important information on the pathogenic role of this protein. Moreover, the identification of hyper-conserved regions at both nt and aa levels could help develop new therapeutic approaches, including gene therapy. In this study, we used NGS to analyse HBC quasispecies in a group of patients with chronic HBV infection stratified by liver disease stage.
First, we studied quasispecies conservation to search for hyper-conserved nt and aa regions regardless of clinical stage or viral genotype. Current treatment based on nucleos(t)ide inhibitors does not affect cccDNA levels or transcriptional activity and therefore cannot eliminate HBV infection. This viral mini-chromosome supports the continuous expression of viral antigens that possibly contribute to disease progression, even in the presence of drug-induced viral suppression.
New therapeutic approaches are thus required to control HBV expression, and the targeted delivery of siRNA is one of the most promising approaches under investigation. Several siRNAs are currently being tested against X and S ORFs. A study conducted in chimpanzees showed that multiple injections of ARB-1467 (a mixture of 3 interfering RNAs targeting both X and S ORFs) led to a 90% reduction in HBsAg levels and a 50% reduction in cccDNA within 28 d of treatment. None of the molecules currently available, however, target HBC, which considering its role in viral replication could be a valuable target for siRNA-based therapies.
In this study, we analysed quasispecies conservation of the entire HBC gene in patients infected by different HBV genotypes and with different clinical stages of disease in order to identify hyper-conserved regions that might be useful for pangenotypic and panclinical RNA silencing strategies. On analyzing nt conservation for the group of 38 patients, we detected 3 shared hyper-conserved regions, namely the start codon of HBC expression (nt 1900-1929), a portion with 2 CD8 epitopes (HLA-A24 and A3303) (nt 2249-2284), and an arginine-rich portion of the CTD (nt 2364-2398). All 3 sequences could be valuable targets for a new gene silencing strategy.
At the aa level, we observed 2 common hyper-conserved regions (aa 117-120 and 159-167), which fell into the second and third hyper-conserved nt portions (nt 2249-2284 and 2364-2398 respectively). The CTD plays a key role in HBc function. It contains the 4 arginine-rich domains (RRR aa 150-152, RRR aa 157-159, RRRR aa 164-167, and RRRR aa 172-175) that guarantee adequate protein subcellular localization acting as nuclear or cytoplasmic localization signals. The second hyper-conserved aa region (aa 159-167) included one of these arginine-rich domains.
The high degree of sequence conservation observed in HBc may be indicative of its importance in protein function, positioning it as a possible target for diagnostic and therapeutic strategies. Recent studies have defined HBV core-related antigen (HBcrAg, which consists of HBc, HBeAg, and HBV p22 protein) as a promising serological viral marker, particularly for patients with low viral loads, such as treated patients and patients with chronic HBeAg-negative infection. This potential marker, however, has some limitations related to its high limits of detection (2 log IU/mL) and quantification (3-7 log IU/mL). The hyper-conserved regions observed in our study could be used as targets to improve HBc detection technology.
Aptamers are emerging as a promising diagnostic and therapeutic option for different diseases. These molecules consist of single-strand DNA or RNA with high affinity and specificity and no toxicity or immunogenicity. In vitro testing of an aptamer generated using the matrix domain of HBV (located in the large surface protein L and related to the nucleocapsid envelope) resulted in a 50% decrease in HBV titre in treated cell supernatants. In another study, an aptamer targeting HBC resulted in a reduction in extracellular HBV DNA by interfering with nucleocapsid assembly. Again, the hyper-conserved regions detected in our study could be novel targets for aptamer-based strategies that might work independently of clinical stage or HBV genotype. They could also be used to develop a new HBV detection system, as has been done with hepatitis C virus and syncytial viruses.
On analysing nt and aa conservation in relation to clinical stage of HBV infection, all 3 groups showed similar patterns at the aa level, although the HBV quasispecies in the LC group was slightly less conserved (mainly between aa 140 and 160). At the nt level, conservation was lower in the CHB group than in the other 2 groups, largely in the 5 regions between nt 1946-1992, 2060-2095, 2145-2175, 2230-2250, and 2270-2293. This finding could be consistent with the high replication rate of HBV during this clinical stage. Moreover, the first variable region (nt 1946-1992) includes three CD8 HLA epitopes (epitopes B5101, B3501, and B0702 at nt positions 1958-1982), suggesting an attempt at immune evasion. Although the CHB group had the lowest levels of sequence conservation, we detected 2 group-specific conserved regions: aa 98-103 and nt 2306-2334. The nt region included the first 5 aa of the linker region, thus suggesting an important role for this region, which is involved in capsid assembly[35,36] and viral DNA synthesis. In the LC group we detected 2 exclusively conserved nt regions (nt 1935-1976 and 2402-2435, which would translate respectively to aa 11-25 and 167-178) and 2 exclusively conserved aa regions (aa 28-30 and 51-54). The first related regions (nt 1935-1976 and aa 28-30) included portions of HBc (aa 14-18 and aa 23-39 respectively) that are involved in capsid assembly and envelopment and virion production, highlighting the importance of these functions in LC. The second LC-specific nt region (nt 2402-2435) contained an arginine-rich domain of the CTD when translated.
The identification of group-specific conserved regions suggests different evolutionary histories that may have different effects on disease progression. Further studies, however, are needed to prove the association between these regions and different clinical stages and to investigate their role in liver disease progression.
Considering the risk and severity of disease progression, identification of prognostic factors would be of great help. A number of studies have focused on detecting aa changes possibly related to different clinical stages. The mutations T1753C and A1762T/G1764A (K130M/V131I in HBx) of the basal core promoter, for example, were identified as possible prognostic markers for HCC[38,39], while HBc aa mutations F24Y, E64D, E77Q, A80I/T/V, L116I, and E180A were linked to the development of cirrhosis and HCC. In our study, one of the aa changes detected, P79Q, was exclusively observed in the HCC group. Mutations at this position have been found to be weakly associated with tumour relapse after resection. More in vitro studies are required to investigate the role of the P79Q mutation in liver disease progression.
One limitation of our study is that we were not able to include large numbers of patients with different stages of liver disease due to the limits of PCR detection. This was particularly evident in the LC group, which was very small. Larger samples are needed to confirm our results. Moreover, although the Illumina MiSeq platform offers long read lengths, they are not sufficient to cover the entire HBC gene, making it necessary to divide it into 2 partially overlapping amplicons. Nonetheless, these 2 fragments were treated as independent samples during sequencing and subsequently analysed as such.
In summary, we have identified a number of nt and aa hyper-conserved regions that could be valuable targets for new therapeutic and diagnostic strategies. The role of group-specific conserved regions in liver disease progression requires further analysis. The P79Q substitution could be a possible prognostic factor for HCC. In vitro studies, however, are required to determine whether this change might affect viral replication and to investigate associations between cellular damage and onset of HCC.
Despite the existence of effective preventive vaccines, an estimated 257 million people worldwide live with chronic hepatitis B virus (HBV) infection and more than 880000 people die due to the development of liver cirrhosis and/or hepatocellular carcinoma. Although infection can be controlled with existing treatment, eradication is currently impossible due to the persistence of covalently closed circular DNA in hepatocyte nuclei that acts as a template for viral expression. New therapeutic approaches are needed, and gene therapy has been proposed as one of the most promising options. HBV core protein [encoded by the HBV core gene (HBC)] is a structural protein with functional activity that has a key role in viral replication and disease progression. Accordingly, it could be a potential target for new therapeutic and diagnostic strategies, and its variability could be a valuable prognostic factor for disease progression.
As eradication of HBV infection is currently unachievable, new therapeutic strategies are necessary. Moreover, current treatments cannot interfere with the expression of viral proteins that can favor disease progression. Gene therapy based on silencing RNA is one of the most promising therapeutic approaches currently under investigation. The identification of hyper-conserved regions in key viral genes and proteins (such as HBC) is essential to orchestrate an effective strategy regardless of clinical stage or viral genotype.
This study aimed to identify, by next-generation sequencing, hyper-conserved regions in HBC quasispecies of patients with different clinical stages of chronic HBV infection that could be valuable targets for gene therapy. Considering the essential role of the HBC gene and its encoded protein, HBV core protein, in HBV infection, changes in gene and protein conservation in specific clinical groups could be determining factors in disease progression and hence serve as prognostic factors for clinical follow-up.
The HBC gene was amplified by a 3-nested PCR protocol and later sequenced by next-generation sequencing (MiSeq, Illumina, United States) in 38 HBV-monoinfected chronic patients [16 with chronic hepatitis B infection without liver damage (CHB group), 5 with liver cirrhosis (LC group) and 17 with hepatocellular carcinoma (HCC group)]. Quasispecies sequences were genotyped by distance-based discriminant analysis, and general and intergroup nucleotide (nt) and amino acid (aa) conservation was determined by sliding window analysis. The presence of nt insertions and deletions and/or aa substitutions in the different groups was determined by aligning the sequences with a genotype-specific consensus sequence.
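The sliding window step can be illustrated with a minimal sketch. It assumes, as an illustration only, that conservation at each aligned position is scored as information content (log2 of the alphabet size minus the Shannon entropy) and then averaged over a window moved one position at a time. The function names, the default window of 25 positions, and the equal weighting of sequences are hypothetical choices made here, not parameters reported in this study.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """Information content in bits of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    residue frequencies, so a fully conserved nt column scores 2 bits.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def sliding_window_conservation(sequences, window=25, alphabet_size=4):
    """Mean information content of each window, moved one position at a time.

    `window` is a hypothetical default; the study does not state its value.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    per_position = [
        information_content([s[i] for s in sequences], alphabet_size)
        for i in range(length)
    ]
    return [
        sum(per_position[i:i + window]) / window
        for i in range(length - window + 1)
    ]
```

For aa alignments the same functions could be called with `alphabet_size=20`. Windows whose mean stays near the maximum would then be candidates for hyper-conserved regions, under whatever threshold the analyst chooses.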
Three nt (nt 1900-1929, 2249-2284, 2364-2398) and two aa (aa 117-120, 159-167) hyper-conserved regions shared by all the clinical groups were identified. By comparing gene and protein conservation between the different clinical groups, a similar pattern of conservation was observed, although CHB showed five less conserved nt regions (nt 1946-1992, 2060-2095, 2145-2175, 2230-2250, 2270-2293) and LC showed one less conserved aa region (between aa 140 and 160). Moreover, some group-specific conserved regions were detected at both nt (nt 2306-2334 in CHB and 1935-1976 and 2402-2435 in LC) and aa (aa 98-103 in CHB and 28-30 and 51-54 in LC) levels. No differences in indel frequency were observed between the clinical groups. In contrast, we identified an aa substitution (P79Q) that was more frequent in HCC [median (interquartile range) frequency of 15.82 (0-78.9) vs 0 (0-0) for the other groups; P < 0.05 vs the CHB group].
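The way a per-patient substitution frequency and its group summary can be obtained is sketched below. This is an illustrative assumption about the bookkeeping, not the pipeline used in the study: each patient's quasispecies is represented as a hypothetical mapping from aligned aa haplotype to read count, and quartiles use the inclusive method of Python's standard library, which may differ slightly from other statistical packages.

```python
from statistics import median, quantiles


def substitution_frequency(haplotypes, position, variant):
    """Percentage of reads carrying `variant` at 1-based aa `position`.

    `haplotypes` maps each aligned aa sequence to its read count
    (a hypothetical input format chosen for this sketch).
    """
    total = sum(haplotypes.values())
    carrying = sum(
        n for seq, n in haplotypes.items() if seq[position - 1] == variant
    )
    return 100.0 * carrying / total


def median_iqr(values):
    """Median with first and third quartiles (inclusive method)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)
```

Applying `substitution_frequency` to each patient with `position=79` and `variant="Q"`, and then `median_iqr` to the resulting values within each clinical group, would give summaries in the same form as those reported above.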
We have identified a number of nt and aa regions that were highly conserved in the presence of different viral genotypes and clinical stages. These could be valuable targets for future pangenotypic and panclinical therapeutic and diagnostic strategies. The different clinically related conserved regions and the P79Q aa substitution could potentially be used as prognostic factors for disease progression.
Our findings could guide the creation of a new gene therapy strategy based on RNA silencing. In-depth analysis of group-specific conserved or variable regions and their role in disease progression is needed. Further in vitro studies are required to determine whether the P79Q aa substitution might affect viral replication and to investigate associations between cell damage and onset of HCC.
The statistical and bioinformatics methods used in this study were reviewed by Mercedes Guerrero-Murillo from the Microbiology Department at Vall d’Hebron Hospital (Barcelona, Spain) and by Dr. Josep Gregori from the Liver Disease Viral Hepatitis Laboratory of Vall d’Hebron Hospital (Barcelona, Spain), CIBERehd research group, and Roche Diagnostics SL.
Manuscript source: Invited manuscript
Corresponding Author's Membership in Professional Societies: European Association for the Study of the Liver (14916).
A distance approach to discriminant analysis and its properties. In: Mathematics preprint series. Barcelona: 1991.
Distance analysis in discrimination and classification using both continuous and categorical variables. In: Statistical Data Analysis and Inference. Amsterdam, 1989: 459-473.