This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Author contributions: Yu F proposed the idea; Yi PS and Hu CJ wrote the manuscript; Yi PS and Hu CJ contributed equally to this work; Li CH performed the electronic searching and abstracted the data.
Supported by Project of Science and Technology Department of Sichuan Province, No. 2018JY050; and Nanchong Science and Technology Bureau Project, No. 18SXHZ0336.
Conflict-of-interest statement: The authors declare that they have no competing interests.
Corresponding author: Fei Yu, MD, Professor, Department of Radiology, Yingshan County People’s Hospital, No. 100 Danan Street, Yingshan County, Nanchong 610041, Sichuan Province, China. email@example.com
Received: January 6, 2021; Peer-review started: January 6, 2021; First decision: February 14, 2021; Revised: February 25, 2021; Accepted: March 15, 2021; Article in press: March 15, 2021; Published online: April 28, 2021
Hepatocellular carcinoma (HCC) is the most commonly diagnosed type of liver cancer and the fourth leading cause of cancer-related mortality worldwide. Early identification and effective treatment of HCC remain challenging. Because patients with early-stage disease retain sufficient liver compensatory function and have nonspecific symptoms, HCC often escapes diagnosis at the incipient stage, precisely when resection or liver transplantation could achieve more satisfactory overall survival. Patients at advanced stages benefit only to a limited extent from radical therapies. To improve the unfavorable prognosis of HCC, both diagnostic ability and treatment efficiency must improve. The past decade has seen rapid advances in artificial intelligence, underlining its usefulness in almost every field, including medicine. Herein, we searched for and reviewed studies focusing on artificial intelligence and HCC.
Core Tip: We performed an electronic search of PubMed, Web of Science and EMBASE using artificial intelligence (AI) or deep learning, together with hepatocellular carcinoma, as MeSH terms. We found that AI showed favorable results in early diagnosis, treatment response prediction and prognosis estimation in patients with hepatocellular carcinoma. The past decade has seen rapid advances in AI, underlining its usefulness in almost every field, including medicine. Based on the studies reviewed, we expect AI to become an important complement to the traditional diagnosis, treatment and prognosis estimation of hepatocellular carcinoma.
Citation: Yi PS, Hu CJ, Li CH, Yu F. Clinical value of artificial intelligence in hepatocellular carcinoma: Current status and prospect. Artif Intell Gastroenterol 2021; 2(2): 42-55
According to GLOBOCAN 2018, liver cancer was the sixth most commonly diagnosed cancer (4.7% of cases) and the fourth leading cause of cancer-related mortality (8.2% of deaths). Approximately 841000 new liver cancer cases and 782000 liver cancer-related deaths are estimated to occur annually. Hepatocellular carcinoma (HCC) accounts for the majority of primary liver cancers. Widely accepted risk factors for HCC include chronic hepatitis B virus or hepatitis C virus infection, alcohol consumption, cirrhosis, aflatoxin intake and nonalcoholic fatty liver disease. Because of atypical radiological appearances and the possibility of false-negative biopsy results, early-stage HCC is easily missed. Only a few HCC patients are suitable for radical resection, and even fewer can receive a liver transplant because of the limited availability of donor organs. The high recurrence rate of HCC further undermines the benefits of surgery. Patients at intermediate and advanced stages can benefit, albeit to a limited extent, only from noncurative treatments, including transarterial chemoembolization (TACE), radiofrequency ablation (RFA), targeted agents and systemic therapies. Managing HCC therefore remains a major clinical challenge.
In the past few years, rapid progress has been made in artificial intelligence (AI) owing to improvements in computer science. AI techniques, including machine learning (ML), artificial neural networks (ANNs) and computer vision, have been combined with surgery, radiology, bioinformatics and pharmaceutical research, and have played an innovative role in advancing these fields[3,4]. At present, AI is applied in drug design, patient monitoring, diagnostics and imaging, risk prediction and management, wearables and virtual assistants.
As AI is now frequently used in the diagnosis, treatment and management of many types of cancer, including lung, gastric, prostate and colon cancers[6-17], it is not unexpected that AI could also enhance our diagnostic, therapeutic and prognostic ability to control HCC. In addition, combining AI with big data has been reported to perform much better than traditional methods.
Recent studies have shown promising applications of AI in HCC. In this review, we summarize the latest developments in the use of AI in HCC, covering both the methods used and the improvements achieved.
DIAGNOSTIC ASSISTANCE FROM AI
The diagnosis of HCC is based mostly on imaging and laboratory tests. Radiological and nonradiological imaging holds a dominant position in the diagnosis, staging, therapeutic decision-making and management of patients, while laboratory biomarkers [e.g., α-fetoprotein (AFP)] offer supporting evidence. For certain patients, histological examination is recommended. Introducing AI into the evidence-based diagnostic procedure has provided more accurate classification to assist clinical decision-making. Recent developments are summarized in Table 1.
Table 1 Recent developments in artificial intelligence assisted diagnosis.
Preoperative serum AFP, tumor number, size and volume
The ANN showed higher AUCs in identifying tumor grade (0.94) and MVI (0.92)
1D: One-dimensional; 2D: Two-dimensional; AFP: α-fetoprotein; AI: Artificial intelligence; ANN: Artificial neural network; AUC: Area under the curve; CNN: Convolutional neural network; CT: Computed tomography; DL: Deep learning; DWI: Diffusion-weighted imaging; HE: Hematoxylin and eosin; LR: Logistic regression; ML: Machine learning; MRI: Magnetic resonance imaging; MVI: Microvascular invasion; SVM: Support vector machine; RF: Random forest.
In a 2010 study, 250 HCC patients, 200 who underwent hepatectomy and 50 who underwent liver transplantation, were randomly divided into a training group (n = 175; 70%) and a test group (n = 75; 30%). Univariate analysis showed that serum AFP, preoperative tumor number, maximum tumor size and tumor volume were strongly related to tumor grade and/or microvascular invasion (MVI). These four factors were used to build both a conventional logistic regression (LR) model and an ANN, configured as a 3-layer feedforward neural network trained with the error backpropagation rule, which iteratively adjusts the network weights to reduce the overall error. The ANN [area under the curve (AUC) = 0.94; 95% confidence interval (CI): 0.89-0.97] predicted tumor grade notably better (P < 0.001) than LR analysis (AUC = 0.85; 95%CI: 0.78-0.89). Its ability to predict MVI was also significantly stronger (AUC = 0.92 vs 0.85; 95%CI: 0.86-0.96 vs 0.74-0.89; P < 0.001). Whereas single factors cannot effectively predict tumor grade and MVI[21-23], the ANN provided a significantly improved ability to stratify tumors by combining multiple factors.
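A minimal sketch of the kind of model described above, assuming standardized numeric inputs: a 3-layer feedforward network (input layer, one sigmoid hidden layer, one output neuron) trained by error backpropagation and scored with a Mann-Whitney AUC on a 70%/30% split. The data are synthetic, and the hidden-layer size, learning rate and epoch count are illustrative choices, not the study's settings.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(labels, scores):
    """Mann-Whitney estimate of the ROC AUC; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class FeedForwardANN:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output neuron."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def predict(self, x):
        return self._forward(x)[1]

    def fit(self, X, y, lr=0.5, epochs=100):
        """Per-sample gradient descent on the cross-entropy loss via backpropagation."""
        for _ in range(epochs):
            for x, target in zip(X, y):
                h, out = self._forward(x)
                d_out = out - target  # output error for sigmoid + cross-entropy
                # Propagate the error back through w2 before w2 is updated
                d_hid = [d_out * w * hj * (1 - hj) for w, hj in zip(self.w2, h)]
                for j, hj in enumerate(h):
                    self.w2[j] -= lr * d_out * hj
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj
        return self


# Synthetic, standardized stand-ins for the four preoperative factors (not study data)
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(250)]
y = [1 if 0.8 * x[0] + x[2] + 0.5 * x[3] > 0 else 0 for x in X]
X_train, y_train = X[:175], y[:175]  # 70% training
X_test, y_test = X[175:], y[175:]    # 30% test

ann = FeedForwardANN(n_in=4, n_hidden=5).fit(X_train, y_train)
test_auc = auc(y_test, [ann.predict(x) for x in X_test])
print(f"test AUC = {test_auc:.3f}")
```

Fitting a logistic regression on the same split and comparing the two test AUCs would mirror the ANN vs LR comparison reported above.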
Magnetic resonance imaging (MRI) is highly valued in clinical diagnosis because of its outstanding ability to localize lesions. Recent research has shown the potential of deep-learning systems to distinguish HCC from other hepatic lesions: 494 lesions with typical imaging features, belonging to six lesion types, were divided into a training set (n = 434) and a test set (n = 60). An AI model was used to classify hepatic lesions on multiphasic contrast-enhanced MRI. A custom convolutional neural network (CNN) with an iteratively optimized architecture was trained on 43400 samples generated by augmentation from the 434 training-set cases. The test set comprised 60 lesions (10 per category) randomly selected by Monte Carlo cross-validation. The final CNN consisted of three convolutional layers that generate filtered images, two max-pooling layers that provide spatial invariance and two fully connected layers that output the matched lesion type. In the test set, the CNN classified HCC with 90% sensitivity and an AUC of 0.992, and achieved an average sensitivity of 90% and specificity of 98% across all six lesion classes. Its performance was comparable to that of conventional multiphase MRI, which has a reported overall sensitivity of 89% and specificity of 96% for HCC.
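The per-class sensitivity and specificity reported above are one-vs-rest quantities. The sketch below, with hypothetical helper names and integer labels standing in for the six lesion types, draws one Monte Carlo test split with a fixed number of lesions per class and computes per-class and averaged metrics from predictions; it does not reproduce the study's CNN.

```python
import random


def monte_carlo_test_split(labels, per_class, seed):
    """One Monte Carlo draw: randomly pick `per_class` indices of every class for the test set."""
    rng = random.Random(seed)
    by_class = {}
    for i, c in enumerate(labels):
        by_class.setdefault(c, []).append(i)
    test = sorted(i for idx in by_class.values() for i in rng.sample(idx, per_class))
    chosen = set(test)
    train = [i for i in range(len(labels)) if i not in chosen]
    return train, test


def one_vs_rest(y_true, y_pred, classes):
    """(sensitivity, specificity) per class, treating that class as positive and the rest as negative."""
    metrics = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        tn = len(y_true) - tp - fn - fp
        metrics[c] = (tp / (tp + fn), tn / (tn + fp))
    return metrics


def macro_average(metrics):
    """Average sensitivity and specificity over all classes."""
    n = len(metrics)
    return (sum(s for s, _ in metrics.values()) / n,
            sum(sp for _, sp in metrics.values()) / n)


# Illustration: 6 hypothetical classes, 10 test lesions each, one class-0 lesion called class 1
y_true = [c for c in range(6) for _ in range(10)]
y_pred = list(y_true)
y_pred[0] = 1
per_class = one_vs_rest(y_true, y_pred, range(6))
avg_sens, avg_spec = macro_average(per_class)
```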
Another recent study, in which imaging data were partitioned into a training and validation set (60 HCCs) and a fixed test set (40 HCCs), investigated the tumor-grading potential of diffusion-weighted imaging (DWI). An AI model was built on the open-source deep-learning framework Caffe to grade HCC on DWI. Edmondson grade I and II HCCs were defined as low grade (n = 47), and Edmondson grade III and IV HCCs as high grade (n = 53). DWI was acquired at three b-values (0, 100 and 600 s/mm2); the images were logarithmically transformed into log maps, from which a purpose-built two-dimensional CNN extracted spatial deep features for tumor grading. The two-dimensional CNN comprised two convolutional layers, two pooling layers, two fully connected layers and a softmax layer. It was trained with a deeply supervised cross-entropy loss that combined the losses of the three CNN branches for the three b-value images with the loss on the concatenated deep features. The proposed CNN achieved a grading accuracy of 80% (AUC, 0.83), outperforming CNNs trained on the original b0 (65%), b100 (68%) or b600 (70%) images or on an apparent diffusion coefficient map (72.5%).
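A minimal sketch of a deeply supervised loss of this kind, assuming each b-value branch and the fused (concatenated-feature) head produce two-class logits and that the four cross-entropy terms are summed with equal weights; the weighting used in the study is not stated, and the small `eps` in the log transform is a hypothetical safeguard against log(0).

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    return -math.log(softmax(logits)[target])


def log_map(image, eps=1e-6):
    """Element-wise natural log of a DWI signal map; eps (hypothetical) avoids log(0)."""
    return [[math.log(v + eps) for v in row] for row in image]


def deeply_supervised_loss(branch_logits, fused_logits, target, weights=None):
    """Sum of the per-b-value branch losses and the loss on the concatenated features.
    Equal weights are an assumption."""
    losses = [cross_entropy(lg, target) for lg in branch_logits]
    losses.append(cross_entropy(fused_logits, target))
    weights = weights or [1.0] * len(losses)
    return sum(w * loss for w, loss in zip(weights, losses))


# Hypothetical logits for one high-grade lesion (class 1): b0, b100 and b600 branches, then fused head
branches = [[0.2, 0.4], [0.1, 0.9], [-0.3, 0.6]]
total_loss = deeply_supervised_loss(branches, [0.0, 1.2], target=1)
```

Supervising every branch separately, rather than only the fused output, pushes each b-value branch to learn grade-related features on its own.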
Jian et al reported a novel method for training a deep-learning HCC diagnosis model on nonenhanced MRI. A total of 112 HCC patients (115 HCC tissue samples) with histologically proven HCC and enhanced MRI (precontrast, arterial, portal venous and delayed phases) were classified into four Edmondson grades and further grouped into low-grade (Edmondson grades I and II) and high-grade (Edmondson grades III and IV) HCCs. The deep-learning framework was built in two steps. In the first, pretraining step, the relationship between precontrast (nonenhanced) and enhanced MRI was learned in order to identify malignancy-related characteristics in nonenhanced images. In the second step, these characteristics were transferred using a supervised cross-modal method. The CNN-based methods characterized tumors better than the traditional approach, and the deeply supervised model pretrained cross-modally on all three phases (precontrast, arterial and portal venous) performed best, outperforming a nonsupervised CNN and deeply supervised models pretrained on only two of the three phases (precontrast + arterial or precontrast + portal venous). This result suggests a new diagnostic approach for patients who cannot receive contrast-enhanced imaging.
A deep-learning automatic segmentation model was built on multiphase computed tomography (CT) images to distinguish tumors from healthy liver tissue and, further, to separate active from necrotic tumor areas. Thirteen contrast-enhanced CT sequences from 7 HCC patients, comprising images acquired before contrast injection and in the arterial and portal venous phases, were manually segmented by four experts into 104 labeled CT slices. The U-Net architecture was configured hierarchically, with a separate network segmenting each specific tissue type. Two opposite strategies were investigated: the Dimensional MultiPhase strategy, in which the single-phase images were combined into one multi-dimensional feature map and processed together, and the MultiPhase Fusion strategy, in which each phase was processed independently and the results were then merged into the final segmentation. A softmax was applied in the final layer of each network, and a weighted cross-entropy served as the cost function for optimizing the weights while balancing the classes. Segmentation quality was assessed with the commonly used Dice similarity coefficient. Multiphase methods segmented the liver and the active part of tumors better than single-phase methods. Of the two multiphase methods, Dimensional MultiPhase outperformed MultiPhase Fusion in segmenting the liver (P = 0.004) and the active part of tumors (P = 0.005). Furthermore, the combination of two Dimensional MultiPhase methods showed the highest ability to identify active areas within tumor tissue, making it reliable (mean error rate = 13.0%) for estimating the necrosis rate, for which traditional CT estimation is unreliable[29,30]. A more accurate assessment method may support more beneficial clinical decisions.
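The Dice coefficient and a class-weighted cross-entropy can each be written in a few lines. This sketch treats masks as flat 0/1 lists; the inverse-frequency weighting and the necrosis-rate definition (necrotic pixels over all tumor pixels) are assumptions for illustration, not details confirmed by the study.

```python
import math


def dice(pred, truth):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for flat binary (0/1) masks."""
    overlap = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * overlap / total


def class_weights(labels, n_classes):
    """Inverse-frequency weights n / (n_classes * count); one common choice, assumed here."""
    n = len(labels)
    return [n / (n_classes * labels.count(c)) if labels.count(c) else 0.0
            for c in range(n_classes)]


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Mean pixel-wise cross-entropy, each pixel scaled by the weight of its true class.
    probs[i] is the softmax output for pixel i; labels[i] is its true class index."""
    return sum(-weights[c] * math.log(max(p[c], eps))
               for p, c in zip(probs, labels)) / len(labels)


def necrosis_rate(active, necrotic):
    """Necrotic fraction of the segmented tumor: necrotic / (active + necrotic) pixels."""
    tumor = sum(active) + sum(necrotic)
    return sum(necrotic) / tumor if tumor else 0.0
```

Weighting rare classes (such as small active tumor regions) more heavily keeps the loss from being dominated by the abundant background pixels.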
Histological examination provides solid evidence for the diagnosis, grading and prognostic analysis of HCC, and hematoxylin and eosin (HE) staining is the most common method used for biopsies. A total of 491 whole-slide HE-stained histopathological images of HCC and adjacent normal tissue, downloaded from the Genomic Data Commons data portal, were used for the supervised training of an ML classifier based on Breiman’s random forest (RF). The 31 most informative image features (IFs), identified from the training set by principal component-based analysis (PCA), were used to build the classification model. In addition to randomly partitioned training (70%) and test (30%) sets, an external validation set of tissue microarray images from West China Hospital was used. The IF classification model achieved an AUC of 0.988 (95%CI: 0.975-1.000) in the test set and 0.886 (95%CI: 0.844-0.929) in the external validation set. This strong performance suggests possible future applications of the IF model.
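The evaluation protocol above (a random 70%/30% split and an AUC with a 95%CI) can be sketched as follows. The percentile bootstrap is an assumed way of obtaining the interval, since the study's CI method is not described here, and the RF classifier itself is omitted.

```python
import random


def auc(labels, scores):
    """Mann-Whitney estimate of the ROC AUC; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_split(n, test_frac, seed):
    """Randomly partition indices 0..n-1 into (training, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC; resamples lacking either class are skipped."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# 491 slides split 70%/30%, as in the study design
train_idx, test_idx = random_split(491, 0.3, seed=0)
```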
Hyperspectral imaging (HSI) is regarded as a promising diagnostic technique. A one-dimensional CNN was designed to discriminate HCC from normal tissue on HSI images. HCC samples were cut into two adjacent slices; one was stained with hematoxylin and eosin and the other was imaged with HSI. Fourteen sets of HSI images, each containing 107 images acquired at different wavelengths, were used in a leave-one-out cross-validation scheme, yielding 14 different models. The framework consisted of a convolutional layer, a max-pooling layer and a fully connected layer. The convolutional layer extracted features from the HSI images, supervised by tumor areas annotated on the paired hematoxylin and eosin-stained slice, and used a rectified linear unit activation, which helps avoid vanishing gradients and accelerates training. The extracted features were reduced in dimension by the max-pooling layer and then classified by the fully connected layer. The average accuracy, sensitivity, specificity and AUC of the models were 0.881, 0.871, 0.888 and 0.950, respectively. Further evaluation showed that the one-dimensional CNN outperformed RF and support vector machine (SVM) models.
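A forward pass through a one-dimensional CNN of the type described (convolution with ReLU, max pooling, fully connected output) can be sketched in pure Python. It assumes, as a simplification, that each pixel's 107-value spectrum is classified independently; the kernel width, number of kernels and pooling size are hypothetical, the weights are untrained, and training by backpropagation is omitted.

```python
import math
import random


def conv1d(signal, kernels, biases):
    """'Valid' 1D convolution (cross-correlation); one output channel per kernel."""
    return [[sum(k[j] * signal[i + j] for j in range(len(k))) + b
             for i in range(len(signal) - len(k) + 1)]
            for k, b in zip(kernels, biases)]


def relu(channels):
    return [[max(0.0, v) for v in ch] for ch in channels]


def max_pool(channels, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in channels]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def forward(spectrum, kernels, biases, fc_weights, fc_bias, pool=2):
    """Per-pixel spectrum (one value per wavelength) -> probability of tumor."""
    pooled = max_pool(relu(conv1d(spectrum, kernels, biases)), pool)
    flat = [v for ch in pooled for v in ch]
    return sigmoid(sum(w * v for w, v in zip(fc_weights, flat)) + fc_bias)


def leave_one_out(n_sets):
    """Yield (training set indices, held-out index): one model per held-out HSI image set."""
    for k in range(n_sets):
        yield [i for i in range(n_sets) if i != k], k


# Untrained, randomly initialised weights: 107 wavelengths, 2 kernels of width 5
rng = random.Random(0)
spectrum = [rng.random() for _ in range(107)]
kernels = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(2)]
fc_weights = [rng.uniform(-0.1, 0.1) for _ in range(2 * 51)]  # 2 channels x 51 pooled values
prob = forward(spectrum, kernels, [0.0, 0.0], fc_weights, 0.0)
```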
Data from 539 HCC patients and 1043 non-HCC patients were used to train and test a predictive ML framework developed in R (version 3.4.3) with the Shiny and caret packages. Patients were randomly divided into training (80%), development and test sets. The clinical variables used for ML were AFP, AFP-L3, des-γ-carboxy prothrombin (DCP), aspartate aminotransferase, alanine transaminase, platelet count, alkaline phosphatase, γ-glutamyl transferase, albumin, total bilirubin, age, sex, height, body weight, hepatitis B surface antigen and hepatitis C virus antibody. The framework comprised several classifiers and two components. In the first component, a grid search selected the best classifier and its hyperparameters, which were then passed to the second component to output the probability of HCC. Among the seven classifiers, gradient boosting achieved the highest AUC (0.940), and the optimal classifier selected by the framework reached 0.943. By comparison, single-factor prediction using thresholds of 200 ng/mL for AFP, 40 mAU/mL for DCP and 15% for AFP-L3[25] yielded AUCs of 0.766, 0.644 and 0.683, respectively.
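The first component's grid search can be sketched generically: every classifier/hyperparameter combination is fitted on the training set, and the one with the highest development-set AUC is kept. The candidates here (`knn`, `single_marker`) are hypothetical stand-ins for the seven classifiers, which the original framework ran in R. The `threshold_auc` helper also shows why a single dichotomized marker yields AUC = (sensitivity + specificity) / 2.

```python
import itertools
import math


def auc(labels, scores):
    """Mann-Whitney estimate of the ROC AUC; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn(k):
    """Hypothetical candidate: score = share of positive cases among the k nearest neighbours."""
    def fit(X, y):
        def score(x):
            nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
            return sum(y[i] for i in nearest) / len(nearest)
        return score
    return fit


def single_marker(index):
    """Hypothetical candidate: use one variable (e.g. AFP) directly as the score."""
    def fit(X, y):
        return lambda x: x[index]
    return fit


def grid_search(candidates, X_tr, y_tr, X_dev, y_dev):
    """Try every classifier/hyperparameter combination; keep the best development-set AUC."""
    best = None
    for name, (factory, grid) in candidates.items():
        keys = sorted(grid)
        for values in itertools.product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = factory(**params)(X_tr, y_tr)
            dev_auc = auc(y_dev, [score(x) for x in X_dev])
            if best is None or dev_auc > best[0]:
                best = (dev_auc, name, params)
    return best


def threshold_auc(values, labels, cutoff):
    """AUC of a dichotomized marker (positive if value >= cutoff) = (sensitivity + specificity) / 2."""
    return auc(labels, [1 if v >= cutoff else 0 for v in values])


# Hypothetical toy data (two variables per patient)
candidates = {"knn": (knn, {"k": [1, 3]}), "single_marker": (single_marker, {"index": [0, 1]})}
X_tr = [[0.1, 5.0], [0.2, 1.0], [0.9, 3.0], [0.8, 2.0]]
y_tr = [0, 0, 1, 1]
best = grid_search(candidates, X_tr, y_tr, [[0.15, 2.0], [0.85, 4.0]], [0, 1])
```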
THERAPY RESPONSE PREDICTION BY AI
Surgical resection remains the first-line treatment for early-stage patients, with 5-year survival exceeding 70% in appropriately selected cases. However, HCC diagnosis is reportedly often delayed, especially in countries with limited screening resources. Of the patients who miss the optimal surgical window or are unsuitable for surgery, only a few benefit from locoregional (e.g., RFA), intra-arterial (e.g., TACE), systemic and targeted therapies. Thus, refining surgical indications and improving the benefits of nonoperative therapies would effectively improve patients' clinical prognosis. In recent years, several promising AI models have been built, as summarized in Table 2.
Table 2 Artificial intelligence models that can help in predicting therapy responses.
Cox-identified risk factors
The ANN had the highest AUC (0.855)
Cox model, TNM 6th, BCLC and HPBA system (0.826, 0.639, 0.612, 0.711)
HCC has been estimated to be the fourth leading cause of cancer-related mortality worldwide, reflecting its high malignancy and poor prognosis. Accurate prognostic prediction after tumor resection is needed to identify high-risk patients and enable better clinical decisions. Qiao et al used the independent risk factors found by linear regression to be significantly related to survival (tumor size, tumor number, AFP, microvascular invasion and tumor capsule) to predict the prognosis of early HCC after partial hepatectomy, with both a Cox model and an ANN. The feed-forward neural network was built as a multilayer perceptron that outputs a prognostic status (survival or death) at given time points. Patients from the Eastern Hepatobiliary Surgery Hospital were randomly assigned to the training and cross-validation cohorts, and an external validation cohort was drawn from the First Affiliated Hospital of Fujian Medical University. The ANN (AUC 0.855) outperformed the Cox model (0.826), the Tumor, Node, Metastasis 6th edition (0.639), the Barcelona Clinic Liver Cancer (BCLC) system (0.612) and the Hepato-Pancreato-Biliary Association system (0.711), with consistent results in the external validation cohort. These findings highlight the potential of the ANN model to provide clinical assistance and improve outcomes for patients with early-stage HCC.
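Outputting survival or death at fixed time points requires one binary label per time point. A minimal sketch, assuming that patients censored before a time point are excluded because their status at that point is unknown (the study's handling of censoring is not stated here); the cohort values are hypothetical.

```python
def status_at(time_point, follow_up, died):
    """1 = died by time_point, 0 = known alive at time_point, None = censored earlier (unknown)."""
    if died and follow_up <= time_point:
        return 1
    if follow_up >= time_point:
        return 0
    return None


def dataset_at(time_point, patients):
    """patients: iterable of (features, follow_up_months, died). Drops patients with unknown status."""
    rows = []
    for features, follow_up, died in patients:
        label = status_at(time_point, follow_up, died)
        if label is not None:
            rows.append((features, label))
    return rows


# Hypothetical cohort: (features, months of follow-up, death observed?)
cohort = [([3.1, 1], 10, True), ([2.0, 0], 40, False), ([5.5, 2], 8, False), ([1.2, 0], 30, True)]
one_year = dataset_at(12, cohort)    # third patient is censored at 8 months, so excluded
three_year = dataset_at(36, cohort)
```

A separate network (or a separate output neuron) can then be trained on each of these labelled datasets.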
AI models can also help identify predictors of surgical outcome. In a multicenter retrospective study of 976 patients with BCLC stage 0-B HCC who underwent hepatectomy, Tsilimigras et al used the nonparametric Classification and Regression Tree (CART) model to generate homogeneous patient groups according to 5-year overall survival (OS) and to identify the clinical factors predicting OS after resection, based on preoperative (preoperative CART model) and postoperative (postoperative CART model) factors. CART is a risk prediction model that recursively partitions the covariate space. The CART models successfully identified several prognostic factors. Among BCLC-0/A patients, AFP and the Charlson comorbidity score were selected as the first and second most important preoperative factors, and lymphovascular invasion as the best postoperative predictor of OS. For BCLC-B patients, the radiological and pathological tumor burden scores were selected as the best pre- and postoperative predictors, respectively, of surgical outcome.
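Recursive partitioning of the covariate space can be sketched as a small classification tree. This assumes a binary outcome (for example, death within 5 years) and Gini impurity as the split criterion; the study's exact CART settings are not given, and the depth and leaf-size limits below are hypothetical.

```python
def gini(labels):
    """Gini impurity of a binary outcome list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y):
    """Best binary split 'x[f] <= t' as (feature, threshold, weighted child impurity), or None."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def grow(X, y, depth=0, max_depth=2, min_size=5):
    """Recursively partition the covariate space; each leaf stores its event rate and size."""
    split = best_split(X, y) if depth < max_depth and len(y) >= min_size else None
    if split is None or split[2] >= gini(y):
        return {"rate": sum(y) / len(y), "n": len(y)}
    f, t, _ = split
    left = [i for i, x in enumerate(X) if x[f] <= t]
    right = [i for i, x in enumerate(X) if x[f] > t]
    return {"feature": f, "threshold": t,
            "left": grow([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_size),
            "right": grow([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_size)}


# Hypothetical toy data: columns = (AFP-like value, comorbidity-like score); y = death within 5 years
X = [[5, 1], [6, 2], [7, 1], [8, 2]]
y = [0, 1, 0, 1]
tree = grow(X, y, min_size=2)
```

The first split chosen at the root corresponds to the "most important" factor in the sense used above; later splits within each branch give the second-level factors.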
In consecutive studies, Ho et al[37,38] reported AI models capable of classifying patients into groups with distinct disease-free survival (DFS) and OS after hepatic resection. Data from HCC patients who underwent liver resection were examined and merged to construct survival prediction models. The input variables were those identified by the univariate Cox proportional hazards model as closely related to DFS or OS (log-rank test; P < 0.05). Eighty percent of the data were used for training and the remaining 20% for validation; the input variables did not differ significantly between the training and validation sets (P > 0.05). The ANNs in both studies shared a similar structure, built with the Waikato Environment for Knowledge Analysis (Weka) software and trained with a backpropagation algorithm, and consisted of input, hidden and output layers. Each identified variable was fed into its own input neuron, the number of hidden neurons was optimized by trial and error, and the single output neuron produced the DFS or OS status.
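Univariate screening with a log-rank test can be sketched for a dichotomized candidate variable. This two-group implementation returns the chi-square statistic and its P value with 1 degree of freedom, computed with `math.erfc`; it is an illustration with hypothetical data, not the software used in the studies.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(ti, ei, g) for ti, ei, g in data if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for ti, ei, _ in at_risk if ti == t and ei)
        d_a = sum(1 for ti, ei, g in at_risk if ti == t and ei and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    # Upper tail of chi-square with 1 df equals P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical groups split on one candidate variable (times in months, 1 = event)
chi2, p = log_rank([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
keep_variable = p < 0.05
```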
The first study showed the capacity of the ANN to predict DFS based on 15 statistically significant variables, with an LR model and a decision tree model as comparators. The receiver operating characteristic curves and AUCs of the 1-, 3- and 5-year DFS models showed that the ANN performed acceptably and better than the LR and decision tree models.
The second study focused on OS after resection, with 21 potential variables as inputs and an LR model for comparison. The accuracy, sensitivity, specificity and AUC of the ANN and LR models were calculated, and the ANN model showed significantly stronger predictive performance than the LR model. Both studies emphasized the potential of ANNs as supplementary clinical decision-making tools that might improve the benefit-risk ratio of HCC resection.
TACE is widely accepted as the standard, effective treatment for patients with intermediate-stage HCC. Recent studies have paid considerable attention to deep learning in TACE, focusing on treatment response prediction and AI-assisted clinical decision-making.
Contrast-enhanced ultrasound (CEUS) and B-mode ultrasound images of 130 HCC patients who received first-time TACE were retrospectively analyzed with AI models trained to predict patient response (objective response vs nonresponse) to TACE. Three models were built: a deep-learning radiomics-based CEUS model using CEUS images, an ML radiomics-based model using the CEUS time-intensity curve and an ML radiomics-based model using B-mode images. Their AUCs were compared with that of the hepatoma arterial-embolization prognostic (HAP) score, which is used to predict outcomes of HCC patients undergoing TACE. In both the training (n = 89; 68.5%) and validation (n = 41; 31.5%) cohorts, all three models markedly outperformed the HAP score: the AUCs were 0.98 (0.92-0.99), 0.84 (0.74-0.90), 0.82 (0.73-0.91) and 0.623 in the training cohort and 0.93 (0.80-0.98), 0.80 (0.64-0.90), 0.81 (0.67-0.95) and 0.617 in the validation cohort for the deep-learning CEUS model, the time-intensity curve model, the B-mode model and the HAP score, respectively. Robustness experiments performed in triplicate in both cohorts showed high reproducibility of this predictive accuracy. Human readers assisted by a deep-learning feature map outperformed the time-intensity curve and B-mode models, but not the deep-learning CEUS model.
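The time-intensity curve model works from descriptors of the CEUS enhancement curve. The specific radiomic features used in the study are not listed here, so this sketch computes a few commonly used curve parameters (peak enhancement over baseline, time to peak, trapezoidal area under the curve and wash-in slope) as an assumed example on a hypothetical curve.

```python
def tic_features(times, intensities):
    """Simple descriptors of a CEUS time-intensity curve, relative to the first (baseline) sample."""
    base = intensities[0]
    peak_i = max(range(len(intensities)), key=lambda i: intensities[i])
    peak = intensities[peak_i] - base
    time_to_peak = times[peak_i] - times[0]
    # Trapezoidal area between the curve and the baseline
    area = sum((times[i + 1] - times[i]) * (intensities[i] + intensities[i + 1] - 2 * base) / 2
               for i in range(len(times) - 1))
    slope = peak / time_to_peak if time_to_peak > 0 else 0.0
    return {"peak_enhancement": peak, "time_to_peak": time_to_peak,
            "area_under_curve": area, "wash_in_slope": slope}


# Hypothetical curve sampled every second (arbitrary intensity units)
features = tic_features([0, 1, 2, 3], [0.0, 2.0, 4.0, 2.0])
```

Feature vectors like this one, computed per patient, would then be passed to an ML classifier.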
In two analogous studies, ML networks showed a strong ability to predict TACE outcomes from CT images. Peng et al fine-tuned a pretrained deep CNN, ResNet50, on manually segmented CT images to predict treatment response to TACE. Tumor regions of interest segmented by experienced radiologists were divided into one training set (n = 562) and two validation sets (n = 89 and 138). The weights of the earlier layers (layers 1-174) were frozen to prevent overfitting and speed up training. For complete response, partial response, stable disease and progressive disease, respectively, the model achieved AUCs of 0.97 (0.97-0.98), 0.96 (0.96-0.97), 0.95 (0.94-0.96) and 0.96 (0.96-0.97) in the training cohort (n = 562); 0.98 (0.97-0.99), 0.96 (0.95-0.98), 0.95 (0.93-0.98) and 0.94 (0.90-0.98) in validation cohort 1 (n = 89); and 0.97 (0.96-0.98), 0.96 (0.94-0.98), 0.94 (0.92-0.97) and 0.97 (0.95-0.98) in validation cohort 2 (n = 138). Morshid et al built a fully automated ML algorithm that predicts response to TACE from quantitative CT features and BCLC stage. A total of 105 HCC patients treated with TACE were classified by time to progression (TTP) as TACE-susceptible (TTP ≥ 14 wk) or TACE-refractory (TTP < 14 wk). Five imaging features that differed between background liver and tumor were extracted: tumor volume, maximum two-dimensional axial diameter of the background liver, small-area low gray-level emphasis within the background liver, maximal correlation coefficient within the background liver and long-run high gray-level emphasis within the tumor. Adding these features to the AI model improved prediction accuracy: compared with the model based on BCLC stage alone (accuracy 62.9%; 95%CI: 0.52-0.72), the model combining CT features and BCLC stage achieved an accuracy of 74.2% (95%CI: 0.64-0.82).
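The TTP-based labelling and a confidence interval for prediction accuracy can be sketched as below. The Wilson score interval is one standard choice for a binomial proportion and is assumed here; the study's CI method may differ, and the counts in the usage line are hypothetical.

```python
import math


def tace_label(ttp_weeks, cutoff_weeks=14):
    """TACE-susceptible if time to progression >= cutoff, otherwise TACE-refractory."""
    return "susceptible" if ttp_weeks >= cutoff_weeks else "refractory"


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion such as prediction accuracy."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical example: 74 of 100 patients correctly classified
low, high = wilson_interval(74, 100)
```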
Abajian et al established an LR and an RF model to predict TACE response using MRI. TACE response was measured with the quantitative European Association for the Study of the Liver (qEASL) response criteria. Using a cut-off of 65% change in qEASL, the 36 patients were classified as treatment responders (8/36; 22.2%) or nonresponders (28/36; 77.8%). To identify the most accurate predictive model, both models were trained on 30 different combinations of five features: cirrhosis, pre-TACE tumor signal intensity, pre-TACE number of tumors, TACE technique and sorafenib treatment. Predictive accuracy was tested by leave-one-out cross-validation. When trained on all five features, the LR model achieved an accuracy of 72.0%, sensitivity of 50.0% and specificity of 78.6%, and the RF model an accuracy of 66.0%, sensitivity of 62.5% and specificity of 67.9%. Notably, both models performed best (accuracy 78%, sensitivity 62.5% and specificity 82.1%) when trained on only two of the five features (pre-TACE tumor signal intensity > 27.0 and presence of cirrhosis), but they remained inferior to MRI using a baseline apparent diffusion coefficient threshold of 0.83 × 10⁻³ mm2/s, which showed 91% sensitivity, 96% specificity and an AUC of 0.965 for predicting TACE response at 1 mo after treatment.
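An exhaustive feature-subset search evaluated by leave-one-out cross-validation can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier stands in for the LR and RF models, and the feature names and values are hypothetical; every training fold must contain both classes.

```python
import itertools
import math


def nearest_centroid(X_train, y_train, x):
    """Stand-in classifier: predict the class (0 or 1) whose mean feature vector is closest to x.
    Assumes both classes are present in X_train."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, yi in zip(X_train, y_train) if yi == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min((0, 1), key=lambda c: math.dist(centroids[c], x))


def loocv_metrics(X, y):
    """Leave-one-out (accuracy, sensitivity, specificity), with 1 = responder."""
    tp = tn = fp = fn = 0
    for i in range(len(y)):
        pred = nearest_centroid(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        if y[i] == 1:
            tp += pred == 1
            fn += pred == 0
        else:
            tn += pred == 0
            fp += pred == 1
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return (tp + tn) / len(y), sens, spec


def best_subset(X, y, names):
    """Evaluate every non-empty feature subset; return (metrics, feature names) with the best accuracy."""
    results = []
    for r in range(1, len(names) + 1):
        for subset in itertools.combinations(range(len(names)), r):
            Xs = [[row[j] for j in subset] for row in X]
            results.append((loocv_metrics(Xs, y), [names[j] for j in subset]))
    return max(results, key=lambda item: item[0][0])


# Hypothetical toy data: feature "f0" separates the classes, "f1" is noise
X = [[0.0, 5], [0.1, 1], [0.2, 9], [1.0, 5], [0.9, 1], [1.1, 9]]
y = [0, 0, 0, 1, 1, 1]
best = best_subset(X, y, ["f0", "f1"])
```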
RFA is considered a viable option for HCC patients who are unsuitable for resection or are on the waiting list for liver transplantation. A prognostic ANN model has been reported to be promising for clinical practice. Patients were divided into a 1-year (n = 252) and a 2-year (n = 179) DFS group. Of fifteen potential variables (total bilirubin, aspartate aminotransferase, alanine transaminase, albumin, platelet count, age, sex, tumor size, tumor number, AFP, HCC treatment history, TACE, recurrence after TACE, BCLC stage and liver cirrhosis), eight and six were significantly associated with 1- and 2-year DFS, respectively, and were used as inputs to prediction models based on a multilayer perceptron trained with the backpropagation learning rule. The ANN was designed to select its structure according to its predictive performance. Of the two 1-year DFS models, the one built with all 15 features (accuracy 0.92, sensitivity 0.87, specificity 0.94 and AUC 0.94) outperformed the one built with the 8 significant features (accuracy 0.78, sensitivity 0.37, specificity 0.96 and AUC 0.80). Likewise, the 2-year DFS model with 15 features (accuracy 0.86, sensitivity 0.79, specificity 0.91 and AUC 0.88) showed a considerable advantage over the model with 6 significant features (accuracy 0.68, sensitivity 0.47, specificity 0.84 and AUC 0.76) and over traditional methods, including acoustic radiation force impulse elastography (AUC = 0.821; 95%CI: 0.747-0.895) and transient elastography (AUC = 0.793; 95%CI: 0.712-0.874)[46,47]. Although the χ2 test showed some of the 15 features to be nonsignificantly related to 1- or 2-year DFS, the better performance of the models using all 15 features suggests that these features play implicit roles in predicting RFA response.
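The χ2 screen mentioned above can be illustrated for a dichotomized feature against a binary DFS outcome. This sketch computes Pearson's statistic for a 2 × 2 table without continuity correction (an assumption) and requires every row and column total to be nonzero; the table counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]]; returns (statistic, P value with 1 df)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical 2 x 2 table: rows = feature present/absent, columns = recurrence yes/no
chi2, p = chi_square_2x2(30, 10, 10, 30)
```

A feature failing this screen (large P) may still add information when combined with others, which is consistent with the better performance of the 15-feature models.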
PROGNOSIS ESTIMATION USING AI
Accurate prognostic information is indispensable for correctly characterizing disease progression and improving the outcomes of existing therapies. Individualized, precise treatment based on risk and prognostic data would substantially improve treatment efficacy in HCC. Table 3 lists several effective models for prognosis estimation.
Table 3 Prognosis prediction models built with artificial intelligence algorithms.
DL algorithms CHOWDER and SCHMOWDER
Whole-slide digitized histological slide
C-indexes for survival prediction of SCHMOWDER and CHOWDER reached 0.78 and 0.75
AI: Artificial intelligence; ANN: Artificial neural network; AUC: Area under the curve; DL: Deep learning; HALT-C: Hepatitis C antiviral long-term treatment against cirrhosis; HCC: Hepatocellular carcinoma; LR: Logistic regression; ML: Machine learning; OS: Overall survival; PCA: Principal component-based analysis; RFE: Recursive feature elimination; SVM: Support vector machine; TCGA: The Cancer Genome Atlas.
Two deep-learning algorithms, CHOWDER and SCHMOWDER, were developed to predict OS after resection using whole-slide digitized histological slides from HCC patients who had undergone surgery. CHOWDER automatically recognized survival-related patterns in the tiles derived from the slides and assigned a risk score to each whole-slide digitized histological slide in three steps: preprocessing, tile scoring and prediction. SCHMOWDER shared CHOWDER's preprocessing step but used a two-branch tile-scoring and prediction pipeline. The upper branch, trained with annotations from pathologists, used an attention mechanism to generate a representation of tiles with a high probability of being tumoral; the lower branch, which was weakly supervised, generated a representation from only a few tiles. The representations from the two branches were merged to calculate a survival risk score. In cross-validation, both models showed better discriminatory capacity than baseline factors (including microvascular invasion, serum AFP, largest nodule diameter and satellite nodules) and than a composite score combining survival-related clinical, biological and pathological features.
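The two pooling ideas can be illustrated in standard-library Python. This is a simplified sketch under stated assumptions, not the published architectures. The lower, weakly supervised branch is approximated by keeping only the k highest- and k lowest-scoring tiles, an extreme-score selection of the kind described for CHOWDER. The upper branch is approximated by softmax attention pooling over tile feature vectors. The merged vector is then passed to a linear head whose weights (`head_weights`) are hypothetical. In the real models, tile scores, attention logits and head weights are all learned.

```python
import math


def extreme_tile_pooling(tile_scores, k=2):
    """Keep the k highest and k lowest tile scores (CHOWDER-style selection, assumed)."""
    if len(tile_scores) < 2 * k:
        raise ValueError("need at least 2k tiles")
    ordered = sorted(tile_scores)
    return ordered[::-1][:k] + ordered[:k]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pooling(tile_features, attention_logits):
    """Attention-weighted average of tile feature vectors (upper-branch analogue)."""
    weights = softmax(attention_logits)
    dim = len(tile_features[0])
    return [sum(w * f[d] for w, f in zip(weights, tile_features)) for d in range(dim)]


def slide_risk_score(tile_scores, tile_features, attention_logits, head_weights, k=2):
    """Merge both branch representations and apply a linear head (hypothetical weights)."""
    merged = extreme_tile_pooling(tile_scores, k) + attention_pooling(tile_features, attention_logits)
    if len(head_weights) != len(merged):
        raise ValueError("head_weights must match the merged representation length")
    return sum(w * v for w, v in zip(head_weights, merged))
```

With equal attention logits, attention pooling reduces to a plain average of the tile features. Training the logits, as SCHMOWDER's upper branch did with pathologists' annotations, lets tumoral tiles dominate the average instead.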
In a prospective study including 442 patients with Child-Pugh A or B cirrhosis, an HCC development prediction model based on the ML algorithm RF was compared with conventional regression analysis. Previously determined clinically relevant parameters (age, body mass index and presence of diabetes) and those identified by univariate analysis (AFP level, bilirubin, male gender, aspartate aminotransferase, alanine transaminase, Child-Pugh score and viral etiology) were selected to build both a predictive regression model and an ML classifier. In the RF, multiple decision trees were constructed, and their "votes" were aggregated to produce the final classification. Accuracy was estimated by cross-validation, and external validation was conducted in the hepatitis C antiviral long-term treatment against cirrhosis (HALT-C) trial cohort, which included 1050 patients. The ML algorithm achieved the best discrimination, with a c-statistic of 0.64 (95%CI: 0.60-0.69), compared with the regression model (0.61; 95%CI: 0.56-0.67) and the model built on the HALT-C cohort (0.60; 95%CI: 0.50-0.70), raising the possibility that ML can prospectively predict HCC development.
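The voting step and the c-statistic used to compare the models can be sketched in standard-library Python. The tree votes below are hypothetical, and the c-statistic is the binary-outcome version, which equals the area under the ROC curve. If follow-up time is taken into account, a censoring-aware concordance index (such as Harrell's C) would be needed instead, so this is a simplification.

```python
from itertools import product


def forest_probability(tree_votes):
    """Fraction of trees voting 1 (HCC); a majority vote corresponds to >= 0.5."""
    return sum(tree_votes) / len(tree_votes)


def c_statistic(risk_scores, outcomes):
    """Binary-outcome concordance: P(case score > control score), ties counted as 0.5."""
    cases = [s for s, o in zip(risk_scores, outcomes) if o == 1]
    controls = [s for s, o in zip(risk_scores, outcomes) if o == 0]
    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical votes from three trees for four patients, and whether HCC developed
votes = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
developed_hcc = [1, 0, 1, 0]
scores = [forest_probability(v) for v in votes]
predicted = [int(s >= 0.5) for s in scores]
print(predicted, c_statistic(scores, developed_hcc))
```

A c-statistic of 0.5 corresponds to chance-level ranking, which puts the reported values of 0.60 to 0.64 into perspective.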
Survival analysis identified two HCC subgroups with notably different prognoses, and these subgroups were used to build a deep-learning-based survival prediction model. RNA, miRNA and methylation data from 360 HCC patients were collected from The Cancer Genome Atlas (TCGA) and split into training and test sets for an SVM model, and five additional datasets were obtained to confirm its predictive accuracy. The TCGA HCC omics data served as the input of an autoencoder with three hidden layers of different sizes, implemented using the Python Keras library and trained for ten epochs by gradient descent with 50% dropout. The autoencoder identified 37 features that were significantly associated with survival (log-rank test, P < 0.05). Using these features, a classification model based on the SVM algorithm was built and validated in the test group and in five additional groups of HCC patients. The C-index, Brier score and log-rank test were used to compare the AI model with two alternative methods: PCA and a model based on 37 manually identified features from the omics data. The proposed model performed clearly better than both PCA and the model with manually identified features, and its robustness was confirmed in the additional datasets.
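The survival screening of autoencoder features can be illustrated with a standard-library log-rank test. This is a sketch under assumptions. Each autoencoder output is dichotomized at its median before testing, a rule the source does not specify, and the latent values, times and events are toy numbers, not TCGA data. The p-value uses the chi-square distribution with one degree of freedom, whose upper tail is erfc(sqrt(x/2)).

```python
import math
from statistics import median


def logrank_test(times, events, groups):
    """Two-group log-rank test. events: 1 = death, 0 = censored; groups: 0 or 1.
    Returns (chi-square statistic, p-value with 1 degree of freedom)."""
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for ti, _, g in data if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # patients of group 1 still at risk
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def select_survival_features(latent, times, events, alpha=0.05):
    """Return indices of latent features whose median split gives log-rank P < alpha."""
    selected = []
    for j in range(len(latent[0])):
        values = [row[j] for row in latent]
        cut = median(values)
        groups = [1 if v > cut else 0 for v in values]
        if len(set(groups)) < 2:  # no split possible for this feature
            continue
        _, p = logrank_test(times, events, groups)
        if p < alpha:
            selected.append(j)
    return selected
```

In this design, only the retained feature indices are passed on to the downstream SVM classifier. The autoencoder itself is used purely as a feature extractor.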
Aberrant DNA methylation is closely related to HCC[52,53] and may predict survival in HCC patients who have undergone surgery. DNA methylation data from 377 HCC samples and 50 adjacent normal tissue samples were obtained and analyzed using the ChAMP tool in R software. Of the 40799 sites that were differentially methylated between HCC tissue and adjacent normal tissue, 2785 were found by Cox regression to be significantly related to OS (P < 0.05). The SVM-recursive feature elimination algorithm was then used as a classifier to identify the most informative sites, and 134 methylation sites were finally used to build the predictive model. A total of 163 patients were divided into “high-risk” (died within 1 year after surgery, n = 58), “intermediate-risk” (survived 1-5 years after surgery, n = 64) and “low-risk” (survived > 5 years after surgery, n = 41) groups and were separated into a training set (n = 130) and a test set (n = 33). In the test set, 26 of the 33 patients (78.8%) were correctly classified. Further validation with 19 paired HCC and normal tissue samples from the GSE77269 dataset in the Gene Expression Omnibus database showed no misclassification of normal tissues and a similar proportion of HCC samples classified as “high-risk.” Although this algorithm classified HCC patients more accurately than some conventional classification methods based on DNA methylation[55,56], validation in a larger cohort is needed.
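The recursive feature elimination loop can be sketched in standard-library Python. Several simplifying assumptions apply. The linear SVM is fitted by deterministic Pegasos-style subgradient steps without a bias term. The task is binary (labels -1/+1), whereas the study used three risk groups and would need, for example, a one-vs-rest extension. Features are ranked by the squared weight w_j², the criterion of the original SVM-RFE method. In practice, scikit-learn's `RFE` wrapped around a linear `SVC` performs the same loop.

```python
def train_linear_svm(X, y, lam=0.1, epochs=50):
    """Linear SVM without a bias term, fitted by deterministic Pegasos-style
    subgradient steps on the hinge loss; labels must be -1 or +1."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink (L2 regularization), then add the hinge subgradient if the margin is violated
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_rfe(X, y, n_keep):
    """Recursive feature elimination: retrain, drop the feature with the smallest w_j**2, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        remaining.pop(weakest)
    return remaining


# Toy data: column 0 tracks the label; columns 1 and 2 are uncorrelated with it
labels = [1, 1, 1, 1, -1, -1, -1, -1]
noise_a = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
noise_b = [0.2, 0.2, -0.2, -0.2, 0.2, 0.2, -0.2, -0.2]
X = [[2.0 * lab, a, b] for lab, a, b in zip(labels, noise_a, noise_b)]
print(svm_rfe(X, labels, n_keep=1))
```

The model is retrained after every elimination because feature weights can change once correlated features are removed. This matters when thousands of methylation sites are reduced to 134.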
Liao et al built an IF-based prognosis prediction model (IF model) that divided HCC patients who underwent resection into high- and low-score groups with different OS, according to a cut-off value derived from the training set. A total of 46 informative IFs that were significantly associated with OS (P < 0.05), identified by Cox proportional hazards regression and an RF minimal depth algorithm, were used to train the IF model. The IF model successfully distinguished patients with higher scores from those with lower scores in all three sets (log-rank test: P < 0.0001 in the training set and P = 0.013 in both the test and external validation sets), demonstrating good prognostic ability. Furthermore, time-dependent receiver operating characteristic curves were used to compare the prognostic performance of the IF model with that of the Tumor, Node, Metastasis staging system. No significant difference was observed at any time point (1-9 years after treatment; adjusted P = 0.848-1.000), indicating that the IF model may have a predictive accuracy comparable to that of the Tumor, Node, Metastasis staging system.
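The key validation step, fixing the score cut-off on the training set and applying it unchanged to the test and external sets, can be sketched as follows. The linear Cox-style score, the median as the cut-off rule and the feature names (`IF_1`, `IF_2`) with their coefficients are all hypothetical; the source states only that the cut-off came from the training set.

```python
from statistics import median


def linear_risk_score(features, coefficients):
    """Cox-style linear predictor: sum of coefficient * feature value (hypothetical coefficients)."""
    return sum(coefficients[name] * value for name, value in features.items())


def training_cutoff(training_scores):
    """Cut-off fixed on the training set only (the median is an assumed rule)."""
    return median(training_scores)


def stratify(scores, cutoff):
    """Assign each patient to the high- or low-score group using a fixed cut-off."""
    return ["high" if s > cutoff else "low" for s in scores]


coefficients = {"IF_1": 0.5, "IF_2": -0.2}
patient = {"IF_1": 2.0, "IF_2": 1.0}
cutoff = training_cutoff([1.0, 2.0, 3.0, 4.0])
print(linear_risk_score(patient, coefficients), stratify([1.0, 3.0], cutoff))
```

Re-estimating the cut-off on each validation set would leak information from those sets and overstate the separation seen in the log-rank tests.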
Two similarly structured ANN models, designed to predict in-hospital and 5-year mortality in HCC, respectively, were trained with data from a large population of 22926 patients who had been diagnosed with HCC and had undergone resection[57,58]. Each ANN consisted of an input layer, a hidden layer and an output layer. To identify related variables, continuous and categorical variables were tested by one-way analysis of variance and Fisher’s exact test, respectively, and significant predictors (P < 0.05) were verified by univariate analysis. The following steps were repeated 1000 times: (1) The data were randomly divided into a training set (n = 18341; 80%) and a test set (n = 4585; 20%); (2) LR and ANN models were established using the training set; and (3) Paired t-tests were used to compare performance indices between the two models. Statistically significant in-hospital mortality-related variables, including age, gender, comorbidity (estimated by the Charlson comorbidity index), hospital volume, surgeon volume and length of stay, were used as inputs to the ANN, which generated an outcome (death/survival). Compared with the LR model, the ANN showed a substantial advantage, with a higher accuracy rate (97.28% vs 88.29%, P < 0.001), a lower Hosmer-Lemeshow statistic (41.18 vs 54.53, P < 0.001) and a higher AUC (0.84 vs 0.76, P < 0.001). The other ANN model was built and tested in the same way with the same six variables to predict 5-year mortality, and it also significantly outperformed the LR model (accuracy rate 96.57% vs 87.96%; Hosmer-Lemeshow statistic 0.34 vs 0.45; AUC 88.51% vs 77.23%). Both ANN models showed strong prognostic performance, suggesting their possible applicability in predicting in-hospital and long-term mortality.
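The repeated-split comparison can be sketched in standard-library Python. The scoring callables passed to `compare_models` are hypothetical placeholders for fitting the ANN and LR models on the training indices and returning a metric (such as AUC) on the test indices. Only the t statistic is computed here; a p-value needs the t distribution, which SciPy provides through `scipy.stats.ttest_rel`. With `train_frac=0.8`, a cohort of 22926 patients is split into 18341 training and 4585 test patients, matching the sizes reported above.

```python
import random
from statistics import mean, stdev


def repeated_splits(n, repeats=1000, train_frac=0.8, seed=0):
    """Yield (train_indices, test_indices) for repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(n))
    n_train = round(n * train_frac)
    for _ in range(repeats):
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]


def paired_t_statistic(a, b):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / len(diffs) ** 0.5)


def compare_models(n, score_ann, score_lr, repeats=1000, seed=0):
    """Score both models on every split; `score_ann` and `score_lr` are hypothetical
    callables (train_idx, test_idx) -> metric such as accuracy or AUC."""
    ann, lr = [], []
    for train_idx, test_idx in repeated_splits(n, repeats, seed=seed):
        ann.append(score_ann(train_idx, test_idx))
        lr.append(score_lr(train_idx, test_idx))
    return mean(ann), mean(lr), paired_t_statistic(ann, lr)
```

Pairing the two models on identical splits removes split-to-split variation from the comparison. This is why a paired test, rather than an unpaired one, fits this design.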
OMICS RESEARCH PERFORMED WITH AI
Genomic data have shown efficient and unique advantages in both research and clinical practice. A recent study managed to correlate tumor samples with their tissues of origin using an ML prediction model. RNA-seq data for 14 tumor types, with at least 10 corresponding adjacent normal tissue samples for each, were downloaded from TCGA, Therapeutically Applicable Research to Generate Effective Treatments and the Genotype-Tissue Expression. An autoencoder neural network was built in PyTorch with a rectified linear activation function, dropout and normalization between layers, and the mean squared error between the input and output was used as the loss function. After 10000 iterations, by which point the loss had converged, the autoencoder demonstrated an outstanding ability to identify the tissue sites of cancers, with accuracy increasing as the number of varying genes increased. It noticeably surpassed the widely used PCA method, which identified only 8 of the 14 cancers. In distinguishing HCC samples, the autoencoder using all features showed a highly specific capacity to capture biological information. This study provides a solid reference for further research in HCC and may promote more precise use of samples.
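Two ingredients of this pipeline can be illustrated in standard-library Python: choosing the most variable genes as autoencoder inputs and the mean squared reconstruction loss. Selecting genes by variance is an assumed reading of "the number of varying genes", and the gene names are hypothetical. The PyTorch network and its training loop are not reproduced.

```python
from statistics import pvariance


def top_variable_genes(expression, k):
    """Return the k genes with the highest variance across samples.
    `expression` maps gene name -> list of expression values (one per sample)."""
    return sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)[:k]


def reconstruction_mse(inputs, reconstructions):
    """Mean squared error between an autoencoder's input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(inputs, reconstructions)) / len(inputs)


expression = {
    "GENE_A": [1.0, 1.0, 1.0],
    "GENE_B": [0.0, 5.0, 10.0],
    "GENE_C": [1.0, 2.0, 3.0],
}
print(top_variable_genes(expression, 2), reconstruction_mse([1.0, 2.0], [1.0, 4.0]))
```

Increasing `k` feeds the autoencoder more genes, which is the quantity the study reported accuracy rising with.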
A novel ML approach for identifying HCC-related genes was established. Gene expression profiles of 43 tumor and 52 normal tissue samples were downloaded from the NCBI Gene Expression Omnibus. A maximum relevance-minimum redundancy (mRMR) method, referred to as mRMRe, was used to rank the features. mRMR is a proven ML approach for phenotype classification that ranks transcriptional features according to both their relevance to the target and their redundancy with other features. An incremental feature selection method was combined with the mRMRe algorithm to generate candidate feature subsets for further analysis. A subset of 117 features, which achieved an accuracy of 0.895, was finally selected to distinguish HCC from non-HCC samples. This subset included several previously identified HCC-related genes (such as MT1X, BMI1 and CAP2), supporting the validity of the model. Furthermore, the model identified genes that had not previously been considered HCC-related, such as TACSTD2, which might be crucial in the pathogenesis of HCC; one of these, ubiquitin C, was identified through protein-protein interaction analysis.
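The ranking and subset-selection steps can be sketched in standard-library Python. This is not the mRMRe package. It is a greedy mRMR under assumptions: discrete feature values, mutual information estimated from counts, and the "difference" criterion (relevance minus mean redundancy). `evaluate` is a hypothetical callable that would return, for example, the cross-validated accuracy of a classifier trained on a candidate subset.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in count_ab.items()
    )


def mrmr_rank(features, target):
    """Greedy mRMR ranking: relevance to the target minus mean redundancy with chosen features."""
    remaining = sorted(features)  # sorted for deterministic tie-breaking
    ranked = []
    while remaining:
        def score(name):
            relevance = mutual_information(features[name], target)
            if not ranked:
                return relevance
            redundancy = sum(mutual_information(features[name], features[s]) for s in ranked)
            return relevance - redundancy / len(ranked)
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, evaluate):
    """Evaluate the top-1, top-2, ... subsets and keep the best-scoring one."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:
            best_subset, best_score = ranked[:k], score
    return best_subset, best_score
```

The redundancy term is what separates mRMR from ranking by relevance alone: an exact copy of an already selected gene is pushed to the end of the ranking even though its own relevance is high.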
AI has shown substantial benefits throughout the preclinical and clinical management of HCC, in terms of both investigation and treatment. Because of the low rate of early-stage diagnosis, its high recurrence rate and unsatisfactory treatment effectiveness, HCC is one of the deadliest types of cancer worldwide. Emerging and rapidly developing AI techniques offer the possibility of improving the survival of HCC patients. By increasing the accuracy of diagnosis and treatment response prediction and by enabling individualized prognosis assessment, deep-learning methods could considerably improve both the length and the quality of survival of HCC patients.
AI has also been used in a wider range of clinical practice. Hyer et al developed an ML approach to predict postsurgical prognosis; this novel method, referred to as the Complexity Score, outperformed several currently used indices of prognosis estimation. Mueller-Breckenridge et al identified two hepatitis B virus quasispecies by ultra-deep sequencing and developed an ML model to determine viral variants and assist clinical decision-making with regard to anti-hepatitis B virus strategies. A newly established ML model was reported as an alternative method for predicting liver fibrosis caused by chronic hepatitis C virus infection. Although none of these studies was directly related to HCC, their findings might substantially aid preclinical prevention, early diagnosis and surgical planning.
Jian W, Ju H, Cen X, Cui M, Zhang H, Zhang L, Wang G, Gu L, Zhou W. Improving the malignancy characterization of hepatocellular carcinoma using deeply supervised cross modal transfer learning for non-enhanced MR. Conf Proc IEEE Eng Med Biol Soc 2019; 2019: 853-856.
Tsilimigras DI, Mehta R, Moris D, Sahara K, Bagante F, Paredes AZ, Farooq A, Ratti F, Marques HP, Silva S, Soubrane O, Lam V, Poultsides GA, Popescu I, Grigorie R, Alexandrescu S, Martel G, Workneh A, Guglielmi A, Hugh T, Aldrighetti L, Endo I, Pawlik TM. Utilizing Machine Learning for Pre- and Postoperative Assessment of Patients Undergoing Resection for BCLC-0, A and B Hepatocellular Carcinoma: Implications for Resection Beyond the BCLC Guidelines. Ann Surg Oncol 2020; 27: 866-874.
Abajian A, Murali N, Savic LJ, Laage-Gaupp FM, Nezami N, Duncan JS, Schlachter T, Lin M, Geschwind JF, Chapiro J. Predicting Treatment Response to Intra-arterial Therapies for Hepatocellular Carcinoma with the Use of Supervised Machine Learning-An Artificial Intelligence Concept. J Vasc Interv Radiol 2018; 29: 850-857.
Mueller-Breckenridge AJ, Garcia-Alcalde F, Wildum S, Smits SL, de Man RA, van Campenhout MJH, Brouwer WP, Niu J, Young JAT, Najera I, Zhu L, Wu D, Racek T, Hundie GB, Lin Y, Boucher CA, van de Vijver D, Haagmans BL. Machine-learning based patient classification using Hepatitis B virus full-length genome quasispecies from Asian and European cohorts. Sci Rep 2019; 9: 18892.