Ballotin VR, Bigarella LG, Soldera J, Soldera J. Deep learning applied to the imaging diagnosis of hepatocellular carcinoma. Artif Intell Gastrointest Endosc 2021; 2(4): 127-135 [DOI: 10.37126/aige.v2.i4.127]
Corresponding Author of This Article
Jonathan Soldera, MD, MSc, Associate Professor, Staff Physician, Clinical Gastroenterology, Universidade de Caxias do Sul, Rua Francisco Getúlio Vargas 1130, Caxias do Sul 95070-560, RS, Brazil. email@example.com
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Author contributions: All authors contributed to study concept and design, to drafting of the manuscript and to critical revision of the manuscript for important intellectual content.
Conflict-of-interest statement: The authors have no conflict of interest to disclose.
Received: April 21, 2021 Peer-review started: April 21, 2021 First decision: May 19, 2021 Revised: June 5, 2021 Accepted: July 19, 2021 Article in press: July 19, 2021 Published online: August 28, 2021
Each year, hepatocellular carcinoma is diagnosed in more than half a million people worldwide. It is the fifth most common cancer in men and the seventh most common cancer in women. Its diagnosis is currently made using imaging techniques, such as computed tomography and magnetic resonance imaging. For most cirrhotic patients, these methods are enough for diagnosis, forgoing the necessity of a liver biopsy. In order to improve outcomes and bypass obstacles, many companies and clinical centers have been trying to develop deep learning systems able to diagnose and classify liver nodules in the cirrhotic liver; among these systems, neural networks are one of the most efficient approaches to accurately diagnose liver nodules. Despite the advances in deep learning systems for imaging-based diagnosis, many issues need further development before such technologies become more useful in daily practice.
Core Tip: Hepatocellular carcinoma is diagnosed using imaging techniques, such as computed tomography and magnetic resonance imaging. In order to improve outcomes and bypass obstacles, many companies and clinical centers have been trying to develop deep learning systems able to diagnose and classify liver nodules in the cirrhotic liver. Neural networks have become one of the most efficient approaches to accurately diagnose liver nodules using deep learning systems. Therefore, as these techniques improve in the long term, they could become applicable in daily practice and modify outcomes.
Each year, hepatocellular carcinoma (HCC) is diagnosed in more than half a million people worldwide, and it is the fifth most common cancer in men and the seventh most common cancer in women. The greatest burden of this disease is in developing regions, such as Southeast Asia and Sub-Saharan Africa, where hepatitis B is endemic[2,3].
The incidence of HCC has been rising, unlike many other types of neoplasms. This is expected to change, as the worldwide incidence of viral hepatitis B and C is expected to decline in the next generation via vaccination and treatment, respectively. Nevertheless, the sharp rise in the prevalence of nonalcoholic steatohepatitis in the last couple of decades might make it a key risk factor for HCC, and it could become solely responsible for sustaining HCC incidence in both Western and Eastern populations[5,6].
Therefore, understanding the diagnostic and therapeutic approaches to this disease is essential, especially given that prevention and early detection are fundamental to improving results[7,8].
DIAGNOSIS OF HCC
HCC diagnosis is currently made using imaging techniques, such as computed tomography and magnetic resonance imaging (MRI). For most cirrhotic patients, these methods are enough for diagnosis, forgoing the necessity of a liver biopsy[9-11]. Nevertheless, the precise diagnosis of a liver nodule via imaging techniques is a rather challenging task, requiring a highly trained and specialized multidisciplinary team of radiologists, hepatologists and oncologists.
In order to facilitate communication between professionals of such a team, a system for reporting imaging of liver nodules has been developed and adopted worldwide: the Liver Imaging Reporting And Data System (LI-RADS). The LI-RADS classification can be found in Table 1. Although this was an attempt at standardization, a high discordance rate among radiologists has been described. Inter-rater reliability has varied greatly in studies, with Cohen's kappa coefficients ranging from 0.35 to 0.73[15-19]. This is expected, since this classification requires high-quality imaging and radiologists with vast experience[19,20]. Another important consideration is that where HCC incidence is highest (developing countries), highly specialized radiologists are scarcest despite a high volume of patients. In order to improve outcomes and bypass these obstacles, many companies and clinical centers have been trying to develop deep learning systems (DLS) intended to accurately diagnose liver nodules in the cirrhotic liver.
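To make the inter-rater figures above more concrete, the following is a minimal sketch of how Cohen's kappa can be computed for two readers. The LI-RADS assignments in it are hypothetical and serve only as an illustration; they are not data from any of the cited studies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical LI-RADS categories given by two radiologists to ten nodules.
reader_1 = ["LR-3", "LR-4", "LR-5", "LR-5", "LR-3", "LR-4", "LR-5", "LR-3", "LR-4", "LR-5"]
reader_2 = ["LR-3", "LR-5", "LR-5", "LR-4", "LR-3", "LR-4", "LR-5", "LR-4", "LR-4", "LR-5"]

kappa = cohens_kappa(reader_1, reader_2)
print(round(kappa, 3))  # prints 0.545
```

In this toy example the readers agree on 7 of 10 nodules, yet kappa is only about 0.55 because part of that agreement is expected by chance.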
Table 1 Liver imaging reporting and data system classification.
Intermediate probability of HCC
High probability of HCC, not 100%
Definite venous invasion regardless of other imaging features
LR-5 lesion status post-locoregional treatment
Non-HCC malignancies that may occur in cirrhosis: metastases, lymphoma, cholangiocarcinoma, PTLD
There are many DLS approaches available in the literature, among which neural networks are currently gaining much attention as one of the best approaches to accurately diagnose liver nodules. In particular, a DLS based on convolutional neural networks (CNN) could achieve such capacities after machine learning (ML) on example images with and without the disease in question. Unlike other DLS, a CNN does not demand a clear definition of the lesion in order to interpret the images, which might lead to the discovery of additional differential characteristics that are not currently known to radiologists. Table 2 summarizes the main characteristics of the studies on the diagnosis of liver tumors from images and clinical data using DLS.
Table 2 Main characteristics of the studies that evaluate deep learning for liver tumor diagnosis through images or clinical data.
3D CNN architecture can bring significant benefit in DW-MRI liver discrimination and potentially in numerous other tissue classification problems based on tomographic data, especially in size-limited, disease specific clinical datasets
This interpretable deep learning system demonstrates proof of principle for illuminating portions of a pre-trained deep neural network’s decision-making, by analyzing inner layers and automatically describing features contributing to predictions
Fully connected neural network with 4 layers of neurons using only biomarkers, gradient boosting (non-linear model) and others
DLS: 83.54%. Gradient boosting: 87.34%
Gradient boosting: 93.27%
Gradient boosting: 75.93%
DLS: 0.884. Gradient boosting: 0.940
Deep learning was not the optimal classifier in the current study
The gradient boosting model reduced the misclassification rate by about half compared with a single tumor marker. The model can be applied to various kinds of data and thus could potentially become a translational mechanism between academic research and clinical practice
MLP, SVM, RF, and J48 using ten-fold cross-validation
MLP: 0.983. SVM: 0.966. RF: 0.964. J48: 0.959
The MLP model presented better results
Our proposed system has the capability to verify the results on different MRI and CT scan databases, which could help radiologists to diagnose liver tumors
¹Five categories: A: Classic hepatocellular carcinomas; B: Malignant liver tumors other than classic and early hepatocellular carcinomas; C: Indeterminate masses or mass-like lesions (including early hepatocellular carcinomas and dysplastic nodules) and rare benign liver masses other than hemangiomas and cysts; D: Hemangiomas; E: Cysts. AUC: Area under the curve; AUROC: Area under the receiver operating characteristic curve; CDNs: Convolutional dense networks; CNN: Convolutional neural network; CT: Computed tomography; DLS: Deep learning system; DW-MRI: Diffusion weighted magnetic resonance imaging; GANs: Generative adversarial networks; HCC: Hepatocellular carcinoma; LI-RADS: Liver Imaging Reporting and Data System; LR: LI-RADS; MLP: Multilayer perceptron; MRI: Magnetic resonance imaging; NA: Not available; RF: Random forest; SVM: Support vector machine.
There are several DLS applied in the recognition of image patterns[25,26], among which CNN-based approaches have achieved the highest performance. While conventional machine learning algorithms require specific features to be extracted from images before the learning process, CNNs require only a simpler feature representation based on the original image pixel intensities, which also allows all available image information to be used in the learning process. Moreover, CNNs can process extracted image features with several convolution filters, which allow analysis of the image at different granularities. Therefore, CNN is one of the most advanced techniques for artificial intelligence and has been implemented with success for imaging and clinical interpretation in many medical fields. For example, CNNs have been validated to identify liver tumors, to predict the prognosis of esophageal variceal bleeding in cirrhotic patients, the mortality of liver transplantation[30,31] and the prognosis of HCC[32-37], to detect Helicobacter pylori infection, colonic polyps and focal liver disease, and to help classify breast cancer, head and neck cancer and gliomas.
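The idea of convolution filters operating directly on pixel intensities at different granularities can be illustrated with a minimal sketch. The toy image and the two hand-picked kernels below are assumptions made for illustration only; in a real CNN the kernel weights are learned from the training images rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Weighted sum of the pixels currently under the kernel window.
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Toy 5x5 "image" of pixel intensities with a bright 3x3 block in the centre.
image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0],
]

# Hand-picked filters (assumption): a trained CNN would learn these weights.
edge_kernel = [[-1, 1]]                        # fine granularity: left-to-right change
mean_kernel = [[1 / 9] * 3 for _ in range(3)]  # coarser granularity: 3x3 local average

edges = convolve2d(image, edge_kernel)
smooth = convolve2d(image, mean_kernel)

print(edges[1])  # prints [9.0, 0.0, 0.0, -9.0]
```

The small kernel responds only at the borders of the bright block, whereas the larger kernel summarizes a whole neighborhood, which is the sense in which different filters look at the same image at different scales.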
Regarding liver tumors, many studies have shown that CNNs perform as well as or better than experienced radiologists. Hamm et al developed and validated a CNN that classified six types of common hepatic lesions on multi-phasic MRI, achieving better sensitivity and specificity when compared to board-certified radiologists. Nevertheless, this study was developed in only one center, using local and typical images, with no external validation. In a follow-up to this study, Wang et al used a pre-trained CNN in a model-agnostic approach capable of distinguishing among several types of lesions and developed a post-hoc algorithm with the purpose of standardizing the lesion features used in the diagnosis. Such a tool could interact with other standardized scales, such as LI-RADS, validating auxiliary resources and improving clinical practicality. This study found a sensitivity of 82.9% for adequate identification of imaging characteristics when analyzing lesions from a databank. A DLS of this type, which is transparent about the steps leading to its diagnosis, is expected to gain better clinical acceptance.
Yamashita et al developed a DLS to diagnose liver carcinoma using two CNNs: a pre-trained network with an input of triple-phase images (trained with transfer learning from other CNNs) and a custom-made network with an input of quadruple-phase images (trained from scratch on internal data). However, using external data from other pre-trained CNNs, Zech et al showed that DLS performance worsened when compared to CNNs trained with internal data, which indicates that it has not yet been proven that CNNs trained on X-rays from one hospital or one group of hospitals will work equally well at different hospitals. This has also been demonstrated for the detection of pneumonia in chest X-rays, where a CNN performed worse when exposed to external data with a wide range of diseases and radiological findings. In addition, such CNNs could be used to determine the LI-RADS category, which has been shown to be possible even from a small data set. Nevertheless, external validation seems to be a major obstacle to the dissemination of ML tools: many different devices produce images, and there are many ways to store the data from these exams.
When compared to other DLS, another advantage of CNNs is that they can improve diagnosis while using fewer images for ML, reducing exam time and the amount of radiation exposure[23,43,44]. Moreover, generating additional training samples through data augmentation enhances the sensitivity and accuracy of liver lesion classification while requiring fewer images in the ML process. The sensitivity, specificity and accuracy can be calculated manually from the confusion matrix. In Table 3, we compare the best ML algorithms for classification.
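As a minimal sketch of the calculation from a confusion matrix mentioned above, the following code counts true and false positives and negatives for a binary classifier and derives sensitivity, specificity and accuracy. The labels and predictions are hypothetical (1 stands for HCC, 0 for a benign lesion), and the function names are ours, chosen for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy; assumes both classes are present."""
    return {
        "sensitivity": tp / (tp + fn),  # detected fraction of true HCC cases
        "specificity": tn / (tn + fp),  # correctly cleared fraction of benign cases
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical reference labels and model predictions for ten lesions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = diagnostic_metrics(*counts)
print(counts)                  # prints (3, 1, 5, 1)
print(metrics["sensitivity"])  # prints 0.75
```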
Table 3 Best machine learning algorithms for classification.
Naïve Bayes Classifier
Advantages: Simple, easy and fast. Not sensitive to irrelevant features. Works great in practice. Needs less training data. For both multi-class and binary classification. Works with continuous and discrete data
Disadvantages: Assumes every feature is independent, which is not always true
Decision Tree
Advantages: Easy to understand. Easy to generate rules. There are almost no hyperparameters to be tuned. Complex decision tree models can be significantly simplified by its visualizations
Disadvantages: Might suffer from overfitting. Does not easily work with nonnumerical data. Low prediction accuracy for a dataset in comparison with other algorithms. When there are many class labels, calculations can be complex
Support Vector Machines
Advantages: Fast algorithm. Effective in high dimensional spaces. Great accuracy. Power and flexibility from kernels. Works very well with a clear margin of separation. Many applications
Disadvantages: Does not perform well with large data sets. Not so simple to program. Does not perform so well when the data are noisy, i.e., when target classes overlap
Random Forest Classifier
Advantages: The overfitting problem does not exist. Can be used for feature engineering, i.e., for identifying the most important features among all available features in the training dataset. Runs very well on large databases. Extremely flexible and has very high accuracy. No need for preparation of the input data
Disadvantages: Complexity. Requires a lot of computational resources. Time-consuming. Need to choose the number of trees
KNN
Advantages: Simple to understand and easy to implement. Zero to little training time. Works easily with multi-class data sets. Has good predictive power. Does well in practice
Disadvantages: Computationally expensive testing phase. Can have skewed class distributions. The accuracy can be decreased when it comes to high-dimension data. Needs to define a value for the parameter k
KNN: K-nearest neighbors.
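To illustrate one of the classifiers compared in Table 3, the following is a minimal sketch of K-nearest neighbors, showing why it needs almost no training time but requires a value for the parameter k and a distance computation against every stored sample at prediction time. The two numeric features and the class labels are hypothetical placeholders, not variables taken from any of the cited studies.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is only storage; all the work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature descriptions of six lesions with known classes.
train_points = [(1.0, 1.2), (1.5, 1.0), (1.2, 0.8), (4.0, 3.5), (4.5, 3.8), (5.0, 3.2)]
train_labels = ["benign", "benign", "benign", "HCC", "HCC", "HCC"]

print(knn_predict(train_points, train_labels, (4.2, 3.4), k=3))  # prints HCC
print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))  # prints benign
```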
A DLS has been proposed for the prediction of HCC recurrence, using data from computed tomography combined with clinical information. The triple-layer model, which includes imaging studies, clinical data and a filtering of these data, had the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.825. This is considerably more precise than current tools. Furthermore, Sato et al proposed an ML model for predicting HCC using data obtained during clinical practice. The AUROC of the gradient boosting model with optimal hyperparameters, which involved multiple laboratory data and tumor markers, was 0.940. By comparison, the AUROCs for the prediction of HCC with the single tumor markers alpha-fetoprotein, des-gamma-carboxy prothrombin and alpha-fetoprotein-L3 were 0.766, 0.644 and 0.683, respectively. Accordingly, a combination of multiple data can provide a reliable diagnostic tool.
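Since the studies above are compared by their AUROC, a minimal sketch of how this quantity can be computed may be helpful. It uses the rank-based interpretation of the AUROC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The scores and labels are hypothetical and are not taken from the cited studies.

```python
def auroc(scores, labels):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive case correctly ranked above negative case
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs (probability of HCC) and reference diagnoses.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = auroc(scores, labels)
print(round(auc, 3))  # prints 0.917
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the values reported above.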
A preliminary study attempted to diagnose liver masses using a CNN without the aid of a radiologist and achieved high accuracy in differentiating HCC from benign liver masses, with an AUROC of 0.92. In another study, a CNN was designed to differentiate HCC from metastatic liver masses on MRI, this time using a 3-D representation, which yielded higher accuracy (83.0% for the 3-D model vs 65.2% for the 2-D model). Nevertheless, the authors stressed that more studies with larger databanks are needed to verify the accuracy of this method. In addition, Naeem et al performed a hybrid-feature analysis combining computed tomography scans and MRI for the differentiation of liver tumors using DLS. The accuracies of the multilayer perceptron model for hepatoblastoma, cyst, hemangioma, hepatocellular adenoma, HCC and metastasis were 99.67%, 99.33%, 98.33%, 99.67%, 97.33% and 99.67%, respectively. This method can help reduce human error.
Therefore, despite the advances in DLS for imaging-based diagnosis, many points need further development before these systems become useful and common tools in daily practice. These techniques still require comparison with trained radiologists and application to many databanks that include atypical images in order to achieve better results and to allow the use of less radiation for HCC diagnosis.
The DLS presented above were applied to liver nodule diagnosis; however, they are not able to segment the nodule from the liver in the analyzed images. Automatic nodule segmentation in an image is a challenging task, since this kind of lesion may show high variability in shape, appearance and localization, and its appearance depends on the equipment, contrast, lesion type, lesion stage and so on.
Some liver nodule segmentation methods are available in the literature. In one of them, a fully convolutional network architecture was adopted to approximate where the nodule is located in the image. This CNN works on four resolution levels, learning local and global image features. The final nodule segmentation is obtained by using post-processing techniques and a random forest classifier, achieving a quality comparable to that of a human expert.
However, this method uses hand-crafted features that need the supervision of an expert. There are also automatic approaches that can segment the nodule, in which a CNN is used for ML. To refine the segmentation results, one such method applies conditional random fields to eliminate false segmentation points, improving accuracy. However, liver nodule segmentation in general still needs improvement to achieve better accuracy and practical applicability. Furthermore, more research effort is needed on DLS that simultaneously detect the tumor in the liver and segment it on the image.
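The text above does not state which measure was used to compare automatic segmentations with those of a human expert. As an illustration only, the sketch below assumes the Dice coefficient, a commonly used overlap measure for binary masks; the two small masks are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as lists of 0/1 rows."""
    intersection = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            size_a += a
            size_b += b
            intersection += a & b  # 1 only where both masks mark the nodule
    if size_a + size_b == 0:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * intersection / (size_a + size_b)


# Hypothetical 4x4 masks: expert delineation vs automatic segmentation.
expert = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
predicted = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]

dice = dice_coefficient(expert, predicted)
print(dice)  # prints 0.75
```

A value of 1.0 means the two masks coincide exactly, and 0.0 means they do not overlap at all; here one missed pixel and one false pixel reduce the score to 0.75.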
In conclusion, the goal of statistical methods, such as linear and logistic regression, is to reach conclusions about a population from data collected from a representative sample of that population; the objective is therefore to understand the associations among variables. However, as reported by Sidey-Gibbons and Sidey-Gibbons, the primary concern of DLS is accurate prediction. Moreover, explaining the relationship between predictors and outcomes is difficult when that relationship is non-linear. In several DLS applications, such as improving navigation, translating documents or recognizing objects in videos, understanding the relationship between features and outcomes is less important. In summary, enhancement of DLS features will allow more accurate diagnosis in the medical field. For future research, we recommend testing deep learning methods on other datasets (e.g., from other hospitals), developing an easy-to-use interface and introducing the tool into daily medical practice.
Manuscript source: Invited manuscript
Corresponding Author’s Membership in Professional Societies: Federação Brasileira De Gastroenterologia.
Chagas AL, Mattos AA, Carrilho FJ, Bittencourt PL; Members of the Panel of the 2nd Consensus of the Brazilian Society of Hepatology on the Diagnosis and Management of Hepatocellular Carcinoma; Vezozzo DCP, Horvat N, Rocha MS, Alves VAF, Coral GP, Alvares-DA-Silva MR, Barros FMDR, Menezes MR, Monsignore LM, Coelho FF, Silva RFD, Silva RCMA, Boin IFSF, D Albuquerque LAC, Garcia JHP, Felga GEG, Moreira AM, Braghiroli MIFM, Hoff PMG, Mello VB, Dottori MF, Branco TP, Schiavon LL, Costa TFA. Brazilian society of hepatology updated recommendations for diagnosis and treatment of hepatocellular carcinoma. Arq Gastroenterol. 2020;57:1-20.
Elsayes KM, Kielar AZ, Chernyak V, Morshid A, Furlan A, Masch WR, Marks RM, Kamaya A, Do RKG, Kono Y, Fowler KJ, Tang A, Bashir MR, Hecht EM, Jambhekar K, Lyshchik A, Rodgers SK, Heiken JP, Kohli M, Fetzer DT, Wilson SR, Kassam Z, Mendiratta-Lala M, Singal AG, Lim CS, Cruite I, Lee J, Ash R, Mitchell DG, McInnes MDF, Sirlin CB. LI-RADS: a conceptual and historical review from its beginning to its recent integration into AASLD clinical practice guidance. J Hepatocell Carcinoma. 2019;6:49-69.
Fowler KJ, Tang A, Santillan C, Bhargavan-Chatfield M, Heiken J, Jha RC, Weinreb J, Hussain H, Mitchell DG, Bashir MR, Costa EAC, Cunha GM, Coombs L, Wolfson T, Gamst AC, Brancatelli G, Yeh B, Sirlin CB. Interreader Reliability of LI-RADS Version 2014 Algorithm and Imaging Features for Diagnosis of Hepatocellular Carcinoma: A Large International Multireader Study. Radiology. 2018;286:173-185.