Kim JH, Nam SJ, Park SC. Usefulness of artificial intelligence in gastric neoplasms. World J Gastroenterol 2021; 27(24): 3543-3555 [PMID: 34239268 DOI: 10.3748/wjg.v27.i24.3543]
Corresponding Author of This Article
Sung Chul Park, MD, PhD, Associate Professor, Doctor, Division of Gastroenterology and Hepatology, Department of Internal Medicine, Kangwon National University School of Medicine, Baengnyeong-ro 156, Gangwon-do, Chuncheon 24289, Kangwon Do, South Korea. firstname.lastname@example.org
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Ji Hyun Kim, Seung-Joo Nam, Sung Chul Park, Division of Gastroenterology and Hepatology, Department of Internal Medicine, Kangwon National University School of Medicine, Chuncheon 24289, Kangwon Do, South Korea
Author contributions: Kim JH and Park SC wrote the manuscript and made the tables and figures; Nam SJ assisted in drafting and revising the paper.
Conflict-of-interest statement: The authors declare no conflicts of interest.
Received: January 25, 2021 Peer-review started: January 25, 2021 First decision: March 29, 2021 Revised: April 9, 2021 Accepted: May 21, 2021 Article in press: May 21, 2021 Published online: June 28, 2021
Recently, studies in many medical fields have reported that image analysis based on artificial intelligence (AI) can be used to analyze structures or features that are difficult to identify with human eyes. To diagnose early gastric cancer, related efforts such as narrow-band imaging technology are on-going. However, diagnosis is often difficult. Therefore, a diagnostic method based on AI for endoscopic imaging was developed and its effectiveness was confirmed in many studies. The gastric cancer diagnostic program based on AI showed relatively high diagnostic accuracy and could differentially diagnose non-neoplastic lesions including benign gastric ulcers and dysplasia. An AI system has also been developed that helps to predict the invasion depth of gastric cancer through endoscopic images and observe the stomach during endoscopy without blind spots. Therefore, if AI is used in the field of endoscopy, it is expected to aid in the diagnosis of gastric neoplasms and determine the application of endoscopic therapy by predicting the invasion depth.
Core Tip: Recently, image analysis based on artificial intelligence (AI) has been applied in the field of diagnostic endoscopy in gastroenterology, and active research is also being conducted on gastric neoplasms. Several studies reported that AI-based early gastric cancer diagnosis and the prediction of invasion depth showed excellent performance and that the differential diagnosis from non-neoplastic lesions including benign gastric ulcers was possible. Therefore, if AI is used in clinical practice, it can be expected to help diagnose gastric neoplasms and determine treatment methods.
Gastric cancer is the fifth most common malignant neoplasm in the world and the third most common cause of cancer-related death[1,2]. Although advanced gastric cancer (AGC) is associated with poor outcomes, the detection of early gastric cancer (EGC) can improve survival up to 90%[1,3]. Endoscopy is the most important tool for detecting and diagnosing gastric cancer. However, the accuracy of detection relies upon the expertise and experience of the endoscopist and complex factors of the gastrointestinal (GI) tract. Accordingly, endoscopy techniques and related fields such as image-enhanced endoscopy have been developed to improve the diagnosis of EGC. Since its introduction in the 1950s, artificial intelligence (AI) such as deep learning (DL) has experienced remarkable progress in the last decade, and many researchers have studied the application of AI not only in the field of medical imaging but also in predicting patient prognosis based on medical records[4,5]. Many studies have utilized AI in endoscopic diagnosis. The application of AI in colonoscopy has significantly improved the adenoma detection rate (29.1% vs 20.3%, P < 0.001), and can even differentiate whether a detected polyp is non-neoplastic or neoplastic[6,7]. Based on such advancements, companies have already adapted AI for use in colonoscopy. Medtronic developed the GI Genius™ Intelligent Endoscopy Module that utilizes AI for the detection of colon polyps in real-time colonoscopy, while Olympus developed the EndoBRAIN-EYE. In addition, Pentax and Fuji released the PENTAX medical Discovery™ and computer-assisted diagnosis (CAD)-EYE, respectively. Many studies have also been conducted in the field of AI in esophagogastroduodenoscopy (EGD). Thus, this article aimed to review recent developments and the use of AI in gastric neoplasms focusing on EGC, which has its unique characteristics among various GI diseases.
AI refers to machines that can perform complex tasks as humans do by imitating cognitive functions of human intelligence such as learning and problem-solving (Figure 1). The concept was first introduced in 1955 and has been rapidly integrated into modern technologies and medicine. AI includes five subfields: machine learning (ML), artificial neural networks (ANNs), natural language processing, DL, and computer vision. ML is a field of AI in which large amounts of data and algorithms are supplied to the machine, and the machine automatically learns from the input data by analyzing its patterns. Although the machine is capable of learning data patterns, the process still requires a certain amount of human instruction. DL is an important technique among the many methods of ML, in which the machine collects, analyzes, and processes data without receiving human instructions. Using massive amounts of data, the machine creates a learning model by extracting the key features of the given data. ANNs are the core technology of DL. Just as the human brain is formed by groups of neurons, the learning model connects computational nodes arranged in several layers: an input layer, an output layer, and one or more hidden layers between them (Figure 2). The simplest type of neural network is the perceptron, which consists of one input layer and one output node. A weight assigns a certain amount of importance to each input, and the perceptron creates an output from its inputs and weights. When an input is received, a weighted sum is calculated according to the weights, and the activation function returns 1 if the sum satisfies a specific criterion and 0 otherwise. A convolutional neural network (CNN) is a kind of ANN that automatically learns features from the data and is used mainly for image recognition.
The CNN is an advanced ML model designed to work similarly to the human brain, using large image datasets to learn patterns in the images. A CNN is typically composed of three types of layers: convolutional and pooling layers, which extract features of the image, and fully connected layers, which map those features into the output. The convolutional layer is the key component of a CNN and is typically composed of a filter and an activation function. Using the image as input data, the filter extracts features of the image, and the activation function converts the value to a non-linear value. A CNN stacks multiple convolutional layers, typically followed by pooling layers, and many filters are applied as the input image passes through consecutive convolutional layers. The extracted features accumulate and become more complex, which allows the characteristics of the input image to be determined. Subsequently, classification is performed by the fully connected layers, which are the last layers of the CNN (Figure 3). Among the terms used in training CNNs, one epoch refers to one forward and backward pass of the entire dataset to update the weights. The batch size is the number of training examples processed in one forward and backward pass, and the number of iterations is the number of batches needed to complete one epoch.
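The relationship between epoch, batch size, and iteration can be made concrete with a short calculation. The following is a minimal sketch in Python; the function names and the dataset size, batch size, and epoch count are hypothetical and are not taken from any study cited in this review. It assumes one weight update per batch.

```python
import math


def iterations_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches (iterations) needed to pass over the whole dataset once."""
    if num_examples <= 0 or batch_size <= 0:
        raise ValueError("num_examples and batch_size must be positive")
    # The final batch may hold fewer than batch_size examples, hence the ceiling.
    return math.ceil(num_examples / batch_size)


def total_weight_updates(num_examples: int, batch_size: int, epochs: int) -> int:
    """Total forward/backward passes, assuming one weight update per batch."""
    return iterations_per_epoch(num_examples, batch_size) * epochs


# Hypothetical settings chosen only for illustration.
print(iterations_per_epoch(num_examples=10_000, batch_size=32))
print(total_weight_updates(num_examples=10_000, batch_size=32, epochs=5))
```

With these illustrative settings, a smaller batch size increases the number of iterations per epoch, while the number of epochs scales the total number of weight updates.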
Figure 1 Overview of artificial intelligence, machine learning, and deep learning.
Artificial intelligence refers to machines that can perform complex tasks as humans do by imitating human intelligence. One of the most important ways to achieve artificial intelligence is machine learning, in which a machine learns by itself from the data provided in order to make accurate decisions. Deep learning is an important technique among the many methods of machine learning. It is based on artificial neural networks and learns from data through input and output layers of nodes that resemble neurons in the brain.
Figure 2 Illustrative model of artificial neural network.
Once an endoscopic image is fed into the input layer, each hidden layer passes its result on to the next layer. Through this network, the input image is classified in the output layer.
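The perceptron described in the text, the simplest case of the network in Figure 2, can be sketched in a few lines of Python. This is an illustrative sketch only: the step activation, the bias term, and the weights (chosen so that the node behaves like a logical AND) are assumptions made for the example.

```python
def perceptron(inputs, weights, bias=0.0, threshold=0.0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Each weight expresses how much importance its input receives.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step activation: the output is 1 or 0 depending on the criterion.
    return 1 if weighted_sum > threshold else 0


# Hypothetical weights and bias that make the node fire only when both inputs are 1.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

outputs = [perceptron([a, b], AND_WEIGHTS, AND_BIAS) for a in (0, 1) for b in (0, 1)]
print(outputs)
```

A multi-layer network such as the one in Figure 2 chains many such nodes, usually with a smooth activation function in place of the step so that the weights can be learned by backpropagation.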
Figure 3 Overview of convolutional neural network.
It is composed of stacks of convolutional layers, pooling layers, and fully connected layers. The convolutional and pooling layers extract features of the input images, while the fully connected layers produce the output classification.
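The sequence in Figure 3 (convolution, activation, pooling, fully connected layer) can be traced by hand on a toy input. The sketch below uses only plain Python lists; the 5 x 5 "image", the edge-detecting filter, and the fully connected weights are hypothetical values chosen for illustration, whereas a real CNN learns its filters and weights from data.

```python
def conv2d(image, kernel):
    """Slide the filter over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Non-linear activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the strongest response in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def fully_connected(features, weights, bias):
    """Map the flattened features to a single output score."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical filter that responds to dark-to-bright transitions.
kernel = [[-1, 1], [-1, 1]]

pooled = max_pool(relu(conv2d(image, kernel)))
flat = [v for row in pooled for v in row]
score = fully_connected(flat, weights=[0.5, 0.5, 0.5, 0.5], bias=-1.0)
print(pooled, score)
```

The pooled map is non-zero only where the edge lies, which is the sense in which the convolutional and pooling layers "extract features" before the fully connected layer produces an output.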
Most examinations and diagnoses of GI tract diseases are performed through endoscopy, and endoscopic imaging is one of the most effective applications of AI-based analytics in the field of medicine. CNNs are well suited to endoscopic image recognition for detecting and localizing GI neoplasms. An AI algorithm learns what a neoplasm looks like in an endoscopic image from images labeled by an endoscopist. After training, the CNN is tested on new, non-labeled images to which it has not previously been exposed, in order to validate that the model can correctly identify previously unseen neoplasms. As a result, the algorithm can identify what it believes is a neoplasm in a real-time endoscopic video feed.
AI IN GASTRIC NEOPLASMS
Detection of gastric neoplasms
The detection of early-stage stomach cancer and precancerous lesions is essential to improving survival. Endoscopy is the most important and widely used tool for gastric cancer screening. However, because it is a manual procedure, it is prone to technical and cognitive errors that depend on the endoscopist. EGC lesions usually show only subtle mucosal changes, such as elevation, depression, and redness. Moreover, they are surrounded by chronic inflammation or intestinal metaplasia. Therefore, the subtle changes seen in the early forms of gastric cancer can be missed, especially in countries where the incidence of gastric cancer is low and where training is limited. Previous studies reported false-negative rates for detecting gastric cancer ranging between 4.6% and 25.8%[13-17]. One method to improve diagnostic accuracy is image-enhanced endoscopy such as narrow-band imaging (NBI) and blue laser imaging, which are more effective than conventional white light imaging alone[18,19]. However, such optical diagnosis requires substantial expertise and experience, which hinders its general use in gastroscopy. The 5-year survival rate of gastric cancer patients is highly correlated with the stage of gastric cancer at the time of diagnosis. Thus, it is paramount to improve the detection rates of EGC. Many groups have already started integrating AI into their routine practice to improve the overall detection rates of gastric cancer. AI-assisted evaluation can provide a more objective approach to improving diagnostic accuracy and avoiding unnecessary biopsies. Studies using AI in gastric neoplasms are summarized in Table 1.
Table 1 Recently published articles on application of artificial intelligence in gastric neoplasms.
AI: Artificial intelligence; EGC: Early gastric cancer; CNN: Convolutional neural network; SSD: Single Shot MultiBox Detector; PPV: Positive predictive value; SVM: Support vector machine; M-NBI: Magnified narrow-band imaging; AUC: Area under curve; GRAIDS: Gastrointestinal Artificial Intelligence Diagnostic System; WLI: White light imaging; NPV: Negative predictive value; DCNN: Deep convolutional neural network; VGG: Visual Geometry Group; AGC: Advanced gastric cancer; Grad-CAM: Gradient-weighted class activation mapping; U-TOE: Ultrathin transoral endoscopy; C-EGD: Conventional esophagogastroduodenoscopy.
To evaluate the diagnostic accuracy of AI in the detection of gastric cancer, Hirasawa et al trained a CNN-based algorithm, the Single Shot MultiBox Detector, on 13584 endoscopic images of gastric cancer and then tested it on 2296 images (714 with confirmed gastric cancer) from 69 patients. The overall sensitivity for the detection of gastric cancer was 92.2%, and the CNN took 47 s to analyze the 2296 test images. The detection rate for lesions larger than 6 mm was 98.6%, and all invasive cancers were identified by the AI. However, among minute cancers smaller than 5 mm, only 1 out of 6 (16.7%) was detected. In addition, 161 of the 232 lesions that the machine identified as gastric cancer were non-neoplastic, which produced a low positive predictive value (PPV) of 30.6%. The most common cause of false-positive lesions was gastritis with a change in color tone or an irregular mucosal surface, which is sometimes difficult even for endoscopists to distinguish, and the next most common cause was normal structures such as the cardia, pylorus, and angle. Ishioka et al applied the same algorithm to video images collected from 62 patients who underwent endoscopic submucosal dissection (ESD) for EGC. When applied to the live video images, the diagnostic accuracy was 94.1%, and the median time for lesion detection was one second. Although the accuracy was low in minute cancers, the AI performed well in lesions larger than 6 mm, which is promising. In another study, Sakai et al trained a CNN-based system with 348943 images (with data augmentation) obtained from 58 patients and tested it on 9650 images. The accuracy of detecting gastric cancer by AI was 82.8%, and the image processing time was 4 ms per image.
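The PPV and processing speed quoted for the study by Hirasawa et al follow directly from the counts given above. The short Python sketch below reproduces the arithmetic; the variable and function names are illustrative, and the per-image time is simply the reported total divided by the number of test images, not a figure stated by the authors.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Share of machine-flagged lesions that really were cancer."""
    return true_positives / (true_positives + false_positives)


flagged_lesions = 232        # lesions the CNN identified as gastric cancer
non_neoplastic = 161         # flagged lesions that were not cancer
true_cancers = flagged_lesions - non_neoplastic

ppv = positive_predictive_value(true_cancers, non_neoplastic)

# Derived average, assuming the 47 s covers all 2296 test images equally.
ms_per_image = 47 / 2296 * 1000

print(f"PPV = {ppv:.1%}, about {ms_per_image:.1f} ms per image")
```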
Gastric cancer has many visual features that are challenging for endoscopists to describe. To improve diagnostic accuracy during endoscopy, several techniques have been developed to assist the gastroenterologist. Magnified NBI (M-NBI) has been shown to have higher detection rates for EGC; however, many endoscopists are not trained to use M-NBI confidently. To facilitate detection using M-NBI, Kanesaka et al developed a CAD system to help diagnose EGC using only M-NBI images. They trained a support vector machine with 66 EGC images and 60 non-cancer images, then tested the detection and delineation of gastric cancer with 61 EGC and 20 non-cancer images. They reported an accuracy of 96.3%, a PPV of 98.3%, a sensitivity of 96.7%, and a specificity of 95%. Their CAD system processed each image in 0.41 s. In a related study, Li et al used 386 non-cancerous M-NBI images and 1702 M-NBI images of EGC to train the Inception-v3 CNN model and tested it on 341 endoscopic images. The sensitivity, specificity, and accuracy for the detection of EGC were 91.2%, 90.6%, and 90.9%, respectively. In another study, Horiuchi et al trained the 22-layer GoogLeNet CNN model using 1492 M-NBI images of EGC and 1078 M-NBI images of gastritis, then tested it on 258 images (151 images of EGC) to determine whether gastritis and cancer could be differentiated. The reported accuracy for the detection of cancer was 85.3%. The sensitivity was 95.4%, the specificity was 71.0%, the PPV was 82.3%, and the negative predictive value (NPV) was 91.7%. The CNN falsely diagnosed 31 gastritis images as cancers; these images were reported to show localized atrophy, atrophy of the fundic gland, and intestinal metaplasia. The diagnostic performance of the same model was evaluated using 174 endoscopic videos (87 cancers and 87 non-cancers).
The area under the curve (AUC) was 0.8684 and the accuracy, sensitivity, specificity, PPV, and NPV were 85.1%, 87.4%, 82.8%, 83.5%, and 86.7%, respectively. When compared to 11 experts, CAD was significantly more accurate than two experts, and not significantly different from eight experts.
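Sensitivity, specificity, PPV, NPV, and accuracy are all derived from the same four counts of a confusion matrix. The Python sketch below shows the definitions, using the test set of Kanesaka et al (61 EGC and 20 non-cancer images) as an example. The individual counts (59 true positives, 2 false negatives, 19 true negatives, 1 false positive) are not stated in this review; they are back-calculated here from the reported percentages and should be read as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # cancer images classified as cancer
    fp: int  # non-cancer images classified as cancer
    tn: int  # non-cancer images classified as non-cancer
    fn: int  # cancer images classified as non-cancer

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Assumed counts, back-calculated from the reported percentages.
kanesaka_test = ConfusionMatrix(tp=59, fp=1, tn=19, fn=2)

for name in ("sensitivity", "specificity", "ppv", "accuracy"):
    print(f"{name}: {getattr(kanesaka_test, name):.1%}")
```

Under these assumed counts the four printed values agree with the accuracy, PPV, sensitivity, and specificity reported for that study.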
As other studies were single-center or included a limited number of cases, Luo et al conducted a multi-center, case-control study of real-world endoscopic imaging to evaluate the accuracy of a CNN in diagnosing upper GI cancer. Using 157207 images obtained from 18765 participants at one university cancer center, the authors developed and validated the Gastrointestinal AI Diagnostic System (GRAIDS) algorithm through training, intrinsic verification, and internal validation. Then, they tested the performance of GRAIDS using a prospective validation dataset and additional external validation datasets obtained from five other hospitals, which included 879289 images from 65659 participants. The AUC in the external validation of the five participating hospitals ranged from 0.966 (95%CI: 0.965-0.967) to 0.990 (95%CI: 0.990-0.991). The diagnostic accuracy of GRAIDS was 0.928 (95%CI: 0.919-0.937), which was significantly lower than that of the expert endoscopist (professor with more than 10 years of endoscopic experience) at 0.967 (95%CI: 0.961-0.973; P < 0.0001) and that of the competent endoscopist (attending doctor with more than five years of endoscopic experience) at 0.956 (95%CI: 0.949-0.963; P < 0.0001), but significantly higher than that of the trainee endoscopist (resident with two years of endoscopic experience) at 0.886 (95%CI: 0.875-0.897; P < 0.0001). The sensitivity of GRAIDS was not significantly different from that of the expert [0.942 (95%CI: 0.924-0.957) vs 0.945 (95%CI: 0.927-0.959); P = 0.692], and it was superior to that of the competent endoscopist [0.858 (95%CI: 0.832-0.880), P < 0.0001] and the trainee endoscopist [0.722 (95%CI: 0.691-0.752), P < 0.0001].
The PPV was 0.814 (95%CI: 0.788-0.838) for GRAIDS, 0.932 (95%CI: 0.913-0.948) for the expert endoscopist, 0.974 (95%CI: 0.960-0.984) for the competent endoscopist, and 0.824 (95%CI: 0.795-0.850) for the trainee endoscopist. The PPV of GRAIDS was thus lower than that of the expert and the competent endoscopist but similar to that of the trainee. This low PPV arose mainly because GRAIDS misinterpreted normal structures (the pylorus, angle, mucus, gastric wall elevation during peristalsis, etc.) as lesions, and because validation was conducted with data that had a low prevalence (3.8%-9.5%) of upper GI cancer. However, such normal structures can probably be distinguished easily by the endoscopist and confirmed as false positives. This was a notable study that used more than one million images obtained from more than 80000 patients from different centers in China. A study by Ikenoyama et al also compared the diagnostic accuracy of AI to that of endoscopists. The AI model from the previous study by Hirasawa et al was tested on images obtained from 75 patients with gastric cancer [66 with mucosal cancer (T1a), and nine with submucosal cancer], and its diagnostic accuracy was compared to that of 67 endoscopists (33 board-certified endoscopists with more than 18 years of experience, and 34 uncertified endoscopists with about eight years of experience). The sensitivity, specificity, PPV, and NPV of the CNN were 58.4%, 87.3%, 26.0%, and 96.5%, respectively. The endoscopists showed a sensitivity, specificity, PPV, and NPV of 31.9%, 97.2%, 46.2%, and 94.9%, respectively, which showed that the CNN had significantly higher sensitivity than the endoscopists. Also, the CNN took an average of 45.5 ± 1.8 s to evaluate the test images, which was much faster than the 173.0 ± 66.0 min taken by the endoscopists, suggesting that AI can diagnose EGC at a much higher speed.
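The dependence of PPV on prevalence noted above follows from Bayes' theorem: for fixed sensitivity and specificity, PPV falls as the disease becomes rarer in the tested population. The Python sketch below illustrates this using the GRAIDS sensitivity of 0.942 and the prevalence range of 3.8%-9.5% from the text. The specificity of 0.95 is a hypothetical value chosen for illustration, since the specificity is not given in this review, so the printed PPVs are not expected to match the reported ones.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1.0 - specificity) * (1.0 - prevalence)
    total_positive_share = true_positive_share + false_positive_share
    if total_positive_share == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positive_share / total_positive_share


SENSITIVITY = 0.942          # GRAIDS sensitivity quoted in the text
ASSUMED_SPECIFICITY = 0.95   # hypothetical, for illustration only

for prevalence in (0.038, 0.095, 0.5):
    ppv = ppv_from_prevalence(SENSITIVITY, ASSUMED_SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%} -> PPV {ppv:.1%}")
```

The same classifier therefore looks much weaker on PPV in a low-prevalence screening population than in a balanced test set, which is why PPVs from studies with different case mixes are hard to compare directly.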
Classification of gastric neoplasms
While many studies have tested the diagnostic accuracy of AI in differentiating cancerous lesions from normal mucosa, several attempts at classifying other non-cancerous lesions have been made (Table 1). Sun et al created a network-based model that could classify ulcers into different types (benign ulcers or malignant ulcers) with a performance comparable to that of endoscopists. The study reported that the DL model was able to identify and classify ulcers with a total accuracy of 86.6%, which was comparable to that of the endoscopist with the highest accuracy (86.3%) and higher than that of the endoscopist with the lowest accuracy (62.5%). Lee et al developed a model that could distinguish gastric ulcers and malignancy. Using the Inception-v3 network, ResNet50 and the Visual Geometry Group (VGG) Net to classify normal vs cancer, normal vs benign ulcers, and cancer vs benign ulcers, 180 normal images, 200 ulcer images, and 337 cancer images were used for training. When tested on 20 normal, 20 ulcers, and 30 cancer images, the best performance was observed in ResNet50, with a diagnostic accuracy of 0.9649 for differentiating between normal vs cancer, 0.9262 for differentiating between normal vs ulcers, and 0.7712 for differentiating between cancer vs ulcers. Based on such findings, AI was proposed as an efficient means for the classification of endoscopic images. Cho et al made a novel attempt at developing a DL model that could automatically classify gastric neoplasms using conventional endoscopic images. Using 5017 images from 1269 participants, three CNN architectures (Inception-v4, ResNet-152, and Inception-ResNet-v2) were used to train and validate the classification of conventional endoscopic images. The images were classified into two categories from two perspectives, which were cancer vs non-cancer, and neoplasm vs non-neoplasm. All images were grouped into five categories, AGC, EGC, high-grade dysplasia (HGD), low-grade dysplasia (LGD), and non-neoplasm. 
To compare the diagnostic accuracy, six endoscopists with experience with more than 6000 endoscopies also viewed and classified the endoscopic images. The Inception-ResNet-v2 model was reported to have the best performance at classifying the images into the five categories, with an accuracy of 84.6% (95%CI: 83.69-85.5) and a mean classification time of 0.0264 s. The AUC was highest for the detection of AGC (range: 0.802-0.855) and the lowest for HGD (range: 0.491-0.522). In prospective validation, the performance of Inception-ResNet-v2 was not significantly inferior to that of the endoscopist with the worst performance. However, the endoscopist with the highest performance showed significantly better performance with a diagnostic accuracy of 87.6% (95%CI: 84.3-90.9) compared to 76.4% (95%CI: 72.1-80.7) for Inception-ResNet-v2. This suggested that AI could have the potential for classifying endoscopic lesions into several categories. A recent study by Kim et al assessed the ability of the CNN model to classify gastric mesenchymal tumors using endoscopic ultrasonography (EUS) images. Using 905 EUS images from gastric mesenchymal tumors that were histologically confirmed by either resection or EUS-guided fine-needle biopsy, the CNN-CAD system was developed and validation was performed with 212 EUS images. The reported accuracy for detecting gastrointestinal stromal tumors was 79.2% using the CNN-CAD system, with a sensitivity and specificity of 83.0% and 75.5%, respectively. The performance was compared to that of six endoscopists (three experienced endoscopists who performed more than 500 EUS examinations, and three junior endoscopists who performed less than 200 EUS examinations). When compared to the diagnostic accuracy of the endoscopists, the sensitivity of CNN-CAD system was not significantly different from that of any of the endoscopists. 
The specificity and diagnostic accuracy of CNN-CAD system were significantly higher than that of two experienced endoscopists and one junior endoscopist, which suggested the potential application of AI in the classification of EUS images as well.
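The five-category scheme described above for Cho et al (AGC, EGC, HGD, LGD, non-neoplasm) can also be read from the two binary perspectives, cancer vs non-cancer and neoplasm vs non-neoplasm, by summing class probabilities. The Python sketch below shows one way this could be done. The grouping of AGC and EGC as "cancer" and of AGC, EGC, HGD, and LGD as "neoplasm" is an assumption made here, the raw scores are hypothetical, and this is not presented as the authors' implementation.

```python
import math

CATEGORIES = ("AGC", "EGC", "HGD", "LGD", "non-neoplasm")
CANCER = {"AGC", "EGC"}                  # assumed grouping
NEOPLASM = {"AGC", "EGC", "HGD", "LGD"}  # assumed grouping


def softmax(scores):
    """Convert raw network scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(scores):
    """Return the most likely category and the two binary probabilities."""
    probs = dict(zip(CATEGORIES, softmax(scores)))
    return {
        "category": max(probs, key=probs.get),
        "p_cancer": sum(probs[c] for c in CANCER),
        "p_neoplasm": sum(probs[c] for c in NEOPLASM),
    }


# Hypothetical raw scores for one endoscopic image.
print(summarize([0.1, 2.0, 0.3, 0.2, 0.5]))
```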
Prediction of invasion depth
The prediction of the invasion depth of gastric cancer (T-staging) is very important as it is an essential factor in determining the treatment method and prognosis of EGC. Tumors in the early stages that do not involve lymphovascular invasion and have an invasion depth no deeper than 500 μm of submucosa can be treated by endoscopic resection alone. The gross findings of the tumor seen on endoscopy or EUS are used to determine the invasion depth of EGC. Some studies have reported that conventional endoscopy was comparable to EUS in predicting the invasion depth of EGC[35,36]. The reported overall accuracy of invasion depth using conventional endoscopy ranged between 69% and 79%[35,37]. In a study of depth prediction scores for differentiated EGCs, tumor sizes more than 30 mm, marked redness, an uneven surface, and marginal elevation were associated with deeper submucosal cancers. However, gastric cancer depth can be difficult to determine by endoscopy alone and some patients may undergo surgery when endoscopic resection could have been an effective method of treatment. To overcome such problems, the utilization of AI to determine the depth of invasion has been studied (Table 1). Kubota et al used retrospectively collected 902 conventional endoscopic gastric cancer images from 344 patients who underwent surgery or endoscopic resection to train and validate with a backpropagation algorithm for determining the depth of invasion. The overall accuracy for detecting the depth of invasion was 64.7%, with 77.2% at the T1 stage (68.9% for T1a and 63.6% for T1b), 49.1% at the T2 stage, 51.0% at the T3 stage, and 55.3% at the T4 stage. This computer-aided system suggested a novel approach of using AI to determine cancer invasion depth by endoscopy. Zhu et al used 790 images from gastric cancer patients to train and another 203 images to validate ResNet50. 
The overall accuracy was reported to be 89.2%, which was significantly higher than the 77.5% overall accuracy of the experienced endoscopists. The AUC for AI was 0.94 (95%CI: 0.90-0.97), and the sensitivity, specificity, PPV, and NPV were 76.5%, 95.6%, 89.7%, and 89.0%, respectively. To test the diagnostic accuracy of AI for EGC stages, Yoon et al[41,42] included 800 patients with histology-proven EGC (428 with T1a and 372 with T1b) and selected 11539 images (896 T1a-EGC images, 809 T1b-EGC images, and 9834 non-cancer images) to train and validate a lesion-based VGG16 network with gradient-weighted class activation mapping (Grad-CAM). The overall AUC for EGC detection and invasion depth prediction was 0.981 and 0.851, respectively. Interestingly, the study also analyzed the factors affecting the AI prediction of invasion depth. Images of undifferentiated-type histology were associated with inaccurate predictions of invasion depth, especially in T1b cases. As previous studies used already diagnosed gastric cancer images for training and testing, Cho et al used Inception-ResNet-v2 and DenseNet-161 models to test the diagnostic accuracy for gastric neoplasms and invasion depth. The authors used 2899 conventional endoscopic images obtained from 846 patients with confirmed pathology including LGD, HGD, EGC, and AGC. The AUC and diagnostic accuracy for determining the invasion depth were 0.887 and 77.3%, respectively, in the external validation set for the DenseNet-161 model. When applied to clinical simulation, the AI misdiagnosed only two cases that had submucosal invasion (misdiagnosed as mucosal lesions), which were also misdiagnosed by the endoscopists. In 89 patients who underwent surgery, 11 cases were actually mucosal-confined lesions, among which AI correctly classified six cases as mucosal lesions. The authors developed an algorithm with substantial performance in predicting invasion depth from the endoscopic images of neoplasms.
As other studies used images obtained from conventional endoscopy, Nagao et al retrospectively collected 16557 gastric cancer images from 1084 cases to train and validate ResNet50 for predicting invasion depth by conventional white light, non-magnifying NBI, and indigo-carmine stained images. The AUC using white light imaging, NBI, and indigo-carmine stain imaging were reported to be 0.9590, 0.9048, and 0.9481 respectively, and the lesion-based accuracy for predicting invasion depth using white light imaging, NBI, and indigo-carmine were 94.5%, 94.3%, and 95.5%, respectively.
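Many of the studies above summarize performance as an AUC. One way to interpret it is as the probability that a randomly chosen positive case (for example, a submucosal cancer) receives a higher model score than a randomly chosen negative case (a mucosal cancer). The Python sketch below computes the AUC directly from that definition; the scores are hypothetical and are not data from any cited study.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve from pairwise comparisons (ties count as one half)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model outputs (probability of submucosal invasion).
submucosal = [0.9, 0.8, 0.55, 0.4]
mucosal = [0.7, 0.3, 0.2, 0.1]

print(auc(submucosal, mucosal))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the AUC is reported alongside threshold-dependent measures such as accuracy.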
Observing the whole stomach is a basic prerequisite for the diagnosis of gastric cancer at an early stage. To avoid blind spots, standardized procedures and guidelines have been developed to map the entire stomach during gastroscopy. The European Society of Gastrointestinal Endoscopy published a protocol including 10 images of the stomach, while the systematic screening protocol for the stomach published by Japanese researchers suggested 22 standard images of the stomach to avoid missing suspicious cancerous lesions[45,46]. However, insufficient supervision and the lack of practical tools make it difficult to follow these protocols, which affects the quality of endoscopic examinations. To localize blind spots during EGD that may have been missed by an endoscopist, Wu et al developed the WISENSE system, a real-time CNN to detect blind spots (Table 1). As the scope was inserted into the stomach, the deep CNN (DCNN) captured images and filled them into the corresponding part of the model, which enabled the endoscopist to identify the blind spots. Blind spots of the gastric mucosa, such as the lesser curvature of the antrum and the fundus, are areas that may hide lesions, and lesions could be missed if these areas are not viewed during endoscopy. Trained on 34513 images of gastric locations agreed upon by at least four endoscopists, WISENSE was able to detect blind spots with an accuracy of 90.0% by identifying anatomic landmarks in EGD. In a single-center randomized controlled trial, 153 patients underwent EGD with WISENSE and 150 patients in the control group underwent EGD without AI. The blind spot rate, defined as the proportion of unobserved sites among 26 sites, was 5.9% in the WISENSE group, which was significantly lower than the 22.5% in the control group (P < 0.001), suggesting that AI can also be used to improve the quality of EGD by identifying blind spots. In another study by Wu et al, a DCNN was used for detecting gastric cancer and identifying blind spots.
A total of 24549 images were used for training, and a grid model for the stomach was developed to generate a virtual stomach model. The study reported a diagnostic accuracy of 92.5% for detecting malignancy, which was significantly higher than that of six experts. The reported sensitivity, specificity, PPV, and NPV were 94.0%, 91.0%, 91.3%, and 93.8%, respectively. The DCNN classified the EGD images into 10 parts with an accuracy of 90.0% and into 26 parts with an accuracy of 65.9%, which did not differ significantly from the performance of the endoscopists. When the model was tested on endoscopic videos, the DCNN accurately presented the covered parts in synchrony with the EGD procedure to verify that the entire stomach was mapped. A related study by Chen et al used ENDOANGEL (developed from WISENSE) to compare blind-spot monitoring in sedated conventional EGD (C-EGD), unsedated ultrathin transoral endoscopy (U-TOE), and unsedated C-EGD. This prospective, 3-parallel-group, randomized study reported that the blind-spot rate with AI was significantly lower in sedated C-EGD than in unsedated U-TOE and unsedated C-EGD (sedated C-EGD vs unsedated U-TOE vs unsedated C-EGD: 3.4% vs 21.8% vs 31.2%, P < 0.05). Although the number of studies is limited, the application of AI in monitoring blind spots is very promising.
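The diagnostic measures and the blind-spot rate reported above follow from simple counts, and the arithmetic can be sketched in a few lines of Python. The counts below are hypothetical: they assume a balanced test set of 100 malignant and 100 non-malignant images, chosen only because they reproduce the reported percentages. The class and function names are illustrative and are not taken from the cited studies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary test set (malignant = positive)."""
    tp: int  # malignant images flagged as malignant
    fp: int  # non-malignant images flagged as malignant
    tn: int  # non-malignant images flagged as non-malignant
    fn: int  # malignant images flagged as non-malignant


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, sensitivity, specificity, PPV, and NPV as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def blind_spot_rate(observed_sites: set[int], total_sites: int = 26) -> float:
    """Fraction of the gastric sites that were never observed."""
    unobserved = total_sites - len(observed_sites)
    return unobserved / total_sites


# Hypothetical counts: 100 malignant and 100 non-malignant images (assumption).
counts = ConfusionCounts(tp=94, fp=9, tn=91, fn=6)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")

# Hypothetical examination in which sites 1 to 24 of 26 were observed.
print(f"blind-spot rate: {blind_spot_rate(set(range(1, 25))):.1%}")
```

With these assumed counts, the printed values match the reported 92.5% accuracy, 94.0% sensitivity, 91.0% specificity, 91.3% PPV, and 93.8% NPV, and the hypothetical examination yields a blind-spot rate of 7.7% (2 of 26 sites).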
Prediction of curative endoscopic resection
Expanded indications for ESD in EGC include the undifferentiated type that is less than 2 cm and does not have ulcerations. However, observational studies have reported conflicting results. Thus, ESD in such groups has been considered an investigational treatment. A meta-analysis of curative resection for EGC with undifferentiated-type histology reported a rate of 61.4%, suggesting ESD as a feasible treatment for undifferentiated-type EGC. To aid in the accurate prediction of curative resection in such cases, Bang et al selected ML models that could predict curative resection in undifferentiated-type EGC. The XGBoost classifier presented the best performance, with an accuracy of 81.5% in the first external validation and 89.8% in the second external validation. Lesion size was the most important feature identified when the model was explained by AI analysis. As such, AI could aid in decisions for therapeutic management.
FUTURE PERSPECTIVES OF AI
The real-time application of AI in the field of medicine is within reach. Endoscopic models that automatically detect colon polyps or gastric cancers during endoscopy sessions and highlight them using a segmentation box have already received approval for use in Europe, Japan, and other countries, while many systems are currently under development. Much software has been released as open source and can be freely used in research or actual practice. An established pre-trained model can be fine-tuned by adjusting the layers of the ANN, increasing the number of training epochs, adjusting the batch size or the number of iterations, or modifying hyperparameters such as the optimizer. Beyond such manual adjustments, recent developments have enabled the automatic optimization of hyperparameters in ML (e.g., AutoGluon), which makes AI more user-friendly for clinicians unfamiliar with it. Most research on AI in gastroenterology has focused on developing algorithms for detecting lesions, classifying images to improve diagnostic accuracy, predicting prognosis, and improving the quality of screening endoscopy. In the near future, AI will most likely be applied to therapeutic management. Recently, AI-based treatment methods have been developed using technologies such as microendoscopy, decision support system-based treatment modalities, robot-assisted treatment, applications, and digital therapeutics. However, such development raises non-technical social issues, such as patient safety, ethics, legal responsibility, government approval, and cost-effectiveness, which need to be addressed as well. Although studies have shown that the accuracy of AI in detecting gastric cancer is comparable to that of some doctors, experienced doctors with expertise have shown better performance than AI. This means that there are limitations to relying solely on AI.
However, the benefits of applying AI, such as improved efficiency and less time spent on repetitive tasks, must be acknowledged as well. Accordingly, the most applicable field of AI would be medical image data processing, which could help improve the diagnostic performance of trainees and non-expert doctors. AI algorithms, especially DL, are comparable to a black box that learns from training data. Using the patterns learned from the training data, the output values can be predicted from newly input data. This means that efficacy and accuracy are highly dependent upon the quality and quantity of the training data. As in any other clinical research, the quality and quantity of the usable data are essential to the quality of the evidence and the outcome. Gathering high-quality clinical data is important, and developing a model that accurately evaluates the data is equally important. To effectively utilize such an AI algorithm in clinical practice, further studies and discussions on the usefulness, profitability, possible risks, medicolegal responsibility, and regulatory measures of AI are needed.
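The automatic optimization of hyperparameters mentioned earlier in this section can be illustrated with a deliberately small sketch. It does not use AutoGluon or any endoscopic data, and it is not a description of how such tools work internally. Instead, it assumes a one-feature logistic regression trained in plain Python on synthetic data, and it selects the learning rate and number of epochs by exhaustive grid search on validation accuracy. All names, the data generator, and the grid values are hypothetical.

```python
import itertools
import math
import random


def make_data(n, seed):
    """Synthetic 1-D data (assumption): label is 1 when a noisy feature exceeds 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        y = 1 if x + rng.gauss(0.0, 0.5) > 0 else 0
        data.append((x, y))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, learning_rate, epochs):
    """Fit weight w and bias b by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def accuracy(model, data):
    """Fraction of samples whose thresholded prediction matches the label."""
    w, b = model
    correct = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in data)
    return correct / len(data)


def grid_search(train_set, val_set, grid):
    """Try every combination in the grid; keep the best by validation accuracy."""
    best = None
    for lr, epochs in itertools.product(grid["learning_rate"], grid["epochs"]):
        model = train(train_set, lr, epochs)
        score = accuracy(model, val_set)
        if best is None or score > best["val_accuracy"]:
            best = {"learning_rate": lr, "epochs": epochs, "val_accuracy": score}
    return best


train_set = make_data(200, seed=0)
val_set = make_data(100, seed=1)
grid = {"learning_rate": [0.001, 0.1, 1.0], "epochs": [1, 20, 100]}
best = grid_search(train_set, val_set, grid)
print(best)
```

The choice is made on a validation set that is kept separate from the training data, which mirrors the dependence on data quality and quantity discussed above: a configuration chosen on unrepresentative validation data would not be expected to generalize.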
AI in the field of endoscopy was first applied for the detection of colon polyps. As described in this review article, many studies have already been published as stepping-stones toward the application of AI in detecting gastric neoplasms such as EGC. Because prospective studies on the detection of EGC are lacking, randomized controlled studies are needed to advance the technique. It is expected that the application of AI would not only provide guidelines for the endoscopic treatment of EGC or avoid unnecessary surgery by predicting the invasion depth but also help improve the overall prognosis of patients with EGC. There is no doubt that the development of AI-based endoscopy would also help to alleviate the physical fatigue that can burden endoscopists. Such achievements can be realized only when AI improves the quality of imaging diagnosis beyond human capability and optical biopsy becomes possible. This can be achieved by improving AI performance using the specific characteristics of different organs and diseases. AI is being studied and developed by scientists all over the world in various fields with hopes of providing accuracy and convenience. In the field of medicine, medical records and imaging are becoming digitalized, and a new phase in the history of medicine is expected within 5 to 10 years. Accordingly, clinicians and researchers need to follow with great interest, and carefully evaluate, the results of further clinical studies using AI-based technology.
Hosokawa O, Hattori M, Douden K, Hayashi H, Ohta K, Kaizaki Y. Difference in accuracy between gastroscopy and colonoscopy for detection of cancer. Hepatogastroenterology 2007; 54: 442-444.
Horiuchi Y, Hirasawa T, Ishizuka N, Tokai Y, Namikawa K, Yoshimizu S, Ishiyama A, Yoshio T, Tsuchida T, Fujisaki J, Tada T. Performance of a computer-aided diagnosis system in diagnosing early gastric cancer using magnifying endoscopy videos with narrow-band imaging (with videos). Gastrointest Endosc 2020; 92: 856-865.
Luo H, Xu G, Li C, He L, Luo L, Wang Z, Jing B, Deng Y, Jin Y, Li Y, Li B, Tan W, He C, Seeruttun SR, Wu Q, Huang J, Huang DW, Chen B, Lin SB, Chen QM, Yuan CM, Chen HX, Pu HY, Zhou F, He Y, Xu RH. Real-time artificial intelligence for detection of upper gastrointestinal cancer by endoscopy: a multicentre, case-control, diagnostic study. Lancet Oncol 2019; 20: 1645-1654.
Sun JY, Sang WL, Kang MC, Kim SW, Ko SJ. A Novel Gastric Ulcer Differentiation System Using Convolutional Neural Networks. Proceedings of the 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS); 2018 June 1; IEEE, 2018: 351-356.
Nagao S, Tsuji Y, Sakaguchi Y, Takahashi Y, Minatsuki C, Niimi K, Yamashita H, Yamamichi N, Seto Y, Tada T, Koike K. Highly accurate artificial intelligence systems to predict the invasion depth of gastric cancer: efficacy of conventional white-light imaging, nonmagnifying narrow-band imaging, and indigo-carmine dye contrast imaging. Gastrointest Endosc 2020; 92: 866-873.
Bisschops R, Areia M, Coron E, Dobru D, Kaskas B, Kuvaev R, Pech O, Ragunath K, Weusten B, Familiari P, Domagk D, Valori R, Kaminski MF, Spada C, Bretthauer M, Bennett C, Senore C, Dinis-Ribeiro M, Rutter MD. Performance measures for upper gastrointestinal endoscopy: a European Society of Gastrointestinal Endoscopy (ESGE) Quality Improvement Initiative. Endoscopy 2016; 48: 843-864.
Wu L, Zhang J, Zhou W, An P, Shen L, Liu J, Jiang X, Huang X, Mu G, Wan X, Lv X, Gao J, Cui N, Hu S, Chen Y, Hu X, Li J, Chen D, Gong D, He X, Ding Q, Zhu X, Li S, Wei X, Li X, Wang X, Zhou J, Zhang M, Yu HG. Randomised controlled trial of WISENSE, a real-time quality improving system for monitoring blind spots during esophagogastroduodenoscopy. Gut 2019; 68: 2161-2169.
Chen D, Wu L, Li Y, Zhang J, Liu J, Huang L, Jiang X, Huang X, Mu G, Hu S, Hu X, Gong D, He X, Yu H. Comparing blind spots of unsedated ultrafine, sedated, and unsedated conventional gastroscopy with and without artificial intelligence: a prospective, single-blind, 3-parallel-group, randomized, single-center trial. Gastrointest Endosc 2020; 91: 332-339.