Yoshida H, Kiyuna T. Requirements for implementation of artificial intelligence in the practice of gastrointestinal pathology. World J Gastroenterol 2021; 27(21): 2818-2833 [PMID: 34135556 DOI: 10.3748/wjg.v27.i21.2818]
Corresponding Author of This Article
Hiroshi Yoshida, MD, PhD, Staff Physician, Department of Diagnostic Pathology, National Cancer Center Hospital, 5-1-1 Tsukiji, Chuo-ku, Tokyo 104-0045, Japan. email@example.com
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Author contributions: Yoshida H and Kiyuna T contributed equally to this work.
Conflict-of-interest statement: All authors have no competing interests to be declared.
Received: February 4, 2021 Peer-review started: February 4, 2021 First decision: March 6, 2021 Revised: March 16, 2021 Accepted: April 28, 2021 Article in press: April 28, 2021 Published online: June 7, 2021
Tremendous advances in artificial intelligence (AI) in medical image analysis have been achieved in recent years. The integration of AI is expected to cause a revolution in various areas of medicine, including gastrointestinal (GI) pathology. Currently, deep learning algorithms have shown promising benefits in areas of diagnostic histopathology, such as tumor identification, classification, prognosis prediction, and biomarker/genetic alteration prediction. While AI cannot substitute for pathologists, carefully constructed AI applications may increase workforce productivity and diagnostic accuracy in pathology practice. Despite these promising advances, unlike in radiology or cardiology imaging, no histopathology-based AI application has been approved by a regulatory authority or for public reimbursement. This implies that there are still some obstacles to be overcome before AI applications can be safely and effectively implemented in real-life pathology practice. Challenges have been identified at different stages of the development process, such as needs identification, data curation, model development, validation, regulation, modification of daily workflow, and cost-effectiveness balance. The aim of this review is to present challenges in the process of AI development, validation, and regulation that should be overcome for its implementation in real-life GI pathology practice.
Core Tip: Advances in artificial intelligence (AI) are expected to revolutionize many areas of medical practice. Deep learning algorithms have shown promising benefits in various areas of diagnostic histopathology. Despite this, AI technology is not widely used as a medical device in histopathology and has not been approved by a regulatory authority. This implies that certain improvements in the development process are still necessary for the implementation of AI in real-life histopathology practice. This paper aims to provide a review of recent AI developments in gastrointestinal pathology and the challenges in their implementation.
The integration of artificial intelligence (AI) is expected to cause a revolution in various areas of medicine, including gastrointestinal (GI) pathology, in the next decade. Advances in slide scanner technology have made it possible to quickly digitize histological slides at high resolution for use in clinical practice, research, and education[2-4]. The drastic increase in computing capacity and the improvement in information technology (IT) infrastructure have allowed rapid and efficient processing of large data such as whole slide images (WSIs). In recent years, there has been an increase in computer applications utilizing AI to analyze images.
AI is an umbrella term for the different strategies a computer can employ to think and learn like a human. Pathological AI models have progressed from expert systems to conventional machine learning (ML) and deep learning (DL). Both expert systems and conventional ML rely on expert knowledge and expert-defined rules about objects. In contrast, DL extracts features directly from the raw data and passes them through multiple hidden layers to produce the output (Figure 1). Compared to conventional ML, DL is simpler to conduct, achieves high precision, and is cost-effective[5,8]. Its implementation enhances the reproducibility of subjective visual assessment by human pathologists and integrates multiple parameters for precision medicine[9,10]. Currently, DL algorithms have shown promising benefits in different facets of diagnostic histopathology, such as tumor identification, classification, prognosis prediction, and biomarker/genetic alteration prediction[5,11]. In addition, various AI applications have been developed for GI pathology[12-14].
Figure 1 General workflow of construction of artificial intelligence model in pathology.
Stained slides are converted to digital input images by a slide scanner. Both (a) hand-crafted feature engineering and (b) deep learning approach generate outputs of classification, which are applied to various clinically relevant predictions.
AI applications using DL algorithms have demonstrated various benefits in the field of GI pathology. Recent reviews (gastric and colorectal) provide an overview of the rapid and extensive progress in the field[5,11-14]. In 2017, the Philips IntelliSite (Philips Electronics, Amsterdam, The Netherlands) whole-slide scanner was approved by the Food and Drug Administration (FDA) in the United States. The implementation of AI in pathology is also promoted by various startups such as DeepLens and PathAI. Some institutions have agreed to digitize their pathology workflow[17,18]. Although these advances are promising, unlike in the field of radiology or cardiology imaging, no histopathology-related AI application has been approved by a regulatory authority or for public reimbursement. This indicates that there are still many obstacles to be resolved before the introduction of AI applications in real-life histopathology practice (Figure 2).
Figure 2 Challenges for implementation in the development process of an artificial intelligence application.
The process of development and implementation of an artificial intelligence (AI) application is composed of multiple steps from needs identification to use in real-life (left). In each step, various challenges keep AI applications from being implemented into clinical practice (right). AI: Artificial intelligence; IT: Information technology.
In this review, we aim to present and summarize the challenges in the process of development, validation, and regulation that should be overcome for the implementation of AI in real-life GI pathology practice. A complete and comprehensive review of the literature on GI pathology-related AI applications is beyond the scope of this paper and is well described elsewhere[12-14]. Here, we focus on how we can adopt these recent advancements in our daily practice.
AI-APPLICATIONS IN GI PATHOLOGY
AI applications in tumor pathology, including GI cancers[4,5], have been developed for tumor diagnosis, subtyping, grading, staging, prognosis prediction, and identification of biomarkers and genetic alterations. Over the past decade, the implementation of DL technologies has dramatically improved the accuracy of digital image analysis. DL is an ML method that is particularly effective for digital image analysis. In image analysis, DL is typically based on convolutional neural networks (CNNs), which consist of millions of artificial neurons assembled in several layers that translate the input data (the pixel value matrix of an image) into a more abstract representation (Figure 1). A dataset of digitized images, each annotated with a specific label (e.g., carcinoma or benign lesion), is fed into these layers of mathematical computation; ultimately, the CNN learns how to categorize images according to their respective labels. CNNs automatically identify the most distinctive and common characteristics of each type of object. CNNs outperform hand-crafted or conventional ML techniques (using support vector machines or random forests) by a substantial margin in image classification[8,20]. In GI pathology, the prediction targets also include tumor classification, the clinical outcome of the patient, and genetic alterations within the tumor (Tables 1 and 2).
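To make the idea of translating a pixel value matrix into a more abstract representation concrete, the following is a minimal sketch of a single convolutional layer in plain Python. It is illustrative only: the 4x4 patch and the edge kernel are hypothetical values chosen by hand, whereas a real CNN stacks many such layers and learns its kernel weights from labeled images.

```python
# Minimal sketch of one convolutional layer (illustrative only; real CNNs
# stack many such layers and learn the kernel values from labeled images).

def conv2d(image, kernel):
    """Slide a square kernel over a grayscale image (no padding, stride 1)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map):
    """Summarize a feature map by its strongest response."""
    return max(max(row) for row in feature_map)


# Hypothetical 4x4 pixel matrix with a vertical edge (dark left, bright right).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-set vertical-edge kernel; in a trained CNN these weights are learned.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d(patch, edge_kernel))
edge_score = global_max_pool(feature_map)
```

The feature map responds only at the column where the intensity changes, which is the sense in which a layer converts raw pixels into a more abstract description (here, "a vertical edge is present").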
Table 1 Artificial intelligence applications in gastric cancer pathology.
In addition, a variety of ML methods have been developed. The strengths and weaknesses of typical ML methods are summarized in Table 3. All of the current ML methods have their advantages and disadvantages, and it is necessary to select an appropriate method according to the purpose of image analysis. DL-based methods are most commonly used in current image analysis of GI pathology; however, they have limitations of requiring substantial data sets and insufficient interpretability. In the future, the development of new ML methods that can compensate for the disadvantages of current ML methods will further accelerate the development of AI-models.
Table 3 Advantages and disadvantages of representative machine-learning methods in the development of artificial intelligence-models for gastrointestinal pathology.
Method | Advantages | Disadvantages
Conventional ML (supervised) | User can reflect domain knowledge to features | Requires hand-crafted features; Accuracy depends heavily on the quality of feature extraction
Conventional ML (unsupervised) | Executable without labels | Results are often unstable; Interpretability of the results
Deep neural networks (CNN) | Automatic feature extraction; High accuracy | Requires a large dataset; Low explainability (Black box)
 | Executable without detailed labels | Requires a large dataset; High computational cost
Semantic segmentation (FCN, U-Net) | Pixel-level detection gives the position, size, and shape of the target |
Histopathological AI-applications in gastric cancer
Several attempts have been made to classify pathological images of gastric cancer using AI (Table 1). Before reviewing individual studies, it should be noted that performance comparisons should not rely on accuracy alone; attention should also be paid to the task difficulty within each research framework, i.e., (1) dataset size (results for small sample sizes are less reliable), (2) resolution of detection (tissue level or region level), (3) number of categories to be classified, (4) multi-site validation (whether the training and test datasets come from the same site), and (5) constraints on the target lesion (e.g., adenocarcinoma only, or any lesion except lymphoma). Sharma and colleagues documented the detection of gastric cancer in histopathological images using two DL-based methods: one analyzed the morphological features of the whole image, while the other investigated the focal features of the image independently. These models showed an average accuracy of up to 89.7%. Iizuka et al reported an AI algorithm, based on CNNs and recurrent neural networks, to classify gastric biopsy images into gastric adenocarcinoma, adenoma, and non-neoplastic tissue. Within three independent test datasets, the algorithm demonstrated an area under the curve (AUC) of 0.97 for the classification of gastric adenocarcinoma. Yoshida et al, using gastric biopsy specimens, compared the classification outcomes of experienced pathologists with those of the NEC Corporation-built ML-based program "e-Pathologist". While the overall concordance rate between them was only 55.6% (1702/3062), the concordance rate was as high as 90.6% (1033/1140) for the biopsy specimens negative for a neoplastic lesion. Tomita et al attempted to automate the identification of pre-neoplastic/neoplastic lesions in Barrett esophagus or gastric adenomas/adenocarcinomas.
The above tumor classification studies have shown that AI can be used for histopathological image analysis. However, other obstacles hinder its use in real-life practice. For example, although the workload of pathologists could be reduced by identifying cases that need no further review by a pathologist, even in "negative" gastric biopsies, findings other than neoplastic lesions, such as Helicobacter pylori infection, need to be reviewed and recorded. Therefore, an AI application cannot be functional until it sufficiently represents the diagnostic procedures of real-life practice.
The prediction of prognosis from histopathological images of GI cancers is also an attractive area for AI application. Given the many histopathological prognostic features of cancer, such as tumor differentiation or lymphovascular involvement, AI may be expected to unveil hidden morphological features that allow better prediction of clinical outcomes from histopathological images alone[25-27]. After ingesting a sufficient number of histopathological images from patients with known outcomes, AI may comprehensively predict a patient's future outcome. Recently, a rapidly increasing number of studies conducted for major GI cancers have demonstrated the feasibility of this concept[26,28,29]. Additionally, according to a recent study, tumor-infiltrating lymphocytes were associated with the prognosis of patients with gastric cancer; a CNN model could detect tumor-infiltrating lymphocytes on histopathological specimens with an accuracy of 96.9%. The development of DL models that incorporate clinical and multi-omics data is also a promising approach for predictive purposes. Prognosis prediction by AI applications might be more accurate than that by the conventional pathological method; however, AI-based predictions alone are unlikely to be accepted in clinical practice because of their lack of interpretability. If doctors and patients cannot understand the reason for a prediction, they will not recognize a misprediction by AI. We cannot provide patient care based on predictions that resemble “fortune-telling.” The biological and clinical reasons for a prediction by an AI application must be understood prior to its implementation in clinical practice.
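The concordance figures quoted above can be reproduced with simple arithmetic. The sketch below uses only the counts reported in the text; the function name is a hypothetical helper introduced for illustration.

```python
def concordance_rate(agreeing_cases, total_cases):
    """Percentage of cases on which the program and the pathologists agreed."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return 100.0 * agreeing_cases / total_cases


# Counts as reported in the text for the "e-Pathologist" comparison.
overall = concordance_rate(1702, 3062)        # all biopsy specimens
negative_only = concordance_rate(1033, 1140)  # specimens negative for neoplasia

print(f"overall: {overall:.1f}%")
print(f"negative specimens: {negative_only:.1f}%")
```

The gap between the two rates illustrates why a single headline figure is not sufficient: performance depends strongly on which subset of cases is being evaluated.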
Some researchers have also attempted to predict biomarker status from histopathological images alone using AI applications. Specimens of various GI cancers can be processed to identify molecular markers that may predict responses to targeted therapies. Research has shown that certain clinically relevant molecular alterations in GI cancers are associated with specific histopathological features detected on hematoxylin-eosin (HE) slides; there have been some successful attempts to adopt AI applications for HE sections as surrogate markers for these alterations[31-34].
Histopathological AI-applications in colorectal cancer
As in gastric cancer, various AI applications have recently been developed for colorectal cancer (Table 2). Regarding tumor classification, several AI algorithms have been trained to classify the dataset into two to six specific classes, such as normal, hyperplasia, adenoma, adenocarcinoma, and histological subtypes of polyps or adenocarcinomas[22,35-40]. Korbar et al reported that an AI model constructed using over 400 WSIs could classify five types of colorectal polyps with an accuracy of 93%. Wei et al demonstrated that a DL model trained using WSIs could reproducibly classify colorectal polyps, even in datasets from other hospitals; its accuracy was comparable to that of a local pathologist. While most studies show promising performance, a precise comparison of performance among these AI applications is neither possible nor meaningful, because each model is derived from a different dataset with different annotations and focuses on a different task. To accurately compare the performance of AI models, it is necessary to have them perform a common task using a standardized dataset with standardized annotations.
Further, a few studies have predicted prognosis using pathological images of colorectal cancer[26,34,42]. Bychkov et al used 420 tissue microarray WSIs to predict the 5-year disease-specific survival of patients and obtained an AUC of 0.69. Kather et al used more than 1000 histological images, collected from three institutions, to predict patient prognosis; they observed an accuracy of 99%. Another study, using the ResNet model for direct identification of microsatellite instability (MSI) on histological images, demonstrated an AUC of 0.77 for both formalin-fixed paraffin-embedded (FFPE) and frozen specimens from The Cancer Genome Atlas (TCGA). The identification of colorectal cancer with MSI is crucial because these tumors are reportedly highly responsive to immunomodulating therapies[43,44]; moreover, MSI could be a clue for the diagnosis of Lynch syndrome. MSI is usually identified by polymerase chain reaction (PCR), but not all patients are screened for MSI in clinical practice. Echle et al recently developed a DL model to detect colorectal cancer with MSI using more than 8800 images. The DL algorithm demonstrated an AUC of 0.96 in the multi-institutional validation cohort. Furthermore, the consensus molecular subtype of colorectal cancer could be predicted from the images of colorectal surgical specimens using a CNN-based model. Prediction of molecular alterations by AI applications might seem attractive, because clinically relevant biomarkers cannot be identified by conventional assessment of HE-stained slides and conventional PCR assays are both expensive and time-consuming. However, AI can neither achieve complete concordance with the gold standard test nor replace it. Thus, users must consider how to employ AI for predicting biomarkers with an appropriate cost-effectiveness balance in real-life practice.
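Because many of the results above are reported as an AUC, a short sketch of how this metric can be computed may be helpful. The version below uses the rank-based interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case); the labels and scores are hypothetical and are not taken from any of the cited studies.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for a positive case (e.g., MSI), 0 for a negative case.
    scores: model outputs, higher meaning "more likely positive".
    Tied scores count as half a correctly ranked pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for six cases (three positive, three negative).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the AUC of 0.69 and the AUC of 0.96 cited above represent very different levels of discrimination.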
A ROAD TO IMPLEMENTATION OF AI APPLICATIONS INTO REAL-LIFE PRACTICE
To achieve clinical implementation of AI, several steps should be considered (Figure 2). Colling et al presented an expected roadmap for the routine use of AI in pathology practice. They highlighted the main aspects of designing and applying AI in daily practice and closely reviewed the steps concerning design creation, ethics, financing, development, validation and regulation, implementation, and effect on the workforce. For pathological image analysis, various problems arise in the execution of these steps, which may prevent AI from being implemented in clinical practice for GI cancers.
Identification of the true needs in daily practice
AI applications can either conduct routine tasks usually performed by pathologists or offer novel insights into diseases that human pathologists cannot provide. Such applications are needed to fill gaps and address unmet needs without disrupting the daily workflow in the pathology department. The needs include mitosis detection, tumor-percentage calculation, detection of lymph node metastasis, and other activities that are considered monotonous, repetitive, or vulnerable to high interobserver variability.
The initial step in the development of an AI application is to recognize the true clinical need and define a possible solution. Novel AI applications can be developed by various stakeholders, including pathologists, physicians, computer scientists, engineers, IT companies, and drug companies. However, the viewpoints of professionals in academia and industry differ. For example, individuals in academia and in business pursue different goals, such as grant funding and academic publications on the one hand and profitable commercial products on the other.
Even if there is a problem that pathologists are eager to solve, the market size of the problem could be small. If the cost of developing an AI application to solve the problem cannot be recovered by the subsequent profit from the sale of the application, the company may not develop it. There is a wide range of classification tasks in diagnostic pathology, and it is difficult to secure an appropriate market for an AI application specializing only in a single task. For example, an AI algorithm can detect lymph node metastases in breast cancer as reliably as human pathologists[48,49]. Still, this tool has not been widely used or approved by the regulatory authorities. Although there could be many reasons, one is the imbalance between the overall cost of its implementation and the benefit of detecting only breast cancer lymph node metastases in real-life pathology practice.
Another significant concern is obtaining consent for the use of patient data in AI-model development. Although the consent for research use could be obtained in most studies, patients might not consent to commercial use of their data required for product development, which could be an obstacle when developing products for clinical implementation. Therefore, consent should be obtained at the beginning of the research, conveying the possibility of its commercial use for product development; a framework for global data sharing should be developed.
For the development of AI algorithms, at least three parties need to collaborate: pathologists who know the true needs, academic professionals who can develop the technology, and companies that will promote AI applications as products. In addition, to obtain a sufficiently sized market, it may be vital to develop global networks and online services using the cloud.
After a concept of AI has been conceived and collaboratively established, the development of AI is carried out through the following steps: defining the output, designing the algorithm, collection of a pilot or larger follow-up sample, annotation and processing of data, and performing statistical analysis of the data.
High-quality dataset curation is one of the major hurdles in the development of AI applications. Generally, CNNs require hundreds or thousands of pathological images to achieve significant performance and sufficient generalizability. For rare tumors, researchers can obtain only a very limited number of images; thus, efficient data augmentation techniques and learning methods are required to resolve this issue. Conversely, in the case of transfer learning, small-scale datasets consisting of < 100 digital slides may suffice.
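As one illustration of data augmentation, the sketch below generates the eight rotation and mirror variants of an image patch. This is a minimal example under the assumption that tissue orientation carries no diagnostic meaning, so each variant can keep the label of the original patch; the 2x2 patch is a hypothetical stand-in for real pixel data, and other augmentation strategies exist.

```python
def rotate90(patch):
    """Rotate a 2D patch (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def flip_horizontal(patch):
    """Mirror a 2D patch left to right."""
    return [row[::-1] for row in patch]


def augment(patch):
    """Return the 8 rotation/mirror variants of a patch, original first."""
    variants = []
    current = [list(row) for row in patch]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# Hypothetical 2x2 patch; real patches would hold pixel intensities.
patch = [[1, 2], [3, 4]]
augmented = augment(patch)
```

Each labeled patch therefore yields eight training examples, which can partly compensate for a limited number of source images but does not add new biological variability.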
In addition, publicly available datasets should be developed for global data sharing. However, few such datasets are available in pathology, partly due to confidentiality, copyright, and financial problems. Even under such circumstances, TCGA provides many WSIs and associated molecular data. However, even TCGA data does not include sufficient numbers of cases for training AI applications for clinical implementation. Another potential source of datasets could be the public challenges provided for developing DL algorithms.
The development of AI applications with sufficient performance requires training on huge datasets that reflect the variability in scanning and staining protocols[56,57]. The major challenges for implementation into practice are laboratory infrastructure and the reproducibility and robustness of the AI model. Recently, automated methods for reducing blur in images have been developed. Automated algorithms (for example, HistoQC and DeepFocus) can reportedly standardize the quality of WSIs; these AI applications automatically detect optimum-quality regions and eliminate out-of-focus or artifact-related regions. Standardization of the color displayed by histopathological slides is important for the accuracy of AI; color variations are often produced by differences in batches or manufacturers of staining reagents, variations in the thickness of tissue sections, differences in staining protocols, and disparities in scanning characteristics. These variations lead to inadequate classification by AI applications[56,60]. AI algorithms have been developed to standardize the data, including staining and color characteristics.
After dataset curation, the dataset must be annotated. Histopathological image annotation is not a simple task. The extent of annotation detail depends on the application of AI and can vary from classification at the slide level to labeling at the pixel level. Annotation of many images by human experts is time-consuming and tedious. In addition, variability in annotation performance, especially when the task is difficult, may affect the accuracy of the trained models. Moreover, for manufacturers, this task can often be expensive. Among GI pathologies, many lesions, such as intramucosal gastric carcinoma, do not have high interobserver reproducibility. When developing an AI application to assist pathologists in making a diagnosis, if the target disease shows significant interobserver variability, the correctness of the dataset annotation cannot be guaranteed. As a result, the trained algorithm may not reproduce its performance when used in other facilities, which may hinder its clinical implementation.
The problem of annotation in AI is an important research area. The majority of AI models are trained using images of small tissue patches collected from WSIs. Since patches cropped from positive tissue may not contain a tumor unless the tissue is filled with tumor, it is challenging to construct a high-accuracy model, particularly when pixel-level labeling is unavailable. To conduct patch-based training without detailed annotation, a multi-instance learning (MIL) algorithm can be used[64,65]. Cosatto et al employed MIL for gastric cancer detection; they used over 12000 cases, two-thirds for training and one-third for testing, and achieved an AUC of 0.96. MIL is especially effective when a large dataset is available and detailed annotations are impossible to obtain.
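The principle of color standardization can be sketched with a deliberately simplified statistical approach: each color channel of an image is shifted and scaled so that its mean and standard deviation match those of a reference. This is an assumption-laden illustration, not one of the AI-based algorithms referred to above; the pixel values and reference statistics are hypothetical, and published stain-normalization methods generally operate in other color spaces or on separated stain components.

```python
from statistics import mean, pstdev


def match_channel(values, target_mean, target_std):
    """Shift and scale one color channel to a target mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat channel carries no contrast; map it to the target mean.
        return [float(target_mean) for _ in values]
    scaled = [(v - mu) / sigma * target_std + target_mean for v in values]
    # Clip to the valid 8-bit intensity range.
    return [min(255.0, max(0.0, v)) for v in scaled]


def normalize_pixels(pixels, target_means, target_stds):
    """pixels: list of (R, G, B) tuples; targets: per-channel reference statistics."""
    channels = list(zip(*pixels))
    matched = [
        match_channel(list(channel), m, s)
        for channel, m, s in zip(channels, target_means, target_stds)
    ]
    return [tuple(px) for px in zip(*matched)]


# Hypothetical pixels from a slide stained more intensely than the reference.
pixels = [(200, 100, 150), (220, 120, 170), (240, 140, 190)]
reference_means = (180, 90, 160)
reference_stds = (10, 10, 10)

normalized = normalize_pixels(pixels, reference_means, reference_stds)
```

After normalization, images from different laboratories share the same per-channel statistics, which reduces one source of the pre-analytical variation described above.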
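Interobserver variability in annotation can be quantified before training, for example with Cohen's kappa, which corrects the raw agreement between two annotators for the agreement expected by chance. The sketch below is a minimal implementation; the two label lists are hypothetical and do not come from any cited study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two annotators on the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Agreement expected if both raters labeled at random with their own label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is formally undefined,
        # so this sketch returns 1.0 because agreement is complete.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of ten biopsies by two pathologists.
pathologist_1 = ["carcinoma"] * 4 + ["adenoma"] * 6
pathologist_2 = ["carcinoma"] * 3 + ["adenoma"] * 5 + ["carcinoma", "adenoma"]

kappa = cohen_kappa(pathologist_1, pathologist_2)
```

In this example the raw agreement is 80%, but kappa is only about 0.58 once chance agreement is removed, which shows how annotation reliability can be overestimated when only raw agreement is reported.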
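The core idea of MIL can be sketched as follows: a slide is treated as a bag of patches, only the slide-level label is known, and the slide is considered positive if at least one patch is positive. The example below shows max-pooling aggregation of patch scores, one common MIL formulation; it is not the specific method used by Cosatto et al, and the patch scores and threshold are hypothetical.

```python
def slide_score(patch_scores):
    """Max-pooling MIL aggregation: a slide is as suspicious as its most suspicious patch."""
    if not patch_scores:
        raise ValueError("a slide must contain at least one patch")
    return max(patch_scores)


def classify_slide(patch_scores, threshold=0.5):
    """Assign a slide-level label from patch-level tumor probabilities."""
    return "positive" if slide_score(patch_scores) >= threshold else "negative"


# Hypothetical patch-level tumor probabilities from a patch classifier.
slide_with_small_focus = [0.05, 0.10, 0.92, 0.08]  # one patch contains tumor
slide_without_tumor = [0.05, 0.20, 0.11, 0.07]

result_focus = classify_slide(slide_with_small_focus)
result_benign = classify_slide(slide_without_tumor)
```

During training, the loss is computed on the aggregated slide score against the slide-level label, so no patch-level or pixel-level annotation is required.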
After the preparation of the annotated dataset, the model development process is usually composed of the following steps: preparing the datasets for training, validation, and testing, and selecting the ML framework, ML technique, and learning method. Once the learning process is completed, the output of the model is evaluated through performance metrics, and the hyperparameters are fine-tuned to improve performance. Considering the exponential increase in AI research for image analysis, this step does not seem to be a major obstacle to the implementation of AI in clinical practice.
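The dataset preparation step can be sketched as a deterministic split into training, validation, and test sets. The example below assigns slides by patient identifier so that all slides from one patient fall into the same subset; this patient-level grouping, the 70/15/15 proportions, and the identifiers are assumptions made for illustration rather than requirements stated in the text.

```python
import hashlib


def assign_split(patient_id, train=0.7, validation=0.15):
    """Deterministically map a patient ID to 'train', 'validation', or 'test'."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    # Convert the first 32 bits of the hash to a fraction in [0, 1).
    fraction = int(digest[:8], 16) / 0x100000000
    if fraction < train:
        return "train"
    if fraction < train + validation:
        return "validation"
    return "test"


def split_slides(slides):
    """slides: list of (slide_id, patient_id) pairs."""
    splits = {"train": [], "validation": [], "test": []}
    for slide_id, patient_id in slides:
        splits[assign_split(patient_id)].append(slide_id)
    return splits


# Hypothetical slide and patient identifiers.
slides = [("s1", "p1"), ("s2", "p1"), ("s3", "p2"), ("s4", "p3"), ("s5", "p3")]
splits = split_slides(slides)
```

Hashing the identifier keeps the assignment stable when new cases are added, and grouping by patient avoids evaluating a model on slides from patients it has already seen during training.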
Validation and regulation
As AI-based technologies proliferate, an evidence-based approach is required for their validation. Colling et al summarized the guidance provided by the current in vitro device regulation and gave their recommendations for the main components of validation. In laboratory medicine, apart from clinical evaluation, analytical validation should be considered. The establishment of steps and criteria for the validation of new tests against existing gold standards is essential. For image analysis validation, the technique is often compared with the “ground truth” (for example, comparing an AI technology analyzing HER2 expression within the tumor to a detailed tumor assessment performed manually). It would be appropriate to compare the digital pathology technique with the performance of human pathologists. However, considering inter- and intra-observer variability in the visual assessments of human pathologists, it is difficult to identify the ground truth; thus, validation requires careful study design and acceptance of the limitations of the present gold standard. Currently, most AI applications seem to have difficulty in establishing an absolute ground truth. Therefore, the robustness and reproducibility of AI applications should be repeatedly validated in large and variable patient cohorts.
The relative lack of validation cohorts is an urgent issue in the development of AI-based applications. Histopathological slides with detailed clinical data linked to them often cannot be shared widely for reasons such as privacy protection. Annotations by pathologists, which are usually considered the “ground truth”, are still controversial. Inter-observer variability and subjectivity in assessments by a pathologist indicate that a certain amount of uncertainty is inherent to the ground truth. However, where the pathologist's assessment is the only available ground truth, it is important to enhance accuracy through validation as the next best measure. Efficient validation and testing require multicenter assessments involving multiple pathologists and datasets. If an AI application is intended to be used in real-life practice, it should be robust against pre-analytical variations within the target images, such as differences in staining conditions and WSI scanners, and its performance should be reproducible. In this respect, a significant proportion of currently published AI research in GI cancers has not been externally validated.
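A multicenter assessment can be summarized by reporting performance separately for each site rather than pooling all cases. The sketch below computes sensitivity and specificity per site; the site names and case records are hypothetical and serve only to show how a drop in performance at an external site would become visible.

```python
def sensitivity_specificity(pairs):
    """pairs: iterable of (true_label, predicted_label), 1 = tumor, 0 = no tumor."""
    tp = fn = tn = fp = 0
    for truth, predicted in pairs:
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    # None marks a metric that cannot be computed because a class is absent.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


def metrics_by_site(records):
    """records: iterable of (site, true_label, predicted_label)."""
    by_site = {}
    for site, truth, predicted in records:
        by_site.setdefault(site, []).append((truth, predicted))
    return {site: sensitivity_specificity(pairs) for site, pairs in by_site.items()}


# Hypothetical results: site_A supplied the training data, site_B is external.
records = [
    ("site_A", 1, 1), ("site_A", 1, 1), ("site_A", 0, 0), ("site_A", 0, 0),
    ("site_B", 1, 1), ("site_B", 1, 0), ("site_B", 0, 0), ("site_B", 0, 1),
]
metrics = metrics_by_site(records)
```

Reporting metrics in this way exposes site-specific failures, such as those caused by different staining conditions or scanners, that a pooled accuracy figure would hide.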
Appropriate regulations are required for the safe and effective use of AI in pathological practice. Unlike other laboratory tests, it is difficult to understand how predictions are made in AI applications; therefore, they are often viewed as black boxes. While various visualization techniques, including gradient saliency maps and filter visualization methods, have been developed, it may not be possible for users to fully understand all the parameter changes causing erroneous performance or misprediction. Regulatory approval should be structured to minimize potential harm, define the risk-benefit balance, develop appropriate validation standards, and promote innovation.
Regulatory authorities, such as the FDA, the Centers for Medicare and Medicaid Services (CMS), and the European Union Conformité Européenne (EUCE) are not yet completely prepared for the implementation of AI applications in clinical medicine. As a result, AI-based devices are being controlled by prior and potentially obsolete guidelines for testing medical devices.
In the United States, the FDA is devising novel regulations for AI-based devices to make them safer and more effective. CMS controls laboratory testing through the Clinical Laboratory Improvement Amendments (CLIA). CLIA stipulates that appropriate validation must be performed for all laboratory tests using human tissue before clinical implementation, regardless of their FDA approval. Currently, CLIA has no specific regulations for validating AI applications. The EUCE will replace the medical device directive in May 2021, and in vitro diagnostic medical device directives will be replaced by in vitro diagnostic regulation in May 2022. Successful clinical implementation of AI-based applications will be assisted by the global market, and those clinically enforcing the applications will need to pay particular attention to the regulatory trends in their own country as well as in the US and EU. For AI applications to be approved by the FDA and EUCE, they should be established based on the updated details on FDA and EUCE regulations.
Before implementing an AI application in real-life pathology practice, several obstacles must be addressed. Established business-use cases and a commitment from pathologists to use the AI system should be in place before substantial time, energy, and funds are invested in AI applications and the required IT infrastructure.
The changes required to shift the daily workflow of the pathology department from glass slides to WSIs must be addressed. The department would require new digital pathology-related devices, a specific data management system, data storage facilities, and additional personnel to handle these changes. Simultaneously, an institutional IT infrastructure is required to enable users to operate through both on-site and cloud-based computing systems. In the real world, therefore, the substantial investment required for digital pathology systems may hamper the implementation of these technologies. Notably, augmented microscopy, connected directly to a cloud network service, might remove the need to install a whole slide scanner. Chen and colleagues reported that an augmented reality microscope, which overlays AI-based information onto the sample view in real time, may enable seamless integration of AI into the routine workflow. According to Hegde et al, the cloud-based AI application SMILY (Similar image search for histopathology), developed by Google, allows users to search for morphologically similar features in a target image, irrespective of its annotation status.
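The idea of searching for morphologically similar images without relying on annotations can be illustrated with a minimal sketch of nearest-neighbor search over feature vectors. This is not the SMILY implementation. The three-dimensional embeddings, the patch names, and the choice of cosine similarity are assumptions made purely for illustration. In practice the vectors would come from a trained network.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, database, top_k=2):
    """Return the names of the top_k stored patches closest to the query.

    No labels are consulted: ranking depends only on the embeddings.
    """
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in database.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical embeddings of previously scanned, unannotated patches.
database = {
    "patch_a": [0.9, 0.1, 0.0],
    "patch_b": [0.1, 0.8, 0.3],
    "patch_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.1, 0.05]  # embedding of the region the pathologist selected

results = most_similar(query, database)
```

Because the ranking uses only the embeddings, images can be retrieved whether or not they carry diagnostic annotations.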
In addition, one must consider the relative inexperience of pathologists with AI-based technologies and acknowledge the range of issues the department would encounter before implementing AI. Pathologists must also buy in to significant changes in a conventional, century-old workflow. Because progress does not happen immediately, pathologists' management concerns should be dealt with separately from the technological hurdles. As a first step, pathologists must commit to installing both digital pathology systems and AI applications in the pathology department, and they have to understand the long-term risk-benefit balance of AI implementation. The present DL-based AI applications lack interpretability, which may contribute to patients' and clinicians' reluctance. Developing AI solutions that end-users can interpret, with detailed descriptions of how predictions are made, could be useful. Various solutions to the lack of interpretability of DL models have been reported, such as generating attention heat maps, constructing interpretable models, and creating external interpretive models. However, this black box problem is not yet fully resolved.
On the downside, dependence on AI assistance for diagnoses can result in fewer opportunities for trainees to learn diagnostic skills. Although AI can be used as an auxiliary method to improve the quality and precision of clinical diagnoses, resident pathologists should be trained and encouraged to understand the utility, limitations, and pitfalls of AI applications. Just as molecular pathologists became necessary with the advent of genomic medicine, "computational pathologists" will become necessary in the near future.
As with other clinical tests, ongoing post-marketing quality assurance is essential for the safe and effective use of AI in clinical practice. In addition to laboratory testing processes, laboratory staff should understand the quality management system. As with conventional laboratory tests, a scheme of external quality assurance for AI applications in pathology should be urgently prepared ahead of their implementation.
The use of AI applications in diagnostic practice poses complex new issues around the legal ramifications of a pathologist signing a report prepared using AI. To incorporate the output of an algorithm into a pathological report, a pathologist should be confident in its performance; further, any algorithms used should be validated and regulated correctly. Although AI applications may not replace pathologists in view of this legal issue, they can be employed to support pathologists in their clinical work. In particular, AI researchers are attempting to attach confidence estimates to predictions and to localize pathology-related features. This could help mitigate concerns about interpretability and confidence-building.
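One simple way to attach a confidence estimate to a prediction is to report the top softmax probability and to defer to the pathologist when it falls below a threshold. The sketch below illustrates this under stated assumptions: the class labels, the logits, and the 0.8 threshold are invented for the example. Softmax probabilities from deep networks are often poorly calibrated, so any threshold would need local validation.

```python
import math

def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(logits, labels, threshold=0.8):
    """Return (decision, confidence); defer when confidence is low.

    The threshold is a hypothetical value, not a recommended setting.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence >= threshold:
        return labels[best], confidence
    return "refer to pathologist", confidence

labels = ["benign", "adenoma", "carcinoma"]  # illustrative classes only

clear_case = predict_with_confidence([0.2, 0.1, 4.0], labels)
unclear_case = predict_with_confidence([1.0, 1.2, 1.1], labels)
```

In this sketch the first case is reported with its label and a high confidence, while the second is routed back to the pathologist because no class clearly dominates.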
The immense potential of AI in pathological practice can be harnessed by improving workflows, eliminating simple mistakes, increasing diagnostic reproducibility, and revealing predictions that are impossible for human pathologists using conventional visual methods. Clinically implemented AI applications are expected to be user-friendly, explainable, robust, manageable, and cost-effective. Given the currently limited clinical awareness and the uncertainty about how AI tools can be introduced into real-life practice, caution should be exercised in their deployment. Eventually, AI applications may be implemented and used appropriately, provided they are supported by human pathologists, standardized usage recommendations, and harmonization with existing information systems.
AI can play a pivotal role in the practice of pathologists and the development of precision medicine for GI cancers. However, there are various barriers to its effective implementation. To overcome these barriers and implement AI at the practice level, it is necessary to work with a range of stakeholders, including pathologists, clinicians, developers, regulators, and device vendors, and to establish a strong network to identify true needs, expand the market, and use the applications safely and efficiently.
Manuscript source: Invited manuscript
Corresponding Author's Membership in Professional Societies: The Japanese Society of Pathologists; American Society of Clinical Oncology; and Japanese Association for Medical Artificial Intelligence.
PathAI Present Machine Learning Models that Predict the Homologous Recombination Deficiency Status of Breast Cancer Biopsies at the 2020 SABCS. [cited 7 January 2021]. In: PathAI [Internet]. Available from: https://www.pathai.com/news/pathai-sabcs2020.
Ehteshami Bejnordi B, Veta M, Johannes van Diest P, van Ginneken B, Karssemeijer N, Litjens G, van der Laak JAWM; the CAMELYON16 Consortium; Hermsen M, Manson QF, Balkenhol M, Geessink O, Stathonikos N, van Dijk MC, Bult P, Beca F, Beck AH, Wang D, Khosla A, Gargeya R, Irshad H, Zhong A, Dou Q, Li Q, Chen H, Lin HJ, Heng PA, Haß C, Bruni E, Wong Q, Halici U, Öner MÜ, Cetin-Atalay R, Berseth M, Khvatkov V, Vylegzhanin A, Kraus O, Shaban M, Rajpoot N, Awan R, Sirinukunwattana K, Qaiser T, Tsang YW, Tellez D, Annuscheit J, Hufnagl P, Valkonen M, Kartasalo K, Latonen L, Ruusuvuori P, Liimatainen K, Albarqouni S, Mungal B, George A, Demirci S, Navab N, Watanabe S, Seno S, Takenaka Y, Matsuda H, Ahmady Phoulady H, Kovalev V, Kalinovsky A, Liauchuk V, Bueno G, Fernandez-Carrobles MM, Serrano I, Deniz O, Racoceanu D, Venâncio R. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA. 2017;318:2199-2210.
Cosatto E, Laquerre PF, Malon C, Graf HP, Saito A, Kiyuna T, Marugame A, Kamijo K. Automated gastric cancer diagnosis on H and E-stained sections; training a classifier on a large scale with multiple instance machine learning. Proceedings of SPIE - Progress in Biomedical Optics and Imaging, MI: 2013.
U.S. Food and Drug Administration. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD). [cited 7 January 2021]. In: U.S. Food and Drug Administration [Internet]. Available from: https://www.fda.gov/media/122535/download.
Medical Devices – Sector. [cited 7 January 2021]. In: European Commission [Internet]. Available from: https://ec.europa.eu/growth/sectors/medical-devices_en.
Kuhn DR, Kacker RN, Lei Y, Simos DE. Combinatorial Methods for Explainable AI. Proceedings of the 2020 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW); 2020 Oct 24-28. IEEE, 2020: 167-170.
León F, Gélvez M, Jaimes Z, Gelvez T, Arguello H. Supervised Classification of Histopathological Images Using Convolutional Neuronal Networks for Gastric Cancer Detection. 2019 XXII Symposium on Image, Signal Processing and Artificial Vision (STSIVA). IEEE, 2019: 1-5.
Alom M, Yakopcic C, Taha T, Asari V. Microscopic Nuclei Classification, Segmentation and Detection with improved Deep Convolutional Neural Network (DCNN) Approaches. 2018 Preprint. Available from: arXiv:1811.03447.
Ponzio F, Macii E, Ficarra E, Di Cataldo S. Colorectal Cancer Classification using Deep Convolutional Networks - An Experimental Study. Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2. Bioimaging, 2018: 58-66.