Retrospective Cohort Study
Copyright ©The Author(s) 2022. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Sep 28, 2022; 28(36): 5338-5350
Published online Sep 28, 2022. doi: 10.3748/wjg.v28.i36.5338
Machine learning-based gray-level co-occurrence matrix signature for predicting lymph node metastasis in undifferentiated-type early gastric cancer
Xin Wei, Xue-Jiao Yan, Yu-Yan Guo, Jie Zhang, Guo-Rong Wang, Arsalan Fayyaz, Jiao Yu
Xin Wei, Department of Oncology, Shaanxi Provincial People’s Hospital, Xi’an 710068, Shaanxi Province, China
Xue-Jiao Yan, Department of Magnetic Resonance, Shaanxi Provincial People’s Hospital, Xi’an 710068, Shaanxi Province, China
Yu-Yan Guo, Department of Radiotherapy, The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710004, Shaanxi Province, China
Jie Zhang, Department of Gastrointestinal Surgery, Shaanxi Provincial Tumour Hospital, Xi’an 710068, Shaanxi Province, China
Guo-Rong Wang, Department of General Surgery, Shaanxi Provincial People’s Hospital, Xi’an 710068, Shaanxi Province, China
Arsalan Fayyaz, School of Management, Northwestern Polytechnical University, Xi’an 710072, Shaanxi Province, China
Jiao Yu, Department of Radiotherapy, Shaanxi Provincial People’s Hospital, Xi’an 710068, Shaanxi Province, China
Author contributions: Yu J and Wei X conceived and designed the study and wrote the manuscript; Yan XJ, Guo YY, Zhang J, Wang GR, and Arsalan F collected the data, performed the data analysis, and interpreted the outcomes; and all authors critically reviewed the content of the manuscript and helped with the drafts.
Supported by the General Project-Social Development Field of Shaanxi Province Science and Technology Department, No. 2021SF-313; and Innovation Capability Support Plan of Shaanxi Science and Technology Department - Science and Technology Innovation Team, No. 2020TD-048.
Institutional review board statement: This study was approved by the Institutional Review Committee of Shaanxi Provincial People’s Hospital (2021-Y024).
Informed consent statement: Written informed consent was not required because this was a retrospective study based on chart review.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: No additional data are available.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Jiao Yu, MD, Radiologist, Department of Radiotherapy, Shaanxi Provincial People’s Hospital, No. 256 Youyi West Road, Beilin District, Xi’an 710068, Shaanxi Province, China. shawn170215@163.com
Received: July 20, 2022
Peer-review started: July 20, 2022
First decision: August 6, 2022
Revised: August 14, 2022
Accepted: September 6, 2022
Article in press: September 6, 2022
Published online: September 28, 2022
Abstract
BACKGROUND

The most important consideration in determining treatment strategies for undifferentiated early gastric cancer (UEGC) is the risk of lymph node metastasis (LNM). Identifying a biomarker that predicts LNM would therefore help guide treatment decisions.

AIM

To develop a machine learning (ML)-based integrated procedure for constructing a gray-level co-occurrence matrix (GLCM)-based prediction model for LNM.

METHODS

We retrospectively selected 526 cases of UEGC confirmed through pathological examination after radical gastrectomy without endoscopic treatment in four tertiary hospitals between January 2015 and December 2021. We extracted GLCM-based features from grayscale images and applied ML to classify candidate predictive variables. The robustness and clinical utility of each model were evaluated using the receiver operating characteristic (ROC) curve, decision curve analysis, and the clinical impact curve.
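As an illustration of what GLCM-based feature extraction involves, the following minimal Python sketch builds a normalized, symmetric co-occurrence matrix for one pixel offset and derives three texture features from it. It is not the authors' pipeline. The function names, the offset table, and the toy image are hypothetical, and it assumes that "inertia value" corresponds to GLCM contrast and "inverse gap" to the inverse difference moment, which the abstract does not confirm.

```python
import math

def glcm(image, dx, dy, levels):
    """Normalized, symmetric gray-level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]

def inertia(p):
    """Contrast: large when neighboring gray levels differ strongly."""
    return sum(((i - j) ** 2) * v
               for i, row in enumerate(p) for j, v in enumerate(row))

def inverse_difference_moment(p):
    """Homogeneity: large when neighboring gray levels are similar."""
    return sum(v / (1 + (i - j) ** 2)
               for i, row in enumerate(p) for j, v in enumerate(row))

def entropy(p):
    """Shannon entropy (base 2) of the co-occurrence distribution."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)

# Assumed mapping from angle to a one-pixel offset (dx, dy); rows grow downward.
OFFSETS = {0: (1, 0), 45: (1, -1)}

# Toy 4x4 image quantized to 4 gray levels (illustrative data only).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

features = {}
for angle, (dx, dy) in OFFSETS.items():
    p = glcm(toy, dx, dy, levels=4)
    features[f"IV_{angle}"] = inertia(p)
    features[f"IG_{angle}"] = inverse_difference_moment(p)
    features[f"Entropy_{angle}"] = entropy(p)
```

In practice, such features would be computed over a segmented region of interest and for several angles, then passed to a classifier as candidate predictors.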

RESULTS

GLCM-based features correlated significantly with LNM. The top 7 GLCM-based factors included inertia value 0° (IV_0), inertia value 45° (IV_45), inverse gap 0° (IG_0), inverse gap 45° (IG_45), inverse gap full angle (IG_all), Haralick 30° (Haralick_30), Haralick full angle (Haralick_all), and Entropy. The areas under the ROC curve (AUCs) of the random forest classifier (RFC), support vector machine, eXtreme gradient boosting, artificial neural network, and decision tree models ranged from 0.805 [95% confidence interval (CI): 0.258-1.352] to 0.925 (95%CI: 0.378-1.472) in the training set and from 0.794 (95%CI: 0.237-1.351) to 0.912 (95%CI: 0.355-1.469) in the testing set. The RFC model (training set: AUC: 0.925, 95%CI: 0.378-1.472; testing set: AUC: 0.912, 95%CI: 0.355-1.469), which incorporates Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, and IV_45, had the highest predictive accuracy.
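For readers unfamiliar with how an AUC and its interval can be obtained, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and estimates a 95%CI with a percentile bootstrap. This is one common approach, not necessarily the one used in this study. The function names and the example labels and scores are hypothetical. A percentile bootstrap interval for the AUC always lies within [0, 1] by construction.

```python
import random

def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Illustrative data only: 1 = LNM present, 0 = LNM absent.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]

point_estimate = auc(y_true, y_score)
ci_low, ci_high = bootstrap_ci(y_true, y_score)
```

With cohort-sized data, the same two functions could be applied separately to the training and testing sets for each classifier.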

CONCLUSION

The evaluation results indicate that selecting radiological and textural features makes LNM discrimination in UEGC patients more effective. Additionally, the ML-based prediction model developed using the RFC can be used to identify LNM and guide treatment selection, which can thereby improve clinical outcomes.

Keywords: Undifferentiated early gastric cancer, Machine learning, Lymph node metastasis, Gray-level co-occurrence matrix, Feature selection, Prediction

Core Tip: Gray-level co-occurrence matrix-based feature extraction can be a robust and promising tool for improving the efficiency of predicting lymph node metastasis in individual undifferentiated early gastric cancer patients. Additionally, machine learning adopts more optimized algorithms and clearer feature extraction. Models developed using the random forest classifier, based on Entropy, Haralick full angle, Haralick 30°, inverse gap full angle, inverse gap 45°, inverse gap 0°, and inertia value 45°, have the highest predictive accuracy. Further research is required to develop these models for clinical practice.