Copyright
©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
Prognostic role of Ki-67 in colorectal carcinoma: Development and evaluation of machine learning prediction models
Da-Tong Zeng, Ming-Jie Li, Rui Lin, Wei-Jian Huang, Shi-De Li, Wan-Ying Huang, Bin Li, Qi Li, Gang Chen, Jia-Shu Jiang
Da-Tong Zeng, Ming-Jie Li, Rui Lin, Shi-De Li, Wan-Ying Huang, Bin Li, Qi Li, Gang Chen, Department of Pathology, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi Zhuang Autonomous Region, China
Da-Tong Zeng, Wei-Jian Huang, Department of Pathology, Red Cross Hospital of Yulin, Yulin 537000, Guangxi Zhuang Autonomous Region, China
Ming-Jie Li, Department of Forensic Medicine, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi Zhuang Autonomous Region, China
Shi-De Li, Department of Information Management and Information System, School of Information and Management, Guangxi Medical University, Nanning 530021, Guangxi Zhuang Autonomous Region, China
Jia-Shu Jiang, Department of International Cooperation and External Exchange, The First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi Zhuang Autonomous Region, China
Co-first authors: Da-Tong Zeng and Ming-Jie Li.
Co-corresponding authors: Gang Chen and Jia-Shu Jiang.
Author contributions: Zeng DT, Li MJ, Chen G, and Jiang JS conceived and designed the manuscript; Zeng DT and Li MJ contributed equally to this article and are the co-first authors of this manuscript; Zeng DT, Lin R, Huang WJ, Huang WY, Li B, and Li Q performed sample collection, digital pathology scanning, and Ki-67 immunohistochemical staining; Zeng DT, Li MJ, Huang WJ, Huang WY, Li B, Li Q, and Chen G estimated the Ki-67 index; Lin R, Li SD, and Chen G designed the machine learning algorithms and performed the statistical analysis; Zeng DT, Li MJ, Lin R, and Li SD prepared the first draft of the manuscript; Huang WY, Chen G, and Jiang JS revised the manuscript; Chen G and Jiang JS contributed equally to this article and are the co-corresponding authors of this manuscript; and all authors have read and approved the final manuscript.
Supported by the Guangxi Zhuang Autonomous Region Health Commission Scientific Research Project, No. Z20210442.
Institutional review board statement: This study was approved by the Medical Ethics Committee of the First Affiliated Hospital of Guangxi Medical University, approval No. 2025-E0288.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:
https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Jia-Shu Jiang, Associate Chief Physician, Associate Professor, Researcher, Department of International Cooperation and External Exchange, The First Affiliated Hospital of Guangxi Medical University, No. 6 Shuangyong Road, Nanning 530021, Guangxi Zhuang Autonomous Region, China. jiangjiashu115@163.com
Received: March 21, 2025
Revised: May 11, 2025
Accepted: July 3, 2025
Published online: August 24, 2025
Processing time: 153 Days and 3.3 Hours
BACKGROUND
Ki-67 immunohistochemistry is routinely performed in clinical pathology departments. However, its prognostic value requires further investigation, and research applying machine learning (ML) to Ki-67 assessment remains relatively underdeveloped.
AIM
To investigate the prognostic value of Ki-67 in cases of colorectal carcinoma (CRC) and explore the potential application of ML algorithms to predict the Ki-67 index.
METHODS
Case data and pathological sections were systematically collected from two centers. To analyze the prognostic value of the Ki-67 index in CRC, multiple cutoff values were defined. In parallel, using the histological features presented in hematoxylin and eosin-stained CRC images, prediction models were constructed with three mainstream ML algorithms: support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost). The ability of these models to classify and predict the Ki-67 index was then evaluated.
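As a minimal sketch of the classification target and evaluation metric described above, the Python below dichotomizes Ki-67 indices at a chosen cutoff and scores a model's predicted scores with a rank-based AUC (the Mann-Whitney formulation). The function names `binarize_ki67` and `roc_auc`, the toy values, and the assumption that each model outputs one score per case are illustrative only; this is not the authors' pipeline, and the SVM, RF, and XGBoost models themselves are not reproduced.

```python
# Hypothetical sketch (not the study's code): turn Ki-67 indices into binary
# labels at a cutoff, then score model outputs with a rank-based AUC.


def binarize_ki67(indices, cutoff):
    """Label each case 1 if its Ki-67 index (percent) is >= cutoff, else 0."""
    return [1 if ki >= cutoff else 0 for ki in indices]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case; ties count as one half (Mann-Whitney U / (n_pos * n_neg))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, not from the study: Ki-67 indices and hypothetical model scores.
    ki67 = [15, 35, 45, 55, 60, 85]
    model_scores = [0.10, 0.40, 0.35, 0.70, 0.65, 0.90]
    # Evaluate the same scores against labels defined by several cutoffs.
    for cutoff in (40, 50, 80):
        labels = binarize_ki67(ki67, cutoff)
        print(cutoff, labels, round(roc_auc(labels, model_scores), 3))
```

Looping over several cutoffs mirrors the idea of testing multiple thresholds. In practice, a library routine such as scikit-learn's `roc_auc_score` would normally replace the quadratic pairwise loop.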
RESULTS
Non-parametric tests showed that a Ki-67 index ≥ 40% correlated with high histological grade (P = 0.017), cutoffs from ≥ 50% to ≥ 90% were associated with deficient mismatch repair protein status (all P ≤ 0.028), and ≥ 80% was associated with lymph node metastasis (P = 0.006). Kaplan-Meier analysis showed that Ki-67 ≥ 50% predicted better survival (log-rank P = 0.0299, hazard ratio = 2.142), whereas no other cutoff yielded a significant difference. Cox regression identified the Ki-67 positive rate as a significant predictor (P = 0.027, hazard ratio = 2.583), while the other variables showed no significant association. For Ki-67 status prediction, the SVM, RF, and XGBoost models achieved training area under the curve (AUC) values of 0.851, 0.948, and 0.872, respectively, and test set AUC values of 0.795, 0.755, and 0.750, respectively. In external validation, their AUC values for predicting Ki-67 status reached 0.757, 0.749, and 0.783, respectively.
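The survival comparison above can be illustrated with a small, dependency-free sketch of the Kaplan-Meier product-limit estimator and the two-group log-rank test. It assumes each case has a follow-up time and an event flag (1 = death observed, 0 = censored) and that cases are split into groups by a Ki-67 cutoff such as 50%. The function names and toy data are hypothetical, and hazard ratios and Cox regression are not covered.

```python
import math

# Hypothetical sketch (not the study's code): Kaplan-Meier curve and a
# two-group log-rank test for cases split at a Ki-67 cutoff.


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    surv = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def log_rank_test(times_a, events_a, times_b, events_b):
    """Return (chi-square statistic with 1 df, two-sided p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp = 0.0  # sum over event times of (observed - expected) deaths in group a
    variance = 0.0       # sum of hypergeometric variances
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        n = n_a + n_b
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Toy follow-up data (months), not from the study.
    high_t, high_e = [12, 20, 30, 40, 50], [1, 0, 0, 1, 0]
    low_t, low_e = [5, 8, 10, 15, 25], [1, 1, 1, 0, 1]
    print(kaplan_meier(high_t, high_e))
    print(log_rank_test(high_t, high_e, low_t, low_e))
```

The pairwise scans keep the code short and transparent; established survival packages would be the usual choice for real cohorts.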
CONCLUSION
The Ki-67 positive rate is a significant prognostic predictor in CRC, with 50% emerging as the optimal cutoff for survival stratification. ML models built on hematoxylin and eosin-stained image features (SVM, RF, and XGBoost) predicted Ki-67 status with external validation AUC values of 0.757, 0.749, and 0.783, respectively, highlighting the potential of ML to enhance prognostic accuracy.
Core Tip: This study pioneers the application of machine learning to predict Ki-67 status in colorectal carcinoma directly from hematoxylin and eosin-stained images. Data analysis identified 50% as the optimal Ki-67 cutoff, with high expression linked to improved survival and low expression associated with advanced tumor stage and lymph node metastasis. Predictive models were developed using the support vector machine, random forest, and eXtreme gradient boosting algorithms, achieving area under the curve values of 0.851-0.948 in training, 0.750-0.795 in the test set, and 0.749-0.783 in external validation. This innovative approach highlights the potential of machine learning to enhance prognostic accuracy.