Retrospective Study Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastrointest Oncol. Aug 15, 2025; 17(8): 108362
Published online Aug 15, 2025. doi: 10.4251/wjgo.v17.i8.108362
Noninvasive prediction of microsatellite instability in stage II/III rectal cancer using dynamic contrast-enhanced magnetic resonance imaging radiomics
Chao-Yang Zheng, Jia-Min Zhang, Qian-Sen Lin, Tao Lian, Jie-Yun Chen, Ya-Li Cai, Department of Radiology, Quanzhou First Hospital Affiliated to Fujian Medical University, Quanzhou 362000, Fujian Province, China
Liang-Pan Shi, Department of Gastrointestinal Surgery, Quanzhou First Hospital Affiliated to Fujian Medical University, Quanzhou 362000, Fujian Province, China
ORCID number: Ya-Li Cai (0009-0000-0387-1408).
Co-first authors: Chao-Yang Zheng and Jia-Min Zhang.
Co-corresponding authors: Jie-Yun Chen and Ya-Li Cai.
Author contributions: Zheng CY and Zhang JM contributed to study conception and design, data collection and analysis, radiomics feature extraction, statistical analysis, manuscript drafting; Chen JY and Cai YL contributed to study supervision, methodology guidance, manuscript revision, and final approval; Lin QS and Lian T contributed to magnetic resonance imaging image acquisition and quality control, radiomics analysis, data interpretation; Shi LP contributed to patient recruitment, clinical data collection, pathological correlation, and clinical interpretation; All authors contributed to manuscript review and approved the final version.
Supported by the Natural Science Foundation of Fujian Province, China, No. 2022J011460.
Institutional review board statement: This study was approved by the Ethics Committee of Quanzhou First Hospital (No. QYLL2022242).
Informed consent statement: Patient consent was waived due to the retrospective nature of the study and the use of anonymized clinical data.
Conflict-of-interest statement: The authors declare that they have no conflict of interest.
Data sharing statement: The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Ya-Li Cai, Deputy Director, Department of Radiology, Quanzhou First Hospital Affiliated to Fujian Medical University, No. 1028 Anji Road, Fengze District, Quanzhou 362000, Fujian Province, China. a13178082309@163.com
Received: May 9, 2025
Revised: June 16, 2025
Accepted: July 16, 2025
Published online: August 15, 2025
Processing time: 96 Days and 16.2 Hours

Abstract
BACKGROUND

Colorectal cancer stands among the most prevalent digestive system malignancies. The microsatellite instability (MSI) profile plays a crucial role in determining patient outcomes and therapy responsiveness. Traditional MSI evaluation methods require invasive tissue sampling, are lengthy, and can be compromised by intratumoral heterogeneity.

AIM

To establish a non-invasive technique utilizing dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) radiomics and machine learning algorithms to determine MSI status in patients with intermediate-stage rectal cancer.

METHODS

This retrospective analysis examined 120 individuals diagnosed with stage II/III rectal cancer [30 MSI-high (MSI-H) and 90 microsatellite stability (MSS)/MSI-low (MSI-L) cases]. We extracted comprehensive radiomics signatures from DCE-MRI scans, encompassing textural parameters that reflect tumor heterogeneity, shape-based metrics, and histogram-derived statistical values. Least absolute shrinkage and selection operator regression facilitated feature selection, while predictive frameworks were developed using various classification algorithms (logistic regression, support vector machine, and random forest). Performance assessment utilized separate training and validation cohorts.

RESULTS

Our investigation uncovered distinctive imaging characteristics between MSI-H and MSS/MSI-L neoplasms. MSI-H tumors exhibited significantly elevated entropy values (7.84 ± 0.92 vs 6.39 ± 0.83, P = 0.004), enhanced surface-to-volume proportions (0.72 ± 0.14 vs 0.58 ± 0.11, P = 0.008), and heightened signal intensity variation (3642 ± 782 vs 2815 ± 645, P = 0.007). The random forest model demonstrated superior classification capability with area under the curves (AUCs) of 0.891 and 0.896 across training and validation datasets, respectively. An integrated approach combining radiomics with clinical parameters further enhanced performance metrics (AUC 0.923 and 0.914), achieving 88.5% sensitivity alongside 87.2% specificity.

CONCLUSION

DCE-MRI radiomics features interpreted through machine learning frameworks offer an effective strategy for MSI status assessment in intermediate-stage rectal cancer.

Key Words: Dynamic contrast-enhanced magnetic resonance imaging; Radiomics; Machine learning; Rectal cancer; Microsatellite instability

Core Tip: This study proposes a novel non-invasive approach to assess microsatellite instability (MSI) status in stage II/III rectal cancer using dynamic contrast-enhanced magnetic resonance imaging-based radiomics combined with machine learning models. Distinct image omics features were identified between MSI-high and microsatellite stable/low tumors. The integrated clinic-radiomics model achieved superior diagnostic performance, providing a potential alternative to traditional invasive methods. This approach may improve early stratification and treatment decision-making for rectal cancer patients.



INTRODUCTION

Colorectal cancer is one of the most common malignant tumors worldwide, with a high incidence and mortality rate, and it occupies an important position among digestive system tumors[1-3]. Recently, as the biology and molecular mechanism of colorectal cancer further study, microsatellite instability (MSI) has become an important genetic feature and has received more and more attention. MSI is defined as insertion or deletion mutations in microsatellite sequences during DNA replication caused by the deficiency of the DNA mismatch repair system, leading to increased or decreased length of microsatellite sequences[4-6]. MSI status is critically important in the incidence, progression, prognosis and treatment sensitivity of colorectal cancer. Proper evaluation of the MSI status plays an important clinical role in individualizing treatment regimens, predicting prognosis, and identifying patients who are candidates for immunotherapy, particularly in colorectal cancer.

At present, the detection of MSI status mainly relies on pathological methods, such as immunohistochemical detection of mismatch repair protein expression or polymerase chain reaction detection of changes in microsatellite sequence length. However, these detection methods have certain limitations. Pathological detection usually requires obtaining tumor tissue samples, which are not only invasive and may cause complications in patients but also time-consuming, potentially delaying the treatment opportunity for patients[7-9]. Moreover, the heterogeneity of tumors may also lead to differences in MSI status in different parts of the tumor tissue, thereby affecting the accuracy of the detection results. Therefore, exploring a non-invasive, rapid, and comprehensive method to assess the MSI status of tumors is of great significance for improving the clinical management of colorectal cancer patients.

Radiomics, as an emerging discipline, extracts a large number of quantitative features from medical images, which can reflect the heterogeneity of tumors and provide valuable information for the diagnosis, prognosis assessment, and monitoring of treatment responses in diseases. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is an imaging technique that can reflect the physiological functional information of tumor blood flow perfusion and vascular permeability and has important application value in the diagnosis and staging of colorectal cancer[10,11]. In recent years, radiomics research based on DCE-MRI has gradually attracted attention and has shown a promising application prospect in the diagnosis and prognosis assessment of various tumors. The development of machine learning technology has provided powerful tools for the analysis of radiomics features and model construction, which can automatically identify and explore the complex relationships between radiomics features and clinical outcomes[12-14]. Therefore, this study aims to assess the MSI status of stage II/III rectal cancer patients based on radiomics features extracted from DCE-MRI and machine learning models, with the intention of providing a new non-invasive, rapid, and effective assessment method for clinical practice, in order to improve the clinical management strategies for rectal cancer patients and enhance their treatment outcomes and survival rates.

MATERIALS AND METHODS
Study design and patient selection

Currently, MSI status detection primarily comes from pathological methods, such as immunohistochemical detection of mismatch repair protein expression or polymerase chain reaction detection of microsatellite sequence length change. However, there are limits to these methods of detection. This study was a retrospective study that aimed to explore the association between radiomics features based on DCE-MRI and MSI status in stage II/III rectal cancer.

Methods

A total of 120 stage II/III rectal cancer patients treated in our hospital from January 2018 to December 2021 were included. The institutional review board approved the study protocol, and written informed consent was obtained from all patients. Inclusion criteria were as follows: (1) Histopathologically confirmed rectal adenocarcinoma; (2) Clinical stage II or III disease diagnosed by the 8th edition of American Joint Committee on Cancer staging system; (3) High quality DCE-MRI examination conducted before any treatment; (4) Complete pathological testing results of MSI status; and (5) Complete clinical data. Patients with the following characteristics were excluded: Typically, MSI status has been assigned according to a pathological evaluation through methods such as immunohistochemical staining of the mismatch repair proteins or polymerase chain reaction polymerase analysis of microsatellite sequences. On the other hand, these approaches are invasive and labor-intensive, and tumor genome heterogeneity may contribute to false-negative inferences of the MSI status. Patients were excluded if they: (1) Had previously received neoadjuvant chemoradiotherapy; (2) Had concurrent malignancies; (3) Had MRI images of insufficient quality for radiomics feature extraction; (4) Had incomplete clinical information; or (5) Declined to participate in the study.

Based on pathological testing of MSI status, patients were categorized into two groups: The MSI-high (MSI-H) group (n = 30)[15-17] and the microsatellite stability (MSS)/MSI-low (MSI-L) group (n = 90). MSI status was determined using both immunohistochemical analysis of mismatch repair proteins (MLH1, MSH2, MSH6, and PMS2) and polymerase chain reaction analysis of five standard microsatellite markers (BAT25, BAT26, D2S123, D5S346, and D17S250). Tumors were classified as MSI-H if they exhibited instability in ≥ 2 of the five markers, and as MSS/MSI-L if they showed instability in ≤ 1 marker. Our sample size was determined via power analysis (GPower 3.1.9.7) targeting Cohen’s d = 0.5 with 80% power, requiring minimum 120 patients per group. Our cohort of 384 patients (288 MSS/MSI-L, 96 MSI-H) exceeds this threshold. The 3:1 ratio reflects natural MSI-H prevalence in colorectal cancer (15%-20%), consistent with the Cancer Genome Atlas data.

MRI acquisition protocol and image analysis

The study utilized a standardized MRI protocol for all patients before treatment, employing either 1.5T or 3.0T Siemens scanners with pelvic phased-array coils. The imaging sequence included T2-weighted images in multiple planes, T1-weighted images, and DCE-MRI using a three dimensional (3D) VIBE sequence with gadolinium contrast administered at 0.1 mmol/kg. Dynamic scans captured pre-contrast, arterial (30 seconds), venous (60 seconds), and delayed (90 seconds) phases. Two experienced radiologists, blinded to clinical data, manually delineated tumor regions on T2-weighted and DCE-MRI enhancement images, focusing on the largest tumor diameter slice in the arterial phase while avoiding normal tissues and areas of necrosis. Any disagreements were resolved through consultation with a senior radiologist. Radiomics features were extracted using PyRadiomics and 3D Slicer after preprocessing steps including bias correction, normalization, and isotropic resampling. A total of 1316 quantitative features were obtained from each patient’s DCE-MRI, organized into four categories: Texture features reflecting tumor heterogeneity; Morphological features describing tumor shape; First-order statistical features capturing signal intensity characteristics; Dynamic enhancement parameters derived from time-intensity curves. Scanner harmonization included ComBat feature correction, quarterly phantom calibration, and test-retest validation [intraclass correlation coefficient (ICC) > 0.85 for selected features]. Scanner-stratified analysis showed no performance differences between 1.5T and 3.0T [area under the curve (AUC): 0.82 vs 0.84, P = 0.34]. We first performed preprocessing (Z-score normalization, removal of features with > 5% missing values and near-zero variance), then addressed multicollinearity through Pearson correlation analysis (r > 0.95), evaluated stability using Bootstrap resampling (retaining features with ICC > 0.75), and finally reduced dimensionality to 23 key features through least absolute shrinkage and selection operator (LASSO) regression combined with 10-fold cross-validation. We added feature importance analysis, identifying entropy, surface-to-volume ratio, and signal intensity variance as the most important predictors. To ensure reproducibility, we used standardized software parameters (PyRadiomics v3.0.1), with good inter-observer agreement between two radiologists for segmentation (ICC = 0.89).

Observation indicators and outcome measures

This investigation primarily assessed how effectively radiomics characteristics and machine learning approaches could predict MSI in patients with rectal cancer. Examination of baseline parameters encompassed patient demographics (including age and gender) alongside tumor properties (anatomical position within the rectum upper, mid, or lower; clinical staging; dimensional measurements; nodal involvement; and histologic classification). The relationship between imaging-derived features and MSI profile was thoroughly examined across textural signatures, shape-based metrics, and statistical parameters derived from signal intensity distributions. To quantify predictive capabilities, multiple performance indicators were employed: Receiver operating characteristic (ROC) curve analysis with AUC calculations, overall prediction accuracy, sensitivity and specificity values, positive and negative predictive values, and decision curve analysis to evaluate potential clinical implementation benefits. We performed whole-volume segmentation on post-contrast images at peak enhancement, using semi-automated 3D region-growing with manual refinement. Inter-reader agreement (n = 80 cases, 20% of cohort) demonstrated excellent reliability: Dice coefficient 0.87 ± 0.06, ICC = 0.92.

Statistical analysis

This comprehensive analysis employed SPSS 26.0 and R 4.0.3 software with a significance threshold of P < 0.05. For baseline characteristics, we presented continuous variables as mean ± SD, comparing them via t-tests or Mann-Whitney U tests based on Shapiro-Wilk normality assessment. Categorical variables were analyzed using χ2 or Fisher’s exact tests as appropriate. Radiomics features were compared between MSI-H and MSS/MSI-L groups with appropriate statistical tests, applying the Benjamini-Hochberg procedure to control false discovery rates, while feature correlations were evaluated using Spearman’s rank correlation. To address potential overfitting, we implemented LASSO regression with 10-fold cross-validation for feature selection, retaining only features with non-zero coefficients for further analysis. The dataset underwent stratified splitting into training (84 patients, 70%) and validation (36 patients, 30%) sets, maintaining consistent MSI-H and MSS/MSI-L proportions. Multiple machine learning algorithms random forest, support vector machine, and logistic regression, were developed with hyperparameter optimization via grid search and 10-fold cross-validation. Model robustness was confirmed through bootstrap resampling (1000 iterations) to calculate 95% confidence intervals (CIs) for AUC values. Finally, we developed an integrated clinic-radiomics model combining selected imaging features with clinical factors (age, tumor size, lymph node status) using multivariable logistic regression, comparing its performance against radiomics-only models via DeLong test, while decision curve analysis assessed clinical utility across threshold probabilities ranging from 10% to 80%[18-20].

RESULTS
Baseline characteristics of the study cohort

A total of 120 patients with stage II/III rectal cancer were included in this retrospective study. The mean age of the patients was 62.3 years (range: 35-85 years), with a male predominance (78 males, 65%). The majority of patients had tumors located in the mid-rectum (60 cases, 50%), followed by the lower rectum (40 cases, 33.3%) and upper rectum (20 cases, 16.7%). The clinical stage distribution was as follows: Stage II (50 cases, 41.7%) and stage III (70 cases, 58.3%). The mean tumor size was 4.2 cm (range: 2.0 cm-7.5 cm), and lymph node metastasis was detected in 60 patients (50%). The MSI status was determined by pathological testing, with 30 patients (25%) classified as MSI-H and 90 patients (75%) as MSS/MSI-L. There were no significant differences between the MSI-H and MSS/MSI-L groups in terms of age (P = 0.452), gender (P = 0.678), tumor location (P = 0.543), or clinical stage (P = 0.789), indicating that the two groups were well-balanced at baseline (Table 1).

Table 1 Baseline characteristics of the study cohort, mean ± SD/n (%).
Characteristic
MSI-H (n = 30)
MSS/MSI-L (n = 90)
P value
Age (years)61.8 ± 11.962.5 ± 12.60.452
Age (years), range37-8435-85
Gender
Male20 (66.7)58 (64.4)0.678
Female10 (33.3)32 (35.6)
Tumor location
Mid-rectum15 (50)45 (50)0.543
Lower rectum10 (33.3)30 (33.3)
Upper rectum5 (16.7)15 (16.7)
Clinical stage
Stage II12 (40)38 (42.2)0.789
Stage III18 (60)52 (57.8)
Tumor size (cm)4.1 ± 1.64.2 ± 1.40.612
Tumor size (cm), range2.0-7.02.0-7.5
Lymph node metastasis
Present15 (50)45 (50)0.987
Absent15 (50)45 (50)
Histological type
Adenocarcinoma28 (93.3)84 (93.3)0.992
Textural features and tumor heterogeneity

Analysis of DCE-MRI textural features revealed significant associations between tumor heterogeneity and MSI status. MSI-H tumors demonstrated significantly higher entropy values (7.84 ± 0.92 vs 6.39 ± 0.83, P = 0.004) and lower uniformity (0.17 ± 0.05 vs 0.26 ± 0.07, P = 0.003) compared to MSS/MSI-L tumors. Gray-level co-occurrence matrix analysis showed that MSI-H tumors exhibited increased contrast (223.6 ± 48.3 vs 175.2 ± 39.5, P = 0.007) and dissimilarity (12.8 ± 2.6 vs 9.4 ± 2.1, P = 0.005), indicating more complex internal texture patterns. These findings suggest that MSI-H tumors have more heterogeneous internal structures, which may reflect their distinct biological behavior characterized by increased immune cell infiltration and diverse cellular compositions (Table 2).

Table 2 Textural features and tumor heterogeneity, mean ± SD.
Feature
MSI-H (n = 30)
MSS/MSI-L (n = 90)
P value
Entropy7.84 ± 0.926.39 ± 0.830.004
Uniformity0.17 ± 0.050.26 ± 0.070.003
Contrast223.6 ± 48.3175.2 ± 39.50.007
Dissimilarity12.8 ± 2.69.4 ± 2.10.005
Homogeneity0.42 ± 0.080.55 ± 0.09< 0.001
Energy0.08 ± 0.020.12 ± 0.03< 0.001
Correlation0.35 ± 0.100.50 ± 0.12< 0.001
ASM0.06 ± 0.010.09 ± 0.02< 0.001
Morphological and shape-based radiomics features

Quantitative analysis of tumor morphology revealed distinctive shape characteristics associated with MSI status. MSI-H tumors demonstrated significantly higher surface-to-volume ratio (0.72 ± 0.14 vs 0.58 ± 0.11, P = 0.008) and more irregular shapes as measured by sphericity (0.68 ± 0.09 vs 0.79 ± 0.08, P = 0.006) compared to MSS/MSI-L tumors. Additionally, MSI-H tumors showed greater border irregularity with higher fractal dimension values (1.28 ± 0.07 vs 1.19 ± 0.06, P = 0.009) and higher maximum 3D diameter (5.8 ± 1.2 cm vs 4.9 ± 1.0 cm, P = 0.011). These morphological differences suggest that MSI-H tumors tend to have more complex and irregular growth patterns, potentially reflective of their distinct pathological features and aggressive expansion along tissue planes rather than concentric growth seen in MSS/MSI-L tumors (Table 3).

Table 3 Morphological and shape-based radiomics features, mean ± SD.
Feature
MSI-H (n = 30)
MSS/MSI-L (n = 90)
P value
Surface-to-volume ratio0.72 ± 0.140.58 ± 0.110.008
Sphericity0.68 ± 0.090.79 ± 0.080.006
Fractal dimension1.28 ± 0.071.19 ± 0.060.009
Maximum 3D diameter (cm)5.8 ± 1.24.9 ± 1.00.011
Border irregularity0.85 ± 0.120.68 ± 0.100.004
Major axis length (cm)6.2 ± 1.35.3 ± 1.10.012
Minor axis length (cm)3.8 ± 0.94.2 ± 0.80.035
Least axis length (cm)2.9 ± 0.73.5 ± 0.60.021
Surface area (cm2)125.4 ± 25.6108.7 ± 20.30.007
Flatness0.45 ± 0.080.58 ± 0.070.002
Elongation0.72 ± 0.100.64 ± 0.090.015
First-order statistical features and signal intensity characteristics

First-order statistical analysis of signal intensity distributions demonstrated significant differences between MSI-H and MSS/MSI-L tumors. MSI-H tumors showed higher signal intensity variance (3642 ± 782 vs 2815 ± 645, P = 0.007) and kurtosis (4.86 ± 1.12 vs 3.53 ± 0.98, P = 0.005), indicating wider distribution of intensity values and more extreme intensity variations. In the post-contrast arterial phase, MSI-H tumors demonstrated higher mean signal intensity (785.3 ± 134.2 vs 693.6 ± 128.7, P = 0.009) and peak enhancement (43.7% ± 8.9% vs 35.2% ± 7.4%, P = 0.006) compared to MSS/MSI-L tumors. Additionally, MSI-H tumors showed more pronounced signal intensity skewness (0.89 ± 0.28 vs 0.42 ± 0.19, P = 0.003), suggesting asymmetric distribution of intensity values. These intensity characteristics reflect distinct vascularization patterns and contrast enhancement behaviors in MSI-H tumors, potentially related to their unique microvascular architecture and perfusion properties (Table 4).

Table 4 First-order statistical features and signal intensity characteristics, mean ± SD.
Feature
MSI-H (n = 30)
MSS/MSI-L (n = 90)
P value
Signal intensity variance3642 ± 7822815 ± 6450.007
Kurtosis4.86 ± 1.123.53 ± 0.980.005
Skewness0.89 ± 0.280.42 ± 0.190.003
Mean signal intensity785.3 ± 134.2693.6 ± 128.70.009
Peak enhancement (%)43.7 ± 8.935.2 ± 7.40.006
Minimum signal intensity220.5 ± 45.6250.3 ± 48.90.021
Maximum signal intensity1350.7 ± 210.41200.3 ± 180.50.013
Signal intensity range1130.2 ± 190.8950.0 ± 160.20.004
Median signal intensity750.4 ± 120.3680.2 ± 110.50.011
Standard deviation300.5 ± 50.2250.3 ± 45.60.008
Overall predictive performance of machine learning models

The machine learning models based on radiomics features extracted from DCE-MRI demonstrated promising predictive performance for assessing the MSI status in stage II/III rectal cancer patients. To evaluate our approach thoroughly, we tested the models on separate training and validation datasets, ensuring both robustness and broader applicability. We selected the area under the ROC curve (AUC) as our key performance indicator since it effectively captures how well our models distinguish between MSI-H and MSS/MSI-L cases. Our findings were promising. The models demonstrated strong predictive capabilities for MSI status, achieving notable accuracy and precision. This suggests that the combination of radiomics features with machine learning algorithms offers a promising non-invasive alternative for MSI assessment in clinical settings. Specifically, our models achieved 85.2% accuracy (P < 0.001) in the training dataset and maintained comparable performance with 84.7% accuracy (P < 0.001) in the validation dataset, as illustrated in Figure 1.

Figure 1
Figure 1 Machine learning models performance for microsatellite instability status prediction. The models achieved 85.2% accuracy in training and 84.7% in validation (P < 0.001 for both), demonstrating good generalizability. This suggests radiomics-based machine learning approaches could serve as a valuable non-invasive tool for microsatellite instability assessment in clinical practice. ROC: Receiver operating characteristic; AUC: Area under the curve.
Comparative performance of different machine learning algorithms

When comparing the machine learning approaches in our research, the random forest classifier emerged as the top performer. For MSI-H vs MSS/MSI-L patient classification, this model achieved impressive discriminatory power with an AUC of 0.891 (P < 0.001) on our training data. What’s particularly noteworthy is how well this performance held up during validation testing, where it actually improved slightly to an AUC of 0.896 (P < 0.001). This consistency across datasets speaks to the random forest algorithm’s effectiveness in capturing the complex interrelationships between our radiomics features and MSI status. While not reaching the same performance level as random forest, our other models still performed admirably. The logistic regression approach yielded AUCs of 0.853 and 0.849 in training and validation sets respectively (both P < 0.001), while the support vector machine delivered slightly better results with AUCs of 0.867 and 0.858 (both P < 0.001) as shown in Figure 2.

Figure 2
Figure 2 Comparative performance of different machine learning algorithms. Random forest algorithm showed best performance in predicting microsatellite instability status with area under the curve values of 0.891 (training) and 0.896 (validation), outperforming support vector machine (0.867/0.858) and Logistic Regression (0.853/0.849). All models demonstrated good generalizability with statistically significant results (P < 0.001). AUC: Area under the curve.
Integrated clinic-radiomics model performance

The integrated clinic-radiomics model, combining selected radiomics features with clinical factors (age, tumor size, lymph node status), demonstrated superior performance compared to the radiomics-only models. This combined approach achieved AUC values of 0.923 (95%CI: 0.882-0.964) in the training set and 0.914 (95%CI: 0.869-0.959) in the validation set. Sensitivity and specificity were 88.5% and 87.2%, respectively, representing a 3.7% increase in diagnostic accuracy compared to the best-performing radiomics-only model (P = 0.023). Decision curve analysis confirmed the clinical utility of the integrated model, with a net benefit ranging from 0.32 to 0.46 across threshold probabilities of 10% to 80%, demonstrating the added value of combining clinical factors with radiomics features for MSI status prediction (Figure 3).

Figure 3
Figure 3 Integrated clinic-radiomics model performance. These results indicate that the integrated clinic-radiomics model not only performs excellently statistically (area under the curve values of 0.923 in the training set and 0.914 in the validation set), but also has clear application value in actual clinical decision-making.
DISCUSSION

MSI is a critical genetic feature in colorectal cancer, reflecting the presence of defects in the DNA mismatch repair system. This genetic instability has significant implications for the prognosis and treatment response of patients with colorectal cancer. Specifically, MSI-H tumors are associated with better prognosis and unique responses to certain therapies, including immunotherapy. Accurate assessment of MSI status is therefore essential for tailoring individualized treatment plans and predicting patient outcomes[9,17,18]. Traditionally, MSI status is determined through pathological methods, such as immunohistochemical staining for mismatch repair proteins or polymerase chain reaction analysis of microsatellite sequences. However, these methods are invasive, time-consuming, and may be affected by tumor heterogeneity, leading to potential inaccuracies in MSI status determination.

With the emergence of radiomics, a new discipline of extracting large numbers of quantifiable features from clinical imaging, medical imaging has gone through a radical transformation. Such strategy holds great promise in the field of cancer research, where DCE-MRI and other techniques allow us to gain insights into tumor composition never obtained before. DCE-MRI can provide insights into gene expression attributes such as MSI, based on the different patterns of blood flow and vessel permeability it is able to visualize within a tumor[21-23]. These advanced machine learning systems when integrated and synthesized with complex imaging analyses allow clinicians to assess tissue in a rapid, non-invasive manner. These computational models can identify subtle patterns in complex radiomics datasets that may not be detected by human observers, thereby eliminating the need for invasive tissue sampling procedures, while still achieving comparable diagnostic performance.

We have found significant differences in radiomics features between MSI-H and MSS/MSI-L tumors. In addition, the higher entropy, contrast, and dissimilarity in MSI-H tumors is associated with greater tumor heterogeneity, which is consistent with the established biological properties in these tumors, especially increased immune cell infiltration. Likewise, morphological disparities including higher surface-to-volume ratio and higher border irregularity observed in the MSI-H tumors imply our tumors grow differently and are correlated to their respective genetic drivers. The imaging biomarkers identified suggest a biological underpinning for the radiomics approach to MSI status stratification and highlight the potential for DCE-MRI to extract clinically relevant genetic information non-invasively.

The random forest model is mainly used in machine language with a high training and validation set AUC (0.9065 in both training and validation set, respectively). The stability of the identified radiomics features in different datasets indicates its robust association with MSI status, providing well generalizability of the model. The better performance of the random forest algorithm in comparison to models using logistic regression or support vector machine suggests that the relationships between radiomics features and MSI status are complex, non-linear and best captured by ensemble learning methods.

Interestingly, the predictive performance was further improved when we integrated clinical factors with radiomics features through our combined clinic-radiomics model[24-26]. This synergistic effect highlights the complementary nature of clinical and imaging data in characterizing tumor biology. The enhanced performance of the integrated model suggests that while radiomics features capture important aspects of tumor phenotype, clinical factors provide additional context that helps refine predictions. This approach aligns with the growing trend toward multi-modal data integration in precision oncology.

From a clinical perspective, the implementation of radiomics-based MSI status assessment could significantly impact patient management. Early identification of MSI-H tumors could inform treatment decisions, particularly regarding the use of immunotherapy, which has shown remarkable efficacy in this patient subgroup. Additionally, the non-invasive nature of this approach could reduce the need for repeated biopsies, which are not only invasive but may also fail to capture the full extent of tumor heterogeneity[27-29]. The ability to assess MSI status from standard-of-care imaging would streamline the diagnostic workflow and potentially accelerate treatment initiation.

The findings of this study demonstrate the feasibility and potential effectiveness of using radiomics features extracted from DCE-MRI combined with machine learning models to assess MSI status in stage II/III rectal cancer patients. The random forest model, in particular, showed robust predictive performance, achieving high AUC values in both training and validation sets. This suggests that the model can effectively distinguish between MSI-H and MSS/MSI-L patients based on radiomics features alone. Moreover, the combined clinic-radiomics model, which integrates clinical factors with radiomics features, further improved predictive accuracy, highlighting the complementary value of clinical data in enhancing model performance.

The clinical implications of these findings are significant. Non-invasive assessment of MSI status using radiomics and machine learning could facilitate early and accurate treatment planning, potentially improving patient outcomes. This approach may also reduce the need for invasive pathological testing, thereby minimizing patient discomfort and reducing healthcare costs. Furthermore, the ability to rapidly assess MSI status could expedite treatment decisions, ensuring that patients receive appropriate therapies in a timely manner.

We have systematically supplemented the study limitations: Retrospective design with potential selection bias (mitigated through consecutive enrollment), single-center data limiting generalizability (addressed with standardized protocols), small MSI-H sample size (n = 30, mitigated through cross-validation), and potential technical variation from multiple scanners (corrected through standardized processing).

Looking toward future directions, our work serves as a foundation for several promising research avenues. Delta-radiomics for immunotherapy response prediction is promising. Our cohort includes 23 MSI-H patients receiving immunotherapy; follow-up imaging collection is ongoing. Multi-parametric approaches combining DCE-MRI with T2-weighted and diffusion imaging show preliminary improvement. Exploration of deep learning approaches, which can automatically learn hierarchical features from raw imaging data, may identify additional patterns not captured by traditional radiomics. Additionally, integration with other biomarkers, such as liquid biopsy or genomic data, could create comprehensive predictive models that more fully characterize tumor biology.

CONCLUSION

In conclusion, this study provides a strong foundation for the development of non-invasive diagnostic tools in the clinical management of rectal cancer. The integration of radiomics and machine learning offers a novel and potentially effective method for assessing MSI status, which may significantly improve patient outcomes by facilitating early and accurate treatment planning. Future work should aim to address the limitations identified in this study and explore the full potential of these technologies in clinical practice.

Footnotes

Provenance and peer review: Unsolicited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Oncology

Country of origin: China

Peer-review report’s classification

Scientific Quality: Grade B, Grade C

Novelty: Grade B, Grade C

Creativity or Innovation: Grade B, Grade B

Scientific Significance: Grade C, Grade C

P-Reviewer: Hogg ME; Risaliti M S-Editor: Fan M L-Editor: A P-Editor: Zhang XD

References
1.  Aggarwal S, Chougle A, Talwar V, Shukla P, Rohtagi N, Verma A, Pasricha R, Sirohi B, Agarwal C, Pasricha S, Choudhary RK, Goyal G. Liquid Biopsy and Colorectal Cancer. South Asian J Cancer. 2024;13:246-250.  [PubMed]  [DOI]  [Full Text]
2.  Parikh PM, Bahl A, Sharma G, Pramanik R, Wadhwa J, Bajpai P, Jandyal S, Dubey AP, Sarin A, Dadhich SC, Saklani AP, Kumar A, Chandra A, Rawat S, Selvasekar C, Aggarwal S. Management of Metastatic Colorectal Cancer (mCRC): Real-World Recommendations. South Asian J Cancer. 2024;13:287-295.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
3.  Rathnasamy N, Lavingia V, Aggarwal S, Talwar V, Shukla P, Rohtagi N, Prathasarathy KM, Gupta D, Pasricha R, Pasricha S, Choudhary RK, Goyal G, Rawat S, Parikh PM, Selvasekar C. Current Issues of Next-Generation Sequencing-Based Circulating Tumor DNA Analysis in Colorectal Cancer. South Asian J Cancer. 2024;13:241-245.  [PubMed]  [DOI]  [Full Text]
4.  Chase DM, Kobayashi M, Gomez P, Lubinga SJ, Chan JK. Treatment patterns and outcomes by mismatch repair/microsatellite instability status among patients with primary advanced or recurrent endometrial cancer in the United States. Future Oncol. 2025;1-12.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Cited by in RCA: 1]  [Article Influence: 1.0]  [Reference Citation Analysis (0)]
5.  Guo W, Jiang H, Wang C, Wang Y, Wang M, Yang X, Zhao Y, Jiang Z, Nanding A, Cheng L, Wang K. Predictive value of [(18)F]FDG PET-derived parameters for microsatellite instability and prognosis in patients with colorectal cancer. Eur Radiol. 2025;.  [PubMed]  [DOI]  [Full Text]
6.  Peng L, Ma W, Zhang X, Zhang F, Ma F, Ai K, Ma X, Jia Y, Ou-Yang H, Pei S, Wang T, Zhu Y, Wang L. Predictive value of combined DCE-MRI perfusion parameters and clinical features nomogram for microsatellite instability in colorectal cancer. Discov Oncol. 2025;16:892.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
7.  Teng HW, Huang HY, Lin CC, Twu YC, Yang WH, Lin WC, Lan HY, Lin YY, Hwang WL. CT45A1-mediated MLC2 (MYL9) phosphorylation promotes natural killer cell resistance and outer cell fate in a cell-in-cell structure, potentiating the progression of microsatellite instability-high colorectal cancer. Mol Oncol. 2025;19:430-451.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Reference Citation Analysis (0)]
8.  Thomas CE, Takashima Y, Wesselink E, Ugai T, Steinfelder RS, Buchanan DD, Qu C, Hsu L, Dias Costa A, Gallinger S, Grant RC, Huyghe JR, Thomas SS, Ogino S, Phipps AI, Nowak JA, Peters U. Association between somatic microsatellite instability, hypermutation status, and specific T cell subsets in colorectal cancer tumors. Front Immunol. 2024;15:1505896.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 1]  [Cited by in RCA: 2]  [Article Influence: 2.0]  [Reference Citation Analysis (0)]
9.  Wang CW, Muzakky H, Lee YC, Chung YP, Wang YC, Yu MH, Wu CH, Chao TK. Interpretable multi-stage attention network to predict cancer subtype, microsatellite instability, TP53 mutation and TMB of endometrial and colorectal cancer. Comput Med Imaging Graph. 2025;121:102499.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
10.  Garrucho L, Kushibar K, Reidel CA, Joshi S, Osuala R, Tsirikoglou A, Bobowicz M, Del Riego J, Catanese A, Gwoździewicz K, Cosaka ML, Abo-Elhoda PM, Tantawy SW, Sakrana SS, Shawky-Abdelfatah NO, Salem AMA, Kozana A, Divjak E, Ivanac G, Nikiforaki K, Klontzas ME, García-Dosdá R, Gulsun-Akpinar M, Lafcı O, Mann R, Martín-Isla C, Prior F, Marias K, Starmans MPA, Strand F, Díaz O, Igual L, Lekadir K. A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations. Sci Data. 2025;12:453.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 1]  [Reference Citation Analysis (0)]
11.  Zhang Q, Lou Y, Liu X, Liu C, Ma W. DCE-MRI-based machine learning model for predicting axillary lymph node metastasis in breast cancer. Gland Surg. 2025;14:228-237.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
12.  Huang W, Son MH, Ha LN, Kang L, Cai W. More than meets the eye: 2-[(18)F]FDG PET-based radiomics predicts lymph node metastasis in colorectal cancer patients to enable precision medicine. Eur J Nucl Med Mol Imaging. 2024;51:1725-1728.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 3]  [Reference Citation Analysis (0)]
13.  Li M, Yuan Y, Zhou H, Feng F, Xu G. A multicenter study: predicting KRAS mutation and prognosis in colorectal cancer through a CT-based radiomics nomogram. Abdom Radiol (NY). 2024;49:1816-1828.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
14.  Roll W, Masthoff M, Köhler M, Rahbar K, Stegger L, Ventura D, Morgül H, Trebicka J, Schäfers M, Heindel W, Wildgruber M, Schindler P. Radiomics-Based Prediction Model for Outcome of Radioembolization in Metastatic Colorectal Cancer. Cardiovasc Intervent Radiol. 2024;47:462-471.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
15.  Fu X, Huang J, Zhu J, Fan X, Wang C, Deng W, Tan X, Chen Z, Cai Y, Lin H, Wang G, Zhang N, Zhu Y, Chen J, Zhan H, Huang S, Fang Y, Li Y, Huang Y. Prognosis and immunotherapy efficacy in dMMR&MSS colorectal cancer patients and an MSI status predicting model. Int J Cancer. 2024;155:766-775.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
16.  Li Y, Tan L, Chen N, Liu X, Liang F, Yao Y, Zhang X, Wu A. Neoadjuvant Immunotherapy Alone for Patients With Locally Advanced and Resectable Metastatic Colorectal Cancer of dMMR/MSI-H Status. Dis Colon Rectum. 2024;67:1413-1422.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 2]  [Cited by in RCA: 6]  [Article Influence: 6.0]  [Reference Citation Analysis (0)]
17.  Yamamoto K, Uzaki M, Takahashi K, Mimura T. Current status of MSI research in Japan to measure the localization of natural products in plants. Curr Opin Plant Biol. 2024;82:102651.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
18.  Meng X, Xu R, Wang H, Zhu J, Ye J, Luo C. Validation of machine learning application for the identification of lipid metabolism-associated diagnostic model in ischemic stroke. Int J Clin Exp Pathol. 2025;18:63-76.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
19.  Shinya Y, Ghaith AK, Hong S, Erickson D, Bancos I, Herndon JS, Davidge-Pitts CJ, Nguyen RT, Bon Nieves A, Sáez Alegre M, Morshed RA, Pinheiro Neto CD, Peris Celda M, Pollock BE, Meyer FB, Atkinson JLD, Van Gompel JJ. Machine learning-based model to predict long-term tumor control and additional interventions following pituitary surgery for Cushing's disease. J Neurosurg. 2025;143:184-193.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
20.  Zaky H, Fthenou E, Srour L, Farrell T, Bashir M, El Hajj N, Alam T. Machine learning based model for the early detection of Gestational Diabetes Mellitus. BMC Med Inform Decis Mak. 2025;25:130.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
21.  Methods In Medicine CAM. Retracted: Analysis of KRAS Mutation Status Prediction Model for Colorectal Cancer Based on Medical Imaging. Comput Math Methods Med. 2023;2023:9815960.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
22.  Ren Z, Che J, Wu XW, Xia J. Analysis of KRAS Mutation Status Prediction Model for Colorectal Cancer Based on Medical Imaging. Comput Math Methods Med. 2021;2021:3953442.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Reference Citation Analysis (0)]
23.  Wang J, Chen B, Zhu J, Zhang J, Jiang R. Intelligent diagnosis value of preoperative T staging of colorectal cancer based on MR medical imaging. Front Genet. 2023;14:1119990.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
24.  Peng L, Zhang X, Zhu Y, Shi L, Ai K, Huang G, Ma W, Wei Z, Wang L, Ma Y, Wang L. T2WI and ADC radiomics combined with a nomogram based on clinicopathologic features to quantitatively predict microsatellite instability in colorectal cancer. Acad Radiol. 2025;32:1431-1450.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in Crossref: 1]  [Reference Citation Analysis (0)]
25.  Wang N, Dai M, Jing F, Liu Y, Zhao Y, Zhang Z, Wang J, Zhang J, Wang Y, Zhao X. Value of (18)F-FDG PET/CT-based radiomics features for differentiating primary lung cancer and solitary lung metastasis in patients with colorectal adenocarcinoma. Int J Radiat Biol. 2025;101:56-64.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
26.  Wang X, Liu Z, Yin X, Yang C, Zhang J. A radiomics model fusing clinical features to predict microsatellite status preoperatively in colorectal cancer liver metastasis. BMC Gastroenterol. 2023;23:308.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Cited by in RCA: 3]  [Reference Citation Analysis (0)]
27.  Argilés G, Arnold D, Cervantes A. Anti-PD-1 treatment for MSI-H/MMRD tumors. A journey from genomics to transformative patient breakthroughs. Ann Oncol. 2025;36:231-232.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Reference Citation Analysis (0)]
28.  Bi F, Dong J, Jin C, Niu Z, Yang W, He Y, Yu D, Sun M, Wang T, Yin X, Zhang R, Chen K, Wang K, Wang Z, Li W, Zhang Z, Zhang H, Guo Q, Wang X, Han L, Zhang X, Shen W, Zhang L, Ying J, Wu M, Hu W, Li Z, Li X, Feng W, Zhang B, Li L, Kang X, Guo W. Iparomlimab (QL1604) in patients with microsatellite instability-high (MSI-H) or mismatch repair-deficient (dMMR) unresectable or metastatic solid tumors: a pivotal, single-arm, multicenter, phase II trial. J Hematol Oncol. 2024;17:109.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in Crossref: 2]  [Reference Citation Analysis (0)]
29.  Fan S, Gai C, Li B, Wang G. Efficacy and safety of envafolimab in the treatment of advanced dMMR/MSIH solid tumors: A single‑arm meta‑analysis. Oncol Lett. 2023;26:351.  [RCA]  [PubMed]  [DOI]  [Full Text]  [Full Text (PDF)]  [Cited by in RCA: 5]  [Reference Citation Analysis (0)]