Retrospective Study Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastrointest Oncol. May 15, 2025; 17(5): 102459
Published online May 15, 2025. doi: 10.4251/wjgo.v17.i5.102459
Development and validation of machine learning nomograms for predicting survival in stage IV pancreatic cancer: A retrospective study
Kun Huang, Yun-Shen He, Department of General Surgery, Mianyang Hospital of Traditional Chinese Medicine, Mianyang 621000, Sichuan Province, China
Zhu Chen, Xiang Lan, Chen-You Du, Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400000, China
Xin-Zhu Yuan, Department of Nephrology, The Second Clinical Medical Institution of North Sichuan Medical College (Nanchong Central Hospital) and Nanchong Key Laboratory of Basic Science & Clinical Research on Chronic Kidney Disease, Nanchong 637000, Sichuan Province, China
ORCID number: Xiang Lan (0009-0004-5016-1301).
Co-first authors: Kun Huang and Zhu Chen.
Co-corresponding authors: Xiang Lan and Chen-You Du.
Author contributions: Huang K and Chen Z contributed equally to this work and share first authorship. Huang K and Chen Z designed the study, collected the data, and performed the primary data analysis. Yuan XZ and He YS assisted in data interpretation, statistical analysis, and manuscript drafting. All authors reviewed and approved the final manuscript and agreed to be accountable for all aspects of the research. Lan X ensured the accuracy and integrity of the work and also provided substantial input in refining the methodology and results. Du CY supervised the study and played a pivotal role in the conceptualization of the research. Du CY provided significant contributions to manuscript revisions and ensured that all research standards were adhered to during the study. Additionally, Du CY coordinated the overall project, reviewed critical data, and was responsible for the submission of the final version of the manuscript. Huang K and Chen Z made critical and indispensable contributions to the conception, design, and execution of the research, sharing the primary responsibility for data collection and analysis, which justifies their designation as co-first authors. Lan X and Du CY, as co-corresponding authors, both played an essential role in the conceptualization, data supervision, and final revisions of the manuscript, with Du CY specifically overseeing the study's overall progression and ensuring that the manuscript adhered to all academic and ethical standards.
Supported by Mianyang Health and Health Committee 2023 Scientific Research Project, No. 202309; Chengdu University of Traditional Chinese Medicine University-Hospital Joint Innovation Fund, No. LH202402010; and Mianyang Chinese Medicine Association 2024 Traditional Chinese Medicine Inheritance and Innovation Science and Technology Project, No. MYSZYYXH-202426.
Institutional review board statement: This study received ethical exemption from the Ethics Committee of Mianyang Hospital of Traditional Chinese Medicine, as it utilizes publicly available, de-identified patient data from the SEER database. The SEER database ensures patient anonymity and data protection, and therefore, informed consent was not required for this study. All analyses were conducted in strict accordance with SEER guidelines for ethical research use.
Informed consent statement: This study utilized data from the Surveillance, Epidemiology, and End Results (SEER) database, a publicly available resource that provides de-identified patient data. As all personal identifiers have been removed in the SEER database, there is no direct involvement with individual patients, and informed consent is not required for the use of these data.
Conflict-of-interest statement: The authors declare that they have no conflicts of interest to disclose.
Data sharing statement: The data supporting the findings of this study are publicly available in the SEER database, maintained by the United States National Cancer Institute. Access to the SEER data is granted upon request to researchers who meet the criteria for access to confidential data, and the data can be obtained through SEER*Stat software version 8.3.9. Details on SEER data access are available at https://seer.cancer.gov/.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Xiang Lan, MD, Professor, Department of Hepatobiliary Surgery, The First Affiliated Hospital of Chongqing Medical University, No. 1 Youyi Road, Yuanjiagang, Yuzhong District, Chongqing 400000, China. lanxiangkeyan@163.com
Received: October 21, 2024
Revised: February 17, 2025
Accepted: March 10, 2025
Published online: May 15, 2025
Processing time: 207 Days and 16.3 Hours

Abstract
BACKGROUND

Stage IV pancreatic cancer (PC) has a poor prognosis and lacks individualized prognostic tools. Current survival prediction models are limited, and there is a need for more accurate, personalized methods. The Surveillance, Epidemiology, and End Results (SEER) database offers a valuable resource for studying large patient cohorts, yet machine learning-based nomograms for stage IV PC prognosis remain underexplored. This study hypothesizes that a machine learning-based nomogram can predict cancer-specific survival (CSS) and overall survival (OS) with high accuracy in stage IV PC patients.

AIM

To construct and validate a machine learning-based nomogram for predicting survival in stage IV PC patients using real-world data.

METHODS

Clinical data from stage IV PC patients diagnosed via pathology from 2000 to 2019 were extracted from the SEER database. Patients were randomly divided into a training set and a validation set in a 7:3 ratio. Multivariate Cox proportional hazards, Least Absolute Shrinkage and Selection Operator regression, and Random Survival Forest models were used to identify prognostic variables. A nomogram was constructed to predict CSS and OS at 6, 12, and 18 months. The C-index, receiver operating characteristic curves, and calibration curves were used to evaluate the model’s predictive performance.

RESULTS

A total of 1662 patients were included (1163 in the training set, 499 in the validation set). The median follow-up times were 4 months [interquartile range (IQR): 1-10 months] for the training set and 4 months (IQR: 1-11 months) for the validation set. Key independent prognostic factors identified included age, race, marital status, tumor location, N stage, grade, surgery, chemotherapy, and liver metastasis. The nomogram accurately predicted OS and CSS at 6, 12, and 18 months, with a C-index of 0.727 (OS) and 0.727 (CSS) in the training set, and 0.719 (OS) and 0.716 (CSS) in the validation set. Calibration curves demonstrated excellent model accuracy.

CONCLUSION

The nomogram developed using age, grade, chemotherapy, surgery, and liver metastasis as predictors can reliably estimate survival outcomes for stage IV PC patients and offers a potential tool for individualized clinical decision-making.

Key Words: Stage IV pancreatic ductal adenocarcinoma; Prognosis; Surveillance, Epidemiology, and End Results Program; Machine learning; Cancer survival; Prognostic model

Core Tip: This study develops and validates a machine learning-based nomogram to predict survival in patients with stage IV pancreatic ductal adenocarcinoma. Using data from the Surveillance, Epidemiology, and End Results program, the model integrates prognostic factors and demonstrates high accuracy in predicting cancer-specific survival and overall survival at various time points. This nomogram offers a personalized approach to survival prediction, potentially guiding clinical decision-making and treatment strategies for advanced pancreatic cancer patients.



INTRODUCTION

Pancreatic cancer (PC) is a significant human health issue and, by 2025, is projected to surpass breast cancer as the third leading cause of cancer-related deaths[1]. In the United States, an estimated 66440 new cases and 51750 deaths due to PC were reported in 2024. PC is often asymptomatic in its early stages, with more than half of patients presenting with distant organ metastasis at the time of initial diagnosis[2]. Consequently, the prognosis is very poor, with a 5-year relative survival rate of only 12.8%[2]. In clinical practice, considerable heterogeneity in survival outcomes has been observed among patients with stage IV PC, highlighting the need for an individualized survival prediction tool for this population.

Nomograms, which are visual tools incorporating multiple prognostic factors to predict patient survival, aid in personalized treatment planning and clinical decision-making and are widely used in cancer prognosis evaluation[3-6].

Machine learning, a core technique within artificial intelligence, employs algorithms to analyze data, learn from patterns, and predict real-world events with high accuracy, and is increasingly applied in health assessment, medical decision-making, prognosis, and personalized treatment[7-9].

This study leverages the large sample size and comprehensive clinical data from the United States Surveillance, Epidemiology, and End Results (SEER) database to develop a prognostic nomogram for stage IV PC patients using machine learning, with the aim of providing individualized prognostic assessments to improve clinical decision-making.

MATERIALS AND METHODS
Database and patient selection

Utilizing SEER*Stat v8.3.9 software, clinical data of patients with stage IV PC diagnosed pathologically between 2000 and 2019 were extracted from the SEER database.

Inclusion and exclusion criteria

Patients were included in the study if they met all of the following criteria: (1) Had a diagnosis of primary PC at stage IV according to the tumor-node-metastasis (TNM) classification; (2) Were diagnosed pathologically; (3) Were diagnosed between 2000 and 2019; and (4) Had International Classification of Diseases-O-3 codes of 8576/3, 8560/3, 8500/3, 8490/3, 8480/3, 8035/3, or 8020/3. Exclusion criteria were as follows: (1) Patients with multifocal tumors; (2) Cases diagnosed posthumously via autopsy or death certificate; (3) Patients with inaccessible study metrics; and (4) Individuals with incomplete clinical or follow-up information.
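
The selection rules above can be read as a single eligibility filter. The following is a minimal sketch only: every field name (`stage`, `pathology_confirmed`, `diagnosis_source`, and so on) is a hypothetical placeholder and does not correspond to an actual SEER*Stat column, and "inaccessible study metrics" is approximated here as any required field being missing.

```python
# ICD-O-3 histology codes listed in the inclusion criteria.
INCLUDED_HISTOLOGY = {
    "8576/3", "8560/3", "8500/3", "8490/3", "8480/3", "8035/3", "8020/3",
}

# Hypothetical field names; a record missing any of them is treated as incomplete.
REQUIRED_FIELDS = (
    "stage", "pathology_confirmed", "year", "histology",
    "multifocal", "diagnosis_source", "survival_months", "status",
)


def is_eligible(record: dict) -> bool:
    """Return True if a patient record passes the stated inclusion/exclusion rules."""
    # Exclusion: inaccessible metrics or incomplete clinical/follow-up information.
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False
    return (
        record["stage"] == "IV"                      # stage IV by TNM
        and record["pathology_confirmed"]            # pathological diagnosis
        and 2000 <= record["year"] <= 2019           # diagnosis window
        and record["histology"] in INCLUDED_HISTOLOGY
        and not record["multifocal"]                 # exclusion: multifocal tumors
        and record["diagnosis_source"] not in {"autopsy", "death certificate"}
    )
```

Applying `is_eligible` to each extracted record would yield the analysis cohort under these assumptions.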

Data extraction

Data extracted included patients' age at diagnosis, race, gender, primary tumor location, treatment information, survival time, and survival outcomes.

Statistical analysis

The primary endpoints of this study were cancer-specific survival (CSS) and overall survival (OS). CSS was defined as the time (in months) from the start of follow-up to death due to PC, with deaths from other causes, survival to the end of follow-up, and loss to follow-up treated as censored observations. OS was defined as the time (in months) from the start of follow-up to death from any cause, with similar criteria for censored events applied[3]. Statistical analyses were conducted using Stata/MP 16.0 and R (version 4.2.3). Normally distributed data are expressed as mean ± SD, while non-normal data are presented as median [interquartile range (IQR)]. Categorical variables are presented as rates (percentages) and compared using the chi-square test. The dataset was randomly divided into a training set (70%) for nomogram development and internal validation, and a validation set (30%) for external validation[10,11]. Model accuracy was validated by constructing calibration curves through 1000 bootstrap resamples. The model's discriminative ability was assessed by calculating the C-index, and survival rates were estimated using the Kaplan-Meier method, with survival rate comparisons made using the log-rank test. The variable selection process for model construction involved an initial univariate Cox proportional hazards analysis, followed by the integration of significant factors into a multivariate Cox proportional hazards model based on the results of the univariate analysis and their clinical significance[10,11]. This process ultimately identified the independent prognostic factors. The multivariate Cox proportional hazards model was used to calculate hazard ratios (HR) and their corresponding 95%CI[11]. In this analysis, multicollinearity was evaluated using variance inflation factors (VIFs) and pairwise Pearson correlation coefficients[12,13].
A VIF value exceeding 5 to 10 indicated significant multicollinearity, necessitating exclusion of the associated variables prior to further analysis[14]. Pairwise Pearson correlation coefficients were computed to assess collinearity among variables, with a threshold of < 0.7 considered as no significant collinearity[15,16]. To refine variable selection and prevent overfitting, Least Absolute Shrinkage and Selection Operator (LASSO) regression was applied, with a 10-fold cross-validation approach used to determine the optimal λ parameter. Variables with nonzero coefficients were retained to ensure that only the most relevant prognostic factors were included in the final model.
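
The collinearity screen described above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes every predictor has already been coded numerically, and it relies on the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' Pearson correlation matrix. The function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations; columns is a list of predictor columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


def flag_collinear(columns, r_threshold=0.7, vif_threshold=5.0):
    """Return indices of predictors breaching either threshold named in the text."""
    corr = correlation_matrix(columns)
    vifs = variance_inflation_factors(columns)
    flagged = set()
    for i in range(len(columns)):
        if vifs[i] > vif_threshold:
            flagged.add(i)
        for j in range(i + 1, len(columns)):
            if abs(corr[i][j]) >= r_threshold:
                flagged.update((i, j))
    return sorted(flagged)
```

With two predictors correlated at r, each VIF reduces to 1/(1 - r²), which gives a quick sanity check on the implementation.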

In parallel, a random survival forest model was employed for independent prognostic analysis. The random survival forest, a machine learning algorithm known for its robustness, operates independently of the proportional hazards assumption or log-linearity and utilizes two-stage random sampling to mitigate overfitting. Variable importance scores were computed to further assess the contribution of each predictor to OS and CSS[10]. Model evaluation included receiver operating characteristic (ROC) curve analysis, where area under the curve (AUC) values at 6, 12, and 18 months were calculated to assess the model’s discriminative ability.

The optimal cutoff values for risk stratification were determined using X-tile software (version 3.6.1), and patients were categorized into low-risk and high-risk groups accordingly. Kaplan-Meier curves were generated to estimate survival probabilities, and the log-rank test was used to compare survival differences between risk groups. To evaluate the potential clinical benefit of the nomogram in guiding individualized treatment decisions, decision curve analysis (DCA) was performed, comparing its net benefit to that of the traditional TNM staging system. All statistical tests were two-tailed, with P < 0.05 considered statistically significant.
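
For readers less familiar with the Kaplan-Meier method mentioned above, the product-limit estimate can be sketched in a few lines. This is a minimal sketch, not the R or Stata routine used in the study. For OS, `events` would be 1 for death from any cause; for CSS, 1 only for death due to PC, with all other patients censored (0).

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up in months for each patient
    events -- 1 if the death of interest occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve
```

Calling `kaplan_meier` separately on the low-risk and high-risk groups gives the two step curves that a log-rank test then compares.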

RESULTS
Baseline demographics and clinical characteristics of included patients

A total of 1662 patients were enrolled in this study, with 1163 in the training set and 499 in the validation set. The clinicopathological characteristics of the two groups are presented in Table 1. As shown in Table 1, baseline characteristics were well-balanced between the two groups (all P > 0.05).

Table 1 Baseline demographics and clinical characteristics of patients in the training and validation cohorts, n (%).

| Variables | Total cohort (n = 1662) | Training cohort (n = 1163) | Validation cohort (n = 499) | P value |
| --- | --- | --- | --- | --- |
| Age (years) | | | | 0.281 |
| < 60 | 445 (26.77) | 313 (26.91) | 132 (26.45) | |
| 60-75 | 838 (50.42) | 597 (51.33) | 241 (48.30) | |
| > 75 | 379 (22.80) | 253 (21.75) | 126 (25.25) | |
| Sex | | | | 0.062 |
| Female | 803 (48.32) | 544 (46.78) | 259 (51.90) | |
| Male | 859 (51.68) | 619 (53.22) | 240 (48.10) | |
| Race | | | | 0.191 |
| White | 1326 (79.78) | 937 (80.57) | 389 (77.96) | |
| Black | 172 (10.35) | 110 (9.46) | 62 (12.42) | |
| Other | 164 (9.87) | 116 (9.97) | 48 (9.62) | |
| Marital status | | | | 0.935 |
| Married | 895 (53.85) | 630 (54.17) | 265 (53.11) | |
| Unmarried | 273 (16.43) | 189 (16.25) | 84 (16.83) | |
| Divorced | 429 (25.81) | 297 (25.54) | 132 (26.45) | |
| Unknown | 65 (3.91) | 47 (4.04) | 18 (3.61) | |
| Tumor location | | | | 0.806 |
| Head | 621 (37.36) | 426 (36.63) | 195 (39.08) | |
| Body | 258 (15.52) | 182 (15.65) | 76 (15.23) | |
| Tail | 334 (20.10) | 235 (20.21) | 99 (19.84) | |
| Other | 449 (27.02) | 320 (27.52) | 129 (25.85) | |
| T stage | | | | 0.174 |
| T1/T2 | 411 (24.73) | 303 (26.05) | 108 (21.64) | |
| T3 | 575 (34.60) | 386 (33.19) | 189 (37.88) | |
| T4 | 337 (20.28) | 236 (20.29) | 101 (20.24) | |
| TX | 339 (20.40) | 238 (20.46) | 101 (20.24) | |
| N stage | | | | 0.833 |
| N0 | 781 (46.99) | 547 (47.03) | 234 (46.89) | |
| N1 | 599 (36.04) | 415 (35.68) | 184 (36.87) | |
| NX | 282 (16.97) | 201 (17.28) | 81 (16.23) | |
| Grade | | | | 0.701 |
| Grade I/II | 296 (17.81) | 204 (17.54) | 92 (18.44) | |
| Grade III/IV | 292 (17.57) | 200 (17.20) | 92 (18.44) | |
| Unknown | 1074 (64.62) | 759 (65.26) | 315 (63.13) | |
| Surgery | | | | 0.672 |
| No | 1482 (89.17) | 1040 (89.42) | 442 (88.58) | |
| Yes | 180 (10.83) | 123 (10.58) | 57 (11.42) | |
| Chemotherapy | | | | 0.706 |
| No | 686 (41.28) | 484 (41.62) | 202 (40.48) | |
| Yes | 976 (58.72) | 679 (58.38) | 297 (59.52) | |
| Radiotherapy | | | | 0.822 |
| No | 1557 (93.68) | 1088 (93.55) | 469 (93.99) | |
| Yes | 105 (6.32) | 75 (6.45) | 30 (6.01) | |
| Bone metastasis | | | | 0.172 |
| No | 1549 (93.20) | 1077 (92.61) | 472 (94.59) | |
| Yes | 113 (6.80) | 86 (7.39) | 27 (5.41) | |
| Liver metastasis | | | | 0.384 |
| No | 615 (37.00) | 422 (36.29) | 193 (38.68) | |
| Yes | 1047 (63.00) | 741 (63.71) | 306 (61.32) | |
| Lung metastasis | | | | 0.462 |
| No | 1278 (76.90) | 888 (76.35) | 390 (78.16) | |
| Yes | 384 (23.10) | 275 (23.65) | 109 (21.84) | |

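
As an illustration of how the between-cohort comparisons in Table 1 can be checked, the sketch below applies a chi-square test to the sex distribution (544 female and 619 male patients in the training cohort; 259 and 240 in the validation cohort). The article does not state whether a continuity correction was applied, so the Yates correction used here is an assumption; with it, the P value comes out at roughly 0.062, in line with the tabulated value.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the 2x2 table [[a, b], [c, d]] (1 degree of freedom).

    Returns (statistic, P value). The Yates continuity correction is optional.
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    statistic = 0.0
    for obs, row, col in zip(observed, row_totals, col_totals):
        expected = row * col / n
        diff = abs(obs - expected)
        if yates:
            diff = max(diff - 0.5, 0.0)
        statistic += diff ** 2 / expected
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: training and validation cohorts; columns: female and male.
stat, p = chi_square_2x2(544, 619, 259, 240)
print(f"chi-square = {stat:.2f}, P = {p:.3f}")
```

Variables with more than two categories would need the general r x c form of the test and the corresponding degrees of freedom, which this sketch does not cover.
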
Prognostic analysis of stage IV PC patients

Univariate Cox proportional hazards model: Among the 1163 patients in the training set, the median follow-up duration was 4 months (IQR, 1-10 months). The results of the univariate Cox proportional hazards model analysis are detailed in Table 2. The analysis indicated that age, race, marital status, tumor location, T stage, N stage, grade, surgery, chemotherapy, radiotherapy, and liver metastasis were significantly associated with OS. Similarly, all these factors except race were significantly associated with CSS, with P values of less than 0.05 for each.

Table 2 Univariate Cox regression analysis for cancer-specific survival and overall survival in stage IV pancreatic cancer patients.

| Variables | OS, HR (95%CI) | OS, P value | CSS, HR (95%CI) | CSS, P value |
| --- | --- | --- | --- | --- |
| Age (years) | | | | |
| < 60 | 1 (reference) | | 1 (reference) | |
| 60-75 | 1.26 (1.10, 1.45) | 0.001 | 1.25 (1.08, 1.44) | 0.002 |
| > 75 | 1.70 (1.44, 2.02) | < 0.001 | 1.71 (1.44, 2.03) | < 0.001 |
| Sex | | | | |
| Female | 1 (reference) | | 1 (reference) | |
| Male | 0.97 (0.87, 1.09) | 0.645 | 0.98 (0.87, 1.10) | 0.730 |
| Race | | | | |
| White | 1 (reference) | | 1 (reference) | |
| Black | 1.26 (1.03, 1.54) | 0.024 | 1.20 (0.98, 1.48) | 0.078 |
| Other | 0.88 (0.72, 1.07) | 0.195 | 0.89 (0.73, 1.08) | 0.237 |
| Marital status | | | | |
| Married | 1 (reference) | | 1 (reference) | |
| Unmarried | 1.17 (0.99, 1.38) | 0.069 | 1.16 (0.98, 1.38) | 0.079 |
| Divorced | 1.54 (1.34, 1.77) | < 0.001 | 1.52 (1.32, 1.76) | < 0.001 |
| Unknown | 1.72 (1.28, 2.33) | < 0.001 | 1.71 (1.26, 2.32) | 0.001 |
| Tumor location | | | | |
| Head | 1 (reference) | | 1 (reference) | |
| Body | 1.34 (1.13, 1.60) | 0.001 | 1.35 (1.13, 1.61) | 0.001 |
| Tail | 1.20 (1.02, 1.41) | 0.027 | 1.22 (1.03, 1.43) | 0.020 |
| Other | 1.42 (1.22, 1.64) | < 0.001 | 1.42 (1.22, 1.65) | < 0.001 |
| T stage | | | | |
| T1/T2 | 1 (reference) | | 1 (reference) | |
| T3 | 0.76 (0.65, 0.89) | < 0.001 | 0.77 (0.66, 0.90) | 0.001 |
| T4 | 0.92 (0.77, 1.09) | 0.339 | 0.93 (0.78, 1.11) | 0.413 |
| TX | 1.13 (0.96, 1.35) | 0.148 | 1.10 (0.92, 1.31) | 0.288 |
| N stage | | | | |
| N0 | 1 (reference) | | 1 (reference) | |
| N1 | 0.96 (0.84, 1.09) | 0.488 | 0.98 (0.86, 1.12) | 0.789 |
| NX | 1.44 (1.22, 1.69) | < 0.001 | 1.42 (1.20, 1.68) | < 0.001 |
| Grade | | | | |
| Grade I/II | 1 (reference) | | 1 (reference) | |
| Grade III/IV | 1.68 (1.37, 2.05) | < 0.001 | 1.69 (1.38, 2.08) | < 0.001 |
| Unknown | 1.74 (1.48, 2.04) | < 0.001 | 1.72 (1.46, 2.02) | < 0.001 |
| Surgery | | | | |
| No | 1 (reference) | | 1 (reference) | |
| Yes | 0.46 (0.38, 0.56) | < 0.001 | 0.47 (0.38, 0.57) | < 0.001 |
| Chemotherapy | | | | |
| No | 1 (reference) | | 1 (reference) | |
| Yes | 0.33 (0.30, 0.38) | < 0.001 | 0.34 (0.30, 0.38) | < 0.001 |
| Radiotherapy | | | | |
| No | 1 (reference) | | 1 (reference) | |
| Yes | 0.76 (0.60, 0.97) | 0.027 | 0.78 (0.61, 0.99) | 0.040 |
| Bone metastasis | | | | |
| No | 1 (reference) | | 1 (reference) | |
| Yes | 1.10 (0.88, 1.37) | 0.397 | 1.15 (0.92, 1.44) | 0.206 |
| Liver metastasis | | | | |
| No | 1 (reference) | | 1 (reference) | |
| Yes | 1.49 (1.32, 1.68) | < 0.001 | 1.50 (1.32, 1.70) | < 0.001 |
| Lung metastasis | | | | |
| No | 1 (reference) | | 1 (reference) | |
| Yes | 1.02 (0.89, 1.17) | 0.819 | 1.01 (0.88, 1.16) | 0.910 |

Multivariate Cox proportional hazards model: Before initiating multivariate Cox proportional hazards modeling, we assessed potential multicollinearity among the variables. The results indicated that correlation coefficients for all variable pairs (except surgery and grade) were below 0.7 (Figure 1), and VIF values were all less than 5, suggesting that multicollinearity was not a concern among the independent variables. The multivariate Cox proportional hazards model analysis identified age, race, marital status, tumor location, N stage, grade, surgery, chemotherapy, and liver metastasis as independent predictors of OS (all P < 0.05); additionally, these factors were also independently associated with CSS (Figure 2).

Figure 1 Pearson’s correlation coefficients between variable pairs.
Figure 2 Forest plot for overall survival and cancer-specific survival using multivariate Cox regression analysis in patients. A: Overall survival; B: Cancer-specific survival.

LASSO regression model: To accurately and comprehensively identify independent factors affecting the prognosis of patients with stage IV PC and to minimize the impact of variable collinearity on the results, we concurrently employed LASSO regression analysis with a 10-fold cross-validation approach to further refine the variables. The results indicated that surgery, chemotherapy, and liver metastasis were associated with both OS and CSS (Figure 3).

Figure 3 Feature selection based on Least Absolute Shrinkage and Selection Operator regression. A: Curve of Least Absolute Shrinkage and Selection Operator (LASSO) regression coefficients with changing Log (λ) for overall survival (OS); B: Curve of 10-fold cross-validated C-index with changing Log (λ) for OS; C: Curve of LASSO regression coefficients with changing Log (λ) for cancer-specific survival (CSS); D: Curve of 10-fold cross-validated C-index with changing Log (λ) for CSS.

Random survival forest: The random survival forest is a machine learning algorithm characterized by high robustness, as it does not rely on assumptions such as the proportional hazards assumption or log-linearity. It employs two random sampling processes to mitigate issues of overfitting within the algorithm. In this study, we further evaluated the importance of variables for OS and CSS using this method. The results revealed that the top five variables ranked by their importance for OS were chemotherapy, surgery, liver metastasis, age, and grade (Figure 4A), while for CSS, the top five variables were chemotherapy, surgery, liver metastasis, grade, and age (Figure 4B).

Figure 4 Importance scores based on Random Survival Forest for overall survival and cancer-specific survival. A: Overall survival; B: Cancer-specific survival.
Construction of survival prediction model for stage IV PC patients

Based on the results from the multivariate Cox proportional hazards model, the LASSO regression model, and the random survival forest, combined with clinical relevance, we ultimately selected seven variables - age, tumor grade, surgical resection, chemotherapy, liver metastasis, bone metastasis, and lung metastasis - to construct a nomogram for predicting the prognosis of PC patients. This tool was used to predict 6-month, 12-month, and 18-month OS and CSS. Each variable in the figure was assigned a corresponding score, and the sum of all variable scores represented the total score (Total Points). A lower score indicated a better prognosis. The Total Points could be used to predict the OS and CSS of PC patients at different time points (Figure 5).
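
The way a Cox-based nomogram turns Total Points into a survival probability can be sketched as below. Every number in this block is HYPOTHETICAL: the coefficients and baseline survival values are placeholders chosen for illustration and are not the fitted values behind Figure 5. The sketch assumes one common convention, in which the largest coefficient is scaled to 100 points, and the Cox relation S(t | x) = S0(t)^exp(linear predictor).

```python
import math

# HYPOTHETICAL log-hazard coefficients for binary indicators (not the study's values).
COEFFICIENTS = {
    "age_60_75": 0.20,
    "age_over_75": 0.45,
    "grade_III_IV": 0.40,
    "no_surgery": 0.60,
    "no_chemotherapy": 1.00,
    "liver_metastasis": 0.30,
    "bone_metastasis": 0.10,
    "lung_metastasis": 0.05,
}

# HYPOTHETICAL baseline survival S0(t) for a patient with every indicator absent.
BASELINE_SURVIVAL = {6: 0.85, 12: 0.65, 18: 0.50}

# Assumed scaling: the largest coefficient corresponds to 100 points.
POINTS_PER_UNIT = 100 / max(COEFFICIENTS.values())


def total_points(patient):
    """Sum the points of every indicator that is present for this patient."""
    return sum(
        COEFFICIENTS[name] * POINTS_PER_UNIT
        for name, present in patient.items()
        if present
    )


def survival_probability(points, months):
    """Map Total Points back to the linear predictor, then to S(t)."""
    linear_predictor = points / POINTS_PER_UNIT
    return BASELINE_SURVIVAL[months] ** math.exp(linear_predictor)


# Illustrative patient: 60-75 years, grade III/IV, liver metastasis, untreated.
patient = {
    "age_60_75": True,
    "grade_III_IV": True,
    "liver_metastasis": True,
    "no_surgery": True,
    "no_chemotherapy": True,
}
points = total_points(patient)
for months in (6, 12, 18):
    print(months, round(survival_probability(points, months), 3))
```

Because the hazard rises with the linear predictor, a higher point total always maps to a lower predicted survival probability, which is why a lower score indicates a better prognosis.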

Figure 5 Nomogram predicting overall survival and cancer-specific survival for stage IV pancreatic cancer at 6, 12, and 18 months. OS: Overall survival; CSS: Cancer-specific survival.
Validation of survival prediction model for stage IV PC patients

To assess the discriminatory power of the model, the C-index and AUC values were calculated for both the training and validation sets (Table 3), and ROC curves were plotted (Figure 6). The results indicated that the model exhibited strong predictive value in both the training and validation sets. To evaluate model accuracy, internal and external validations were performed using the bootstrap method with B = 1000 resamples, and calibration curves were drawn. The validation results showed that, in both the training set (internal) and validation set (external), the calibration curves for 6-, 12-, and 18-month OS and CSS closely aligned with the ideal 45-degree reference line, suggesting good consistency between the model's predicted values and the actual observed values (Figure 7).
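
The C-index reported in Table 3 measures how often the model ranks a pair of patients correctly. A minimal sketch of Harrell's concordance index is given below; it assumes a higher risk score (for example, nomogram Total Points) means a worse prognosis, and it counts tied scores as half-concordant. It is a quadratic-time illustration rather than the routine used in the study.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the risk score orders correctly.

    times       -- follow-up time for each patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values of roughly 0.72 sit between the two.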

Figure 6 Receiver operating characteristic curves for 6-, 12-, and 18-month overall survival and cancer-specific survival in the training cohort and validation cohort. A: Overall survival; B: Cancer-specific survival. OS: Overall survival; CSS: Cancer-specific survival; AUC: Area under the curve.
Figure 7 Calibration curves for 6-, 12-, and 18-month overall survival and cancer-specific survival in the training cohort and validation cohort. A: The training cohort; B: Validation cohort. OS: Overall survival; CSS: Cancer-specific survival.
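
Table 3 Values of C-index and area under the curve of the nomogram in the training and the validation cohort.

| Data set | OS, C-index (95%CI) | OS, AUC 6-month | OS, AUC 12-month | OS, AUC 18-month | CSS, C-index (95%CI) | CSS, AUC 6-month | CSS, AUC 12-month | CSS, AUC 18-month |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Training set | 0.727 (0.711-0.743) | 0.801 | 0.775 | 0.787 | 0.727 (0.711-0.743) | 0.800 | 0.774 | 0.785 |
| Validation set | 0.719 (0.695-0.744) | 0.762 | 0.799 | 0.785 | 0.716 (0.691-0.741) | 0.759 | 0.797 | 0.785 |
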
Survival curves of CSS and OS stratified by nomogram-estimated risk

To further validate the clinical utility of the model in practice, we calculated each patient's Total Points based on the constructed nomogram in both the training and validation sets and used X-tile software to determine the risk-stratification cutoffs within the training set. The cutoff separating the low-risk and high-risk groups was 148 points for OS and 189.7 points for CSS. Specifically, for OS, patients with Total Points less than 148 were considered low risk, while those with Total Points greater than 148 were categorized as high risk. For CSS, patients with Total Points less than 189.7 were classified as low risk, whereas those with Total Points greater than 189.7 were classified as high risk. The results demonstrated that the model effectively differentiated the survival prognosis of stage IV PC patients in both the training and validation sets (both P < 0.0001; Figure 8).
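
The stratification and comparison described above can be sketched as follows: patients are split at a Total Points cutoff and the two groups are compared with a two-group log-rank test. This is an illustrative sketch only. The text does not say how a score exactly equal to the cutoff is handled, so assigning it to the high-risk group is an assumption, and the P value uses the chi-square distribution with 1 degree of freedom.

```python
import math


def risk_group(total_points, cutoff):
    """Below the cutoff is low risk; at or above is high risk (tie rule assumed)."""
    return "low" if total_points < cutoff else "high"


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({u for u, e, _ in pooled if e == 1}):
        # Patients still under observation just before time t.
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        n_b = sum(1 for u, _, g in pooled if u >= t and g == 1)
        # Deaths occurring exactly at time t.
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the death count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("groups cannot be compared: zero variance")
    statistic = observed_minus_expected ** 2 / variance
    return statistic, math.erfc(math.sqrt(statistic / 2))
```

In use, each patient's Total Points would be passed through `risk_group` with the relevant cutoff (148 for OS, 189.7 for CSS), and the follow-up times and event indicators of the two resulting groups passed to `log_rank_test`.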

Figure 8 Risk-stratified survival curves of overall survival and cancer-specific survival in the training cohort, and overall survival and cancer-specific survival in the validation cohort. A and B: Risk-stratified survival curves of overall survival (OS) (A) and cancer-specific survival (CSS) (B) in the training cohort; C and D: OS (C) and CSS (D) in the validation cohort.
DCA curve analysis of the model

DCA, a novel method for evaluating predictive models, places greater emphasis on assessing clinical benefit. In this study, we employed DCA to determine whether the model could provide benefits in clinical practice. The results revealed that, compared with the TNM staging system, the nomogram model demonstrated a greater net benefit in predicting patients' OS and CSS (Figure 9).
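
The quantity plotted in a decision curve is the net benefit at a given threshold probability. The sketch below shows the standard formula, net benefit = TP/n - (FP/n) x pt/(1 - pt), for a binary outcome such as death within 12 months. It is a simplification: it assumes the outcome is known for every patient and ignores censoring, which a survival-specific DCA would have to handle.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold.

    predicted_risk -- model-predicted probability of the event for each patient
    outcome        -- 1 if the event occurred, 0 otherwise
    threshold      -- threshold probability pt, strictly between 0 and 1
    """
    n = len(outcome)
    true_pos = sum(
        1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 1
    )
    false_pos = sum(
        1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 0
    )
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Evaluating both functions over a grid of thresholds, once with nomogram predictions and once with predictions derived from TNM stage, would reproduce the kind of comparison shown in a decision curve; the treat-none strategy has a net benefit of zero at every threshold.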

Figure 9 Comparison of decision curve analysis for the nomogram and tumor-node-metastasis staging system in predicting 6-month, 12-month, and 18-month overall survival and cancer-specific survival in both training and validation sets. A: Training set overall survival (OS); B: Training set cancer-specific survival (CSS); C: Validation set OS; D: Validation set CSS. TNM: Tumor-node-metastasis.
DISCUSSION

PC, a highly aggressive solid tumor affecting the digestive organs, is associated with a grim prognosis. At the time of initial presentation, approximately 80% of patients exhibit locally advanced disease or distant metastases, precluding them from surgical intervention[17,18]. In clinical practice, we have observed that survival outcomes for patients with stage IV PC at initial diagnosis can vary significantly. However, there are few clinical prognostic assessment tools specifically designed for this population. This study utilized the SEER database to construct a prognostic nomogram using machine learning algorithms, aiming to provide precise and personalized prognostic assessments for stage IV PC patients and offer a reference for clinical decision-making.

In this study, we integrated results from the multivariate Cox proportional hazards model, LASSO regression model, and random survival forest, along with clinical relevance, to select seven variables: age, tumor grade, surgical resection, chemotherapy, liver metastasis, bone metastasis, and lung metastasis. These variables were used to construct a nomogram for predicting patient prognosis. The nomogram was employed to predict 6-month, 12-month, and 18-month OS and CSS. Further validation demonstrated good discrimination and accuracy in both the training and validation sets, providing valuable guidance for clinical decision-making. A study by Shi et al[19] indicated that patients aged 65 years and older had significantly poorer OS. Consistent with Shi et al[19], our findings revealed a 20% increase in HR (OS) and HR (CSS) for patients aged 60-75 years compared to those under 60 years, with respective increases of 30% and 32% for patients over 75 years. Tumor grade was also identified as an independent risk factor for prognosis[20,21]. Shrikhande et al[22] reported that among 129 pancreatic ductal adenocarcinoma patients with liver metastases, the 11 who underwent surgery had a median OS of 11.4 months, compared to 5.9 months for those without surgery. Multiple studies have confirmed that patients with liver metastasis who undergo surgery experience better outcomes[2,23-26]. Our study reached a similar conclusion. Evidence indicates that over half of PC patients present with distant organ metastasis at diagnosis[2], and distant metastasis is an independent risk factor for prognosis[27]. Improving the survival prognosis of these patients remains a critical clinical challenge. Currently, the TNM staging system is widely used to predict prognosis in clinical practice; however, it has limitations, such as disregarding other factors affecting survival and lacking the ability to visualize individual survival outcomes[19].

Nomograms are extensively utilized in predicting tumor prognosis, assisting clinicians in devising individualized treatment plans[3,28-31]. In this study, we developed a predictive model based on independent prognostic factors for stage IV PC, which exhibited strong accuracy and consistency. This model can assist clinicians in making optimal clinical decisions. For example, a 65-year-old patient with grade III PC and liver metastases - but without bone or lung metastases, who had not undergone surgery or chemotherapy - had a model total score for OS of 486, corresponding to estimated cumulative survival probabilities of 37.6%, 14.6%, and 6.16% at 6, 12, and 18 months, respectively; for CSS, the total score was 482, with estimated survival probabilities of 38.7%, 15.6%, and 6.69% at the same time points. This model allows for a personalized assessment of patient prognosis.

Moreover, this tool can aid clinicians in making medical decisions. Using the aforementioned patient as an example, we utilized the model to estimate OS and CSS with and without surgery and chemotherapy. This provides personalized, intuitive, and comprehensible objective parameters for clinical decision-making. Additionally, risk stratification based on the model's total score may guide postoperative follow-up strategies. Given the complex factors influencing OS, CSS is often considered more critical[32]. Consequently, we stratified patient risk based on CSS. With a total score of 482, exceeding the cutoff value of 189.7, this patient was identified as high-risk, necessitating more vigilant monitoring and a stringent treatment and follow-up plan. To our knowledge, there are no large-scale real-world studies on prognostic models for stage IV PC in the literature. This study has several strengths. First, we employed the random survival forest model, a robust machine learning algorithm within ensemble learning that is not constrained by assumptions such as proportional hazards or log-linearity, and mitigates overfitting through dual random sampling. Consequently, this algorithm yields more reliable and stable conclusions. Second, we independently applied LASSO and classical univariate and multivariate Cox proportional hazards models, while quantitatively assessing multicollinearity in the models using VIFs and pairwise Pearson correlation coefficients. Finally, this study included a total of 1662 patients, of whom 1628 (97.95%) experienced mortality events, and 1560 (93.86%) had cancer-specific deaths, providing ample statistical power for analysis.

However, this study has inherent limitations. First, as a retrospective study, it is inevitably subject to selection bias despite stringent inclusion and exclusion criteria. Second, the SEER database, being a large-scale cancer registry, is subject to coding errors and missing values. Third, due to data access restrictions, certain information (such as specific treatment protocols for chemotherapy, surgery, and radiation therapy, as well as data on tumor recurrence) is unavailable, which may affect the results. Lastly, some categories contain few patients and the distribution across categories is uneven, which may reduce statistical power for those subgroups.

In conclusion, age, tumor grade, surgical resection, chemotherapy, and metastases to the liver, bone, and lung were identified as independent prognostic factors. The model developed from these factors offers a practical reference for clinical use.

CONCLUSION

This study developed a machine learning-based nomogram that predicts the survival outcomes of stage IV PC patients using seven independent prognostic factors: age, tumor grade, chemotherapy, surgery, and liver, bone, and lung metastasis. The model demonstrates strong accuracy in predicting CSS and OS at 6, 12, and 18 months.

Footnotes

Provenance and peer review: Unsolicited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Oncology

Country of origin: China

Peer-review report’s classification

Scientific Quality: Grade B, Grade B

Novelty: Grade B, Grade B

Creativity or Innovation: Grade B, Grade B

Scientific Significance: Grade B, Grade B

P-Reviewer: Chisthi MM, Liu YQ; S-Editor: Li L; L-Editor: A; P-Editor: Xu ZH

References
1. Dalmartello M, La Vecchia C, Bertuccio P, Boffetta P, Levi F, Negri E, Malvezzi M. European cancer mortality predictions for the year 2022 with focus on ovarian cancer. Ann Oncol. 2022;33:330-339.
2. Wang L, Yang L, Chen L, Chen Z. Do Patients Diagnosed with Metastatic Pancreatic Cancer Benefit from Primary Tumor Surgery? A Propensity-Adjusted, Population-Based Surveillance, Epidemiology and End Results (SEER) Analysis. Med Sci Monit. 2019;25:8230-8241.
3. Chen MS, Liu PC, Yi JZ, Xu L, He T, Wu H, Yang JQ, Lv Q. Development and validation of nomograms for predicting survival in patients with de novo metastatic triple-negative breast cancer. Sci Rep. 2022;12:14659.
4. Huang C, Yu QP, Li H, Ding Z, Zhou Z, Shi X. A novel nomogram model to predict the overall survival of patients with retroperitoneal leiomyosarcoma: a large cohort retrospective study. Sci Rep. 2022;12:11851.
5. Zhang SL, Wang ZM, Wang WR, Wang X, Zhou YH. Novel nomograms individually predict the survival of patients with soft tissue sarcomas after surgery. Cancer Manag Res. 2019;11:3215-3225.
6. Wang W, Hong J, Meng J, Wu H, Shi M, Yan S, Huang Y. Nomograms Predict Cancer-Specific and Overall Survival of Patients With Primary Limb Leiomyosarcoma. J Orthop Res. 2019;37:1649-1657.
7. Tian H, Ning Z, Zong Z, Liu J, Hu C, Ying H, Li H. Application of Machine Learning Algorithms to Predict Lymph Node Metastasis in Early Gastric Cancer. Front Med (Lausanne). 2021;8:759013.
8. Sun D, Peng H, Wu Z. Establishment and Analysis of a Combined Diagnostic Model of Alzheimer's Disease With Random Forest and Artificial Neural Network. Front Aging Neurosci. 2022;14:921906.
9. Liu W, Zhang L, Xin Z, Zhang H, You L, Bai L, Zhou J, Ying B. A Promising Preoperative Prediction Model for Microvascular Invasion in Hepatocellular Carcinoma Based on an Extreme Gradient Boosting Algorithm. Front Oncol. 2022;12:852736.
10. Huang K, Yuan X, Zhao P, He Y. Effect of chemotherapy on prognosis in patients with primary pancreatic signet ring cell carcinoma: A large real-world study based on machine learning. PLoS One. 2024;19:e0302685.
11. Kim Y, Margonis GA, Prescott JD, Tran TB, Postlewait LM, Maithel SK, Wang TS, Evans DB, Hatzaras I, Shenoy R, Phay JE, Keplinger K, Fields RC, Jin LX, Weber SM, Salem AI, Sicklick JK, Gad S, Yopp AC, Mansour JC, Duh QY, Seiser N, Solorzano CC, Kiernan CM, Votanopoulos KI, Levine EA, Poultsides GA, Pawlik TM. Nomograms to Predict Recurrence-Free and Overall Survival After Curative Resection of Adrenocortical Carcinoma. JAMA Surg. 2016;151:365-373.
12. Katsuyama E, Miyawaki Y, Sada KE, Asano Y, Hayashi K, Yamamura Y, Hiramatsu-Asano S, Morishita M, Ohashi K, Watanabe H, Katsuyama T, Narazaki M, Matsumoto Y, Wada J. Association of explanatory histological findings and urinary protein and serum creatinine levels at renal biopsy in lupus nephritis: a cross-sectional study. BMC Nephrol. 2020;21:208.
13. Kalantari S, Khalili D, Asgari S, Fahimfar N, Hadaegh F, Tohidi M, Azizi F. Predictors of early adulthood hypertension during adolescence: a population-based cohort study. BMC Public Health. 2017;17:915.
14. Kim JH. Multicollinearity and misleading statistical results. Korean J Anesthesiol. 2019;72:558-569.
15. Mo X, Zhou M, Yan H, Chen X, Wang Y. Competing risk analysis of cardiovascular/cerebrovascular death in T1/2 kidney cancer: a SEER database analysis. BMC Cancer. 2021;21:13.
16. Liao Y, Yin G, Fan X. The Positive Lymph Node Ratio Predicts Survival in T(1-4)N(1-3)M(0) Non-Small Cell Lung Cancer: A Nomogram Using the SEER Database. Front Oncol. 2020;10:1356.
17. Chen H, Kong Y, Yao Q, Zhang X, Fu Y, Li J, Liu C, Wang Z. Three hypomethylated genes were associated with poor overall survival in pancreatic cancer patients. Aging (Albany NY). 2019;11:885-897.
18. Li D, Xie K, Wolff R, Abbruzzese JL. Pancreatic cancer. Lancet. 2004;363:1049-1057.
19. Shi H, Chen Z, Dong S, He R, Du Y, Qin Z, Zhou W. A nomogram for predicting survival in patients with advanced (stage III/IV) pancreatic body tail cancer: a SEER-based study. BMC Gastroenterol. 2022;22:279.
20. Yao Q, Jia W, Chen S, Wang Q, Liu Z, Liu D, Ji X. Machine learning was used to predict risk factors for distant metastasis of pancreatic cancer and prognosis analysis. J Cancer Res Clin Oncol. 2023;149:10279-10291.
21. Zhang W, Ji L, Wang X, Zhu S, Luo J, Zhang Y, Tong Y, Feng F, Kang Y, Bi Q. Nomogram Predicts Risk and Prognostic Factors for Bone Metastasis of Pancreatic Cancer: A Population-Based Analysis. Front Endocrinol (Lausanne). 2021;12:752176.
22. Shrikhande SV, Kleeff J, Reiser C, Weitz J, Hinz U, Esposito I, Schmidt J, Friess H, Büchler MW. Pancreatic resection for M1 pancreatic ductal adenocarcinoma. Ann Surg Oncol. 2007;14:118-127.
23. Strobel O, Neoptolemos J, Jäger D, Büchler MW. Optimizing the outcomes of pancreatic cancer surgery. Nat Rev Clin Oncol. 2019;16:11-26.
24. Timmer FEF, Geboers B, Nieuwenhuizen S, Schouten EAC, Dijkstra M, de Vries JJJ, van den Tol MP, Meijerink MR, Scheffer HJ. Locoregional Treatment of Metastatic Pancreatic Cancer Utilizing Resection, Ablation and Embolization: A Systematic Review. Cancers (Basel). 2021;13:1608.
25. Tsitskari M, Filippiadis D, Kostantos C, Palialexis K, Zavridis P, Kelekis N, Brountzos E. The role of interventional oncology in the treatment of colorectal cancer liver metastases. Ann Gastroenterol. 2019;32:147-155.
26. Chow FC, Chok KS. Colorectal liver metastases: An update on multidisciplinary approach. World J Hepatol. 2019;11:150-172.
27. Han R, Tian Z, Jiang Y, Guan G, Wang X, Sun X, Yu Y, Jing X. Prognostic significance of the systemic immune inflammation index in patients with metastatic and unresectable pancreatic cancer. Front Surg. 2022;9:915599.
28. Zhang H, Dong H, Pan Z, Du X, Liu S, Xu W, Zhang Y. Risk factors and predictive nomograms for early death of patients with pancreatic cancer liver metastasis: A large cohort study based on the SEER database and Chinese population. Front Oncol. 2022;12:998445.
29. Zhang JX, Song W, Chen ZH, Wei JH, Liao YJ, Lei J, Hu M, Chen GZ, Liao B, Lu J, Zhao HW, Chen W, He YL, Wang HY, Xie D, Luo JH. Prognostic and predictive value of a microRNA signature in stage II colon cancer: a microRNA expression analysis. Lancet Oncol. 2013;14:1295-1306.
30. Huang YQ, Liang CH, He L, Tian J, Liang CS, Chen X, Ma ZL, Liu ZY. Development and Validation of a Radiomics Nomogram for Preoperative Prediction of Lymph Node Metastasis in Colorectal Cancer. J Clin Oncol. 2016;34:2157-2164.
31. Zhou G, Xiao K, Gong G, Wu J, Zhang Y, Liu X, Jiang Z, Ma C. A novel nomogram for predicting liver metastasis in patients with gastrointestinal stromal tumor: a SEER-based study. BMC Surg. 2020;20:298.
32. Xu L, Wen N, Qiu J, He T, Tan Q, Yang J, Du Z, Lv Q. Predicting Survival Benefit of Sparing Sentinel Lymph Node Biopsy in Low-Risk Elderly Patients With Early Breast Cancer: A Population-Based Analysis. Front Oncol. 2020;10:1718.