1
Palumbo P. Qini Curves for Potential Impact Assessment of Risk Predictive Models Informing Intervention Policies. VALUE IN HEALTH : THE JOURNAL OF THE INTERNATIONAL SOCIETY FOR PHARMACOECONOMICS AND OUTCOMES RESEARCH 2025:S1098-3015(25)00066-X. [PMID: 39954856 DOI: 10.1016/j.jval.2025.01.024]
Abstract
OBJECTIVES Predictive models in medicine help make decisions about which individual to treat with a given therapeutic or preventive intervention. Before being tested in large field studies and recommended for clinical adoption, it is important to evaluate not only their statistical accuracy but also the impact they may have when used to inform health intervention policies. We aim to provide simple methods for the potential impact assessment of health intervention policies based on predictive models. METHODS We propose an analytic framework based on Qini curves wherein prediction-based policies are analyzed on 2 impact endpoints: (1) the fraction of the population that would be selected for the intervention (coverage) and (2) the effect on the clinical outcomes of interest (disutility). The drivers of values are the disease prevalence, the predictive performance of the model, and the effectiveness of the intervention. RESULTS We present simple formulas for calculating coverage and disutility from either observational or randomized controlled data. We illustrate possible value measures arising from geometrical properties on the Qini plane: delta coverage and disutility, number needed to treat, and integrated difference between Qini curves. We show the applicability of the Qini analysis by providing examples about the prevention of falls in older adults and prevention of secondary cardiovascular events with pioglitazone. CONCLUSIONS Coverage and disutility capture key value components of prediction-based policies. The method can be used for comparing models or tuning risk thresholds for managing trade-offs between conflicting objectives (eg, clinical benefits, side effects, and healthcare resources).
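As a rough illustration of the two impact endpoints described above, the sketch below computes coverage and disutility for a risk-threshold policy on synthetic data, assuming "disutility" is the expected outcome rate when everyone above the threshold receives an intervention with a given relative risk reduction. The function names, distributions, and parameter values are illustrative, not taken from the paper.

```python
"""Illustrative sketch (not the paper's code): coverage and disutility of a
risk-threshold intervention policy on synthetic predicted risks."""
import numpy as np

def coverage_disutility(risk, threshold, effectiveness):
    """risk: predicted event probabilities; effectiveness: assumed relative risk reduction."""
    treat = risk >= threshold                        # who the policy selects
    coverage = treat.mean()                          # fraction of the population selected
    expected_events = np.where(treat, risk * (1 - effectiveness), risk)
    disutility = expected_events.mean()              # expected outcome rate under the policy
    return coverage, disutility

rng = np.random.default_rng(0)
risk = rng.beta(1, 9, size=10_000)                   # synthetic risks, roughly 10% prevalence
for t in (0.05, 0.10, 0.20):
    cov, dis = coverage_disutility(risk, t, effectiveness=0.3)
    print(f"threshold={t:.2f}  coverage={cov:.2f}  disutility={dis:.3f}")
```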
Affiliation(s)
- Pierpaolo Palumbo
- Department of Electrical, Electronic, and Information Engineering "Guglielmo Marconi"-DEI, University of Bologna, Bologna, Italy.
2
Hegarty SE, Linn KA, Zhang H, Teeple S, Albert PS, Parikh RB, Courtright K, Kent DM, Chen J. Assessing Algorithm Fairness Requires Adjustment for Risk Distribution Differences: Re-Considering the Equal Opportunity Criterion. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2025:2025.01.31.25321489. [PMID: 39974139 PMCID: PMC11838655 DOI: 10.1101/2025.01.31.25321489]
Abstract
The proliferation of algorithm-assisted decision making has prompted calls for careful assessment of algorithm fairness. One popular fairness metric, equal opportunity, demands parity in true positive rates (TPRs) across different population subgroups. However, we highlight a critical but overlooked weakness in this measure: at a given decision threshold, TPRs vary when the underlying risk distribution varies across subgroups, even if the model equally captures the underlying risks. Failure to account for variations in risk distributions may lead to misleading conclusions on performance disparity. To address this issue, we introduce a novel metric called adjusted TPR (aTPR), which modifies subgroup-specific TPRs to reflect performance relative to the risk distribution in a common reference subgroup. Evaluating fairness using aTPRs promotes equal treatment for equal risk by reflecting whether individuals with similar underlying risks have similar opportunities of being identified as high risk by the model, regardless of subgroup membership. We demonstrate our method through numerical experiments that explore a range of differential calibration relationships and in a real-world data set that predicts 6-month mortality risk in an in-patient sample in order to increase timely referrals for palliative care consultations.
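The sketch below illustrates the phenomenon the authors highlight, not their aTPR adjustment itself: with outcomes drawn directly from the predicted risks (i.e., a model that is perfectly calibrated in both subgroups), the raw TPR at a fixed threshold still differs when the subgroups' risk distributions differ. All distributions and names are hypothetical.

```python
"""Sketch of the problem motivating aTPR: equal calibration, unequal risk
distributions, unequal raw TPRs at the same threshold."""
import numpy as np

rng = np.random.default_rng(1)

def tpr_at_threshold(risk, threshold):
    y = rng.binomial(1, risk)            # outcomes drawn from the risks themselves,
                                         # i.e. the model is perfectly calibrated
    flagged = risk >= threshold
    return flagged[y == 1].mean()        # true positive rate among events

risk_a = rng.beta(2, 18, 50_000)         # subgroup A: lower, tighter risk distribution
risk_b = rng.beta(2, 8, 50_000)          # subgroup B: shifted toward higher risk
print("TPR subgroup A:", round(tpr_at_threshold(risk_a, 0.2), 3))
print("TPR subgroup B:", round(tpr_at_threshold(risk_b, 0.2), 3))
```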
Affiliation(s)
- Sarah E Hegarty
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Kristin A Linn
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Hong Zhang
- Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, P.R. China
- Stephanie Teeple
- Palliative and Advanced Illness Research (PAIR) Center, University of Pennsylvania, Philadelphia, PA, USA
- Paul S Albert
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
- Ravi B Parikh
- School of Medicine, Emory University, Atlanta, GA, USA
- David M Kent
- School of Medicine, Tufts University, Philadelphia, PA, USA
- Jinbo Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
3
Nguyen NT, Pennello GA. DxGoals: A Software Tool for Determining and Analyzing Clinically Meaningful Classification Accuracy Goals for Diagnostic Tests. J Appl Lab Med 2024; 9:952-962. [PMID: 39225456 DOI: 10.1093/jalm/jfae054]
Abstract
BACKGROUND To evaluate diagnostic tests for low prevalence conditions, classification accuracy metrics such as sensitivity, specificity, and positive likelihood ratio (PLR) and negative likelihood ratio (NLR) are advantageous because they are prevalence-independent and thus estimable in studies enriched for the condition. However, classification accuracy goals are often chosen without a clear understanding of whether they are clinically meaningful. Pennello (2021) proposed a risk stratification framework for determining classification accuracy goals. A software application is needed to determine the goals and provide data analysis. METHODS We introduce DxGoals, a freely available, R-Shiny software application for determining, visualizing, and analyzing classification accuracy goals for diagnostic tests. Given prevalence p for the target condition and the specification that a test's positive and negative predictive values, PPV and NPV = 1 - cNPV, should satisfy PPV > PPV* and cNPV < cNPV*, DxGoals determines the corresponding classification accuracy goals and analyzes whether candidate tests meet them. RESULTS We illustrate DxGoals on tests for penicillin allergy, ovarian cancer, and cervical cancer. The inputs cNPV*, p, and PPV* were informed by clinical management guidelines. CONCLUSIONS DxGoals facilitates determination, visualization, and analysis of clinically meaningful standalone and comparative classification accuracy goals. It is a potentially useful tool for diagnostic test evaluation.
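The arithmetic linking predictive-value targets to likelihood-ratio goals is standard Bayes' rule (post-test odds equal pre-test odds times the likelihood ratio). The sketch below shows that calculation in the spirit of the tool; it is not the DxGoals software, and the input values are made up.

```python
"""Sketch of the Bayes-rule arithmetic behind goals of this kind: predictive-value
targets at a given prevalence imply a minimum PLR and a maximum NLR."""

def likelihood_ratio_goals(prevalence, ppv_star, cnpv_star):
    pre_test_odds = prevalence / (1 - prevalence)
    # post-test odds = pre-test odds * LR, so solve for the LR attaining each target
    plr_goal = (ppv_star / (1 - ppv_star)) / pre_test_odds    # need PLR >= this
    nlr_goal = (cnpv_star / (1 - cnpv_star)) / pre_test_odds  # need NLR <= this
    return plr_goal, nlr_goal

plr, nlr = likelihood_ratio_goals(prevalence=0.05, ppv_star=0.30, cnpv_star=0.01)
print(f"PLR goal >= {plr:.1f}, NLR goal <= {nlr:.2f}")
```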
Affiliation(s)
- Ngoc-Ty Nguyen
- U.S. Food and Drug Administration, Center for Biologics Evaluation and Research, Silver Spring, MD, United States
- Gene A Pennello
- U.S. Food and Drug Administration, Center for Devices and Radiological Health, Silver Spring, MD, United States
4
Takegami N, Torres-Espin A, Imagawa Y, Watanabe I, Rowell S, Schreiber M, Ferguson AR, Hinson HE. Evaluating and Updating the IMPACT Model to Predict Outcomes in Two Contemporary North American Traumatic Brain Injury Cohorts. J Neurotrauma 2024. [PMID: 38984940 DOI: 10.1089/neu.2024.0158]
Abstract
The International Mission on Prognosis and Analysis of Clinical Trials in Traumatic Brain Injury (IMPACT) model is a widely recognized prognostic model applied after traumatic brain injury (TBI). However, it was developed with patient cohorts that may not reflect modern practice patterns in North America. We analyzed data from two sources: the placebo arm of the phase II double-blinded, multicenter, randomized controlled trial Prehospital Tranexamic Acid for TBI (TXA) cohort and an observational cohort with similar inclusion/exclusion criteria (Predictors of Low-risk Phenotypes after Traumatic Brain Injury Incorporating Proteomic Biomarker Signatures [PROTIPS] cohort). All three versions of the IMPACT model-core, extended, and laboratory-were evaluated for 6-month mortality (Glasgow Outcome Scale Extended [GOSE] = 1) and unfavorable outcomes (GOSE = 1-4). Calibration (intercept and slope) and discrimination (area under the receiver operating characteristic curve [ROC-AUC]) were used to assess model performance. We then compared three model updating methods-recalibration in the large, logistic recalibration, and coefficient update-with the best update method determined by likelihood ratio tests. In our calibration analysis, recalibration improved both intercepts and slopes, indicating more accurate predicted probabilities when recalibration was done. Discriminative performance of the IMPACT models, measured by AUC, showed mortality prediction ROCs between 0.61 and 0.82 for the TXA cohort, with the coefficient updated Lab model achieving the highest at 0.84. Unfavorable outcomes had lower AUCs, ranging from 0.60 to 0.79. Similarly, in the PROTIPS cohort, AUCs for mortality ranged from 0.75 to 0.82, with the coefficient updated Lab model also showing superior performance (AUC 0.84). Unfavorable outcomes in this cohort presented AUCs from 0.67 to 0.73, consistently lower than mortality predictions. The closed testing procedure using likelihood ratio tests consistently identified the coefficient update model as superior, outperforming the original and recalibrated models across all cohorts. In our comprehensive evaluation of the IMPACT model, the coefficient updated models were the best performing across all cohorts through a structured closed testing procedure. Thus, standardization of model updating procedures is needed to reproducibly determine the best performing versions of IMPACT that reflect the specific characteristics of a dataset.
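For readers unfamiliar with the three updating strategies compared here, the sketch below shows generic versions of recalibration-in-the-large, logistic recalibration, and a full coefficient update for a logistic prediction model. It is a minimal illustration on synthetic data, not the IMPACT models or the TXA/PROTIPS cohorts.

```python
"""Minimal sketch of three common model-updating methods for a logistic model."""
import numpy as np
import statsmodels.api as sm

def linear_predictor(X, coefs, intercept):
    return intercept + X @ coefs

def recalibration_in_the_large(lp, y):
    # slope fixed at 1 via an offset; only the intercept is re-estimated
    fit = sm.GLM(y, np.ones((len(y), 1)), offset=lp,
                 family=sm.families.Binomial()).fit()
    return fit.params[0]                               # intercept correction

def logistic_recalibration(lp, y):
    fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
    return fit.params                                  # [new intercept, calibration slope]

def coefficient_update(X, y):
    fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial()).fit()
    return fit.params                                  # all coefficients re-estimated

# synthetic demo: an "old" model applied to a new cohort with a miscalibrated intercept
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 3))
true_lp = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] - 1.0
y = rng.binomial(1, 1 / (1 + np.exp(-true_lp)))
old_lp = linear_predictor(X, np.array([0.8, -0.5, 0.3]), intercept=-1.5)
print(logistic_recalibration(old_lp, y))
```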
Affiliation(s)
- Naoki Takegami
- Department of Neurological Surgery, University of California, San Francisco, California, USA
- Weill Institute of Neurosciences, Brain and Spinal Injury Center (BASIC), University of California, San Francisco, California, USA
- Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Abel Torres-Espin
- School of Public Health Sciences, University of Waterloo, Waterloo, Canada
- Yoshihito Imagawa
- Department of Chemistry and Biomolecular Science, Biomolecular Science Course, Faculty of Engineering, Gifu University, Gifu, Japan
- Itsunori Watanabe
- Department of Computer Science and Engineering, School of Fundamental Science and Engineering, Waseda University, Tokyo, Japan
- Susan Rowell
- Department of Surgery, Oregon Health & Science University, Portland, Oregon, USA
- Martin Schreiber
- Donald D. Trunkey Center for Civilian and Combat Casualty Care, Oregon Health & Science University, Portland, Oregon, USA
- Adam R Ferguson
- Department of Neurological Surgery, University of California, San Francisco, California, USA
- Weill Institute of Neurosciences, Brain and Spinal Injury Center (BASIC), University of California, San Francisco, California, USA
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco, California, USA
- San Francisco Veterans Affairs Healthcare System, San Francisco, California, USA
- H E Hinson
- Department of Neurology, University of California, San Francisco, California, USA
5
Pavlou M, Ambler G, Qu C, Seaman SR, White IR, Omar RZ. An evaluation of sample size requirements for developing risk prediction models with binary outcomes. BMC Med Res Methodol 2024; 24:146. [PMID: 38987715 PMCID: PMC11234534 DOI: 10.1186/s12874-024-02268-5]
Abstract
BACKGROUND Risk prediction models are routinely used to assist in clinical decision making. A small sample size for model development can compromise model performance when the model is applied to new patients. For binary outcomes, the calibration slope (CS) and the mean absolute prediction error (MAPE) are two key measures on which sample size calculations for the development of risk models have been based. CS quantifies the degree of model overfitting while MAPE assesses the accuracy of individual predictions. METHODS Recently, two formulae were proposed to calculate the sample size required, given anticipated features of the development data such as the outcome prevalence and c-statistic, to ensure that the expectation of the CS and MAPE (over repeated samples) in models fitted using MLE will meet prespecified target values. In this article, we use a simulation study to evaluate the performance of these formulae. RESULTS We found that both formulae work reasonably well when the anticipated model strength is not too high (c-statistic < 0.8), regardless of the outcome prevalence. However, for higher model strengths the CS formula underestimates the sample size substantially. For example, for c-statistic = 0.85 and 0.9, the sample size needed to be increased by at least 50% and 100%, respectively, to meet the target expected CS. On the other hand, the MAPE formula tends to overestimate the sample size for high model strengths. These conclusions were more pronounced for higher prevalence than for lower prevalence. Similar results were drawn when the outcome was time to event with censoring. Given these findings, we propose a simulation-based approach, implemented in the new R package 'samplesizedev', to correctly estimate the sample size even for high model strengths. The software can also calculate the variability in CS and MAPE, thus allowing for assessment of model stability. CONCLUSIONS The calibration and MAPE formulae suggest sample sizes that are generally appropriate for use when the model strength is not too high. However, they tend to be biased for higher model strengths, which are not uncommon in clinical risk prediction studies. On those occasions, our proposed adjustments to the sample size calculations will be relevant.
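The simulation-based idea can be sketched as follows: repeatedly simulate development datasets of a candidate size, fit the model by maximum likelihood, and estimate the expected calibration slope in a large validation sample. This is only a rough Python sketch of that logic with arbitrary parameter values; the authors' 'samplesizedev' R package is the proper implementation.

```python
"""Rough sketch of simulation-based estimation of the expected calibration slope."""
import numpy as np
import statsmodels.api as sm

def expected_calibration_slope(n_dev, beta, intercept, n_val=50_000, reps=100, seed=0):
    rng = np.random.default_rng(seed)
    x_val = rng.normal(size=(n_val, len(beta)))
    y_val = rng.binomial(1, 1 / (1 + np.exp(-(intercept + x_val @ beta))))
    slopes = []
    for _ in range(reps):
        x = rng.normal(size=(n_dev, len(beta)))
        y = rng.binomial(1, 1 / (1 + np.exp(-(intercept + x @ beta))))
        fit = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial()).fit()
        lp_val = sm.add_constant(x_val) @ fit.params          # predictions in validation data
        cal = sm.GLM(y_val, sm.add_constant(lp_val), family=sm.families.Binomial()).fit()
        slopes.append(cal.params[1])                          # calibration slope
    return np.mean(slopes)

# e.g. a fairly strong model (larger coefficients imply a higher c-statistic) fitted on n=300
print(expected_calibration_slope(n_dev=300, beta=np.array([1.0, 1.0, 0.8]), intercept=-2.0))
```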
Affiliation(s)
- Chen Qu
- Department of Statistical Science, UCL, London, UK
- Shaun R Seaman
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
6
Wolf S, Zechmeister-Koss I, Fruehwirth I. The Prognostic Quality of Risk Prediction Models to Assess the Individual Breast Cancer Risk in Women: An Overview of Reviews. Breast J 2024; 2024:1711696. [PMID: 39742377 PMCID: PMC10978083 DOI: 10.1155/2024/1711696]
Abstract
Purpose Breast cancer is the most common cancer among women globally, with an incidence of approximately two million cases in 2018. Organised age-based breast cancer screening programs were established worldwide to detect breast cancer earlier and to reduce mortality. Currently, there is substantial anticipation regarding risk-adjusted screening programs, considering various risk factors in addition to age. The present study investigated the discriminatory accuracy of breast cancer risk prediction models and whether they suit risk-based screening programs. Methods Following the PICO scheme, we conducted an overview of reviews and systematically searched four databases. All methodological steps, including the literature selection, data extraction and synthesis, and the quality appraisal were conducted following the 4-eyes principle. For the quality assessment, the AMSTAR 2 tool was used. Results We included eight systematic reviews out of 833 hits based on the prespecified inclusion criteria. The eight systematic reviews comprised ninety-nine primary studies that were also considered for the data analysis. Three systematic reviews were assessed as having a high risk of bias, while the others were rated with a moderate or low risk of bias. Most identified breast cancer risk prediction models showed a low prognostic quality. Adding breast density and genetic information as risk factors only moderately improved the models' discriminatory accuracy. Conclusion All breast cancer risk prediction models published to date show a limited ability to predict the individual breast cancer risk in women. Hence, it is too early to implement them in national breast cancer screening programs. Relevant randomised controlled trials about the benefit-harm ratio of risk-adjusted breast cancer screening programs compared to conventional age-based programs need to be awaited.
Affiliation(s)
- Sarah Wolf
- HTA Austria-Austrian Institute for Health Technology Assessment (AIHTA) GmbH, Garnisongasse 7/21, Vienna 1090, Austria
- Ingrid Zechmeister-Koss
- HTA Austria-Austrian Institute for Health Technology Assessment (AIHTA) GmbH, Garnisongasse 7/21, Vienna 1090, Austria
- Irmgard Fruehwirth
- HTA Austria-Austrian Institute for Health Technology Assessment (AIHTA) GmbH, Garnisongasse 7/21, Vienna 1090, Austria
7
Samawi H, Ahmed F, Pennello G, Yin J. Net benefit of diagnostic tests for multistate diseases: an indicator variables approach. J Biopharm Stat 2023; 33:611-638. [PMID: 36710380 DOI: 10.1080/10543406.2023.2169928]
Abstract
A limitation of the common measures of diagnostic test performance, such as sensitivity and specificity, is that they do not consider the relative importance of false negative and false positive test results, which are likely to have different clinical consequences. Therefore, the use of classification or prediction measures alone to compare diagnostic tests or biomarkers can be inconclusive for clinicians. Comparing tests on net benefit can be more conclusive because the clinical consequences of misdiagnoses are considered. The literature suggested evaluating binary diagnostic tests based on net benefit, but did not consider diagnostic tests that classify more than two disease states, e.g., stroke subtype (large-artery atherosclerosis, cardioembolism, small-vessel occlusion, stroke of other determined etiology, stroke of undetermined etiology), skin lesion subtype, breast cancer subtypes (benign, mass, calcification, architectural distortion, etc.), METAVIR liver fibrosis state (F0-F4), histopathological classification of cervical intraepithelial neoplasia (CIN), prostate Gleason grade, and brain injury (intracranial hemorrhage, mass effect, midline shift, cranial fracture). Other diseases have more than two stages, such as Alzheimer's disease (dementia due to Alzheimer's disease, mild cognitive impairment (MCI) due to Alzheimer's disease, and preclinical presymptomatic Alzheimer's disease). In diseases with more than two states, the benefits and risks may vary between states. This paper extends the net-benefit approach of evaluating binary diagnostic tests to multi-state clinical conditions to rule in or rule out a clinical condition based on the adverse consequences of work-up delay (due to a false negative test result) and unnecessary work-up (due to a false positive test result). We demonstrate our approach with numerical examples and real data.
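For orientation, the sketch below shows the standard two-state net-benefit calculation that the paper generalizes, with false positives down-weighted by an assumed harm-to-benefit ratio; the multi-state extension itself is not reproduced here, and the sensitivity, specificity, and ratio used are arbitrary.

```python
"""Sketch of two-state net benefit: credit true positives, penalize false positives."""
import numpy as np

def net_benefit(y_true, test_positive, harm_to_benefit_ratio):
    n = len(y_true)
    tp = np.sum((test_positive == 1) & (y_true == 1))
    fp = np.sum((test_positive == 1) & (y_true == 0))
    return tp / n - (fp / n) * harm_to_benefit_ratio

rng = np.random.default_rng(3)
y = rng.binomial(1, 0.1, 5000)                        # 10% have the condition
test = np.where(y == 1, rng.binomial(1, 0.8, 5000),   # sensitivity 0.80
                rng.binomial(1, 0.2, 5000))           # 1 - specificity 0.20
print(net_benefit(y, test, harm_to_benefit_ratio=0.25))
```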
Affiliation(s)
- Hani Samawi
- Department of Biostatistics, Epidemiology and Environmental Health Sciences Jiann-Ping Hsu College of Public Health Georgia Southern University, Statesboro, GA, USA
- Ferdous Ahmed
- Department of Biostatistics, Epidemiology and Environmental Health Sciences Jiann-Ping Hsu College of Public Health Georgia Southern University, Statesboro, GA, USA
- Gene Pennello
- Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, USA
- Jingjing Yin
- Department of Biostatistics, Epidemiology and Environmental Health Sciences Jiann-Ping Hsu College of Public Health Georgia Southern University, Statesboro, GA, USA
8
Rahnenführer J, De Bin R, Benner A, Ambrogi F, Lusa L, Boulesteix AL, Migliavacca E, Binder H, Michiels S, Sauerbrei W, McShane L. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges. BMC Med 2023; 21:182. [PMID: 37189125 DOI: 10.1186/s12916-023-02858-y]
Abstract
BACKGROUND In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. METHODS Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. RESULTS The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. CONCLUSIONS This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.
Affiliation(s)
- Axel Benner
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Federico Ambrogi
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Scientific Directorate, IRCCS Policlinico San Donato, San Donato Milanese, Italy
- Lara Lusa
- Department of Mathematics, Faculty of Mathematics, Natural Sciences and Information Technology, University of Primorksa, Koper, Slovenia
- Institute of Biostatistics and Medical Informatics, University of Ljubljana, Ljubljana, Slovenia
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany
- Harald Binder
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Stefan Michiels
- Service de Biostatistique et d'Épidémiologie, Gustave Roussy, Université Paris-Saclay, Villejuif, France
- Oncostat U1018, Inserm, Université Paris-Saclay, Labeled Ligue Contre le Cancer, Villejuif, France
- Willi Sauerbrei
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Lisa McShane
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA.
9
Helmus LM, Kyne A. Prevalence, Correlates, and Sequelae of Child Sexual Abuse (CSA) among Indigenous Canadians: Intersections of Ethnicity, Gender, and Socioeconomic Status. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:ijerph20095727. [PMID: 37174245 PMCID: PMC10178094 DOI: 10.3390/ijerph20095727]
Abstract
Child sexual abuse (CSA) is a severe and concerning public-health problem globally, but some children are at higher risk of experiencing it. The harms caused by colonization and particularly the inter-generational legacy of residential schools would presumably increase the vulnerability of Indigenous children in former British colonies. Among 282 Indigenous participants in Canada recruited from Prime Panels, CSA was reported by 35% of boys, 50% of girls, and 57% of trans and gender non-conforming participants. These rates are substantially higher than global meta-analytic estimates (7.6% of boys and 18.0% of girls). There was evidence of intersectionality based on socioeconomic status. CSA was associated with a variety of other indicators of negative childhood experiences and significantly predicted numerous negative outcomes in adulthood, including mental-health issues (e.g., PTSD), unemployment, and criminal legal-system involvement. Sexual abuse of Indigenous Canadian children is a public-health crisis, and layers of marginalization (e.g., gender, social class) exacerbate this risk. Trauma-informed services to address the harms of colonization are severely needed, in line with recommendations from Canada's Truth and Reconciliation Commission.
Affiliation(s)
- L Maaike Helmus
- Department of Criminology, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Ashley Kyne
- Department of Criminology, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
10
Vishwakarma GK, Bhattacharjee A, Tank F, Pashchenko AF. Subgroup identification of targeted therapy effects on biomarker for time to event data. Cancer Biomark 2023; 38:413-424. [PMID: 37980650 DOI: 10.3233/cbm-230181]
Abstract
BACKGROUND The initiation of biomarker-driven trials has revolutionized oncology drug development by challenging the traditional phased approach and introducing basket studies. Notable successes in non-small cell lung cancer (NSCLC) with ALK, ALK/ROS1, and EGFR inhibitors have prompted the need to expand this approach to other cancer sites. OBJECTIVES This study explores the use of dose-response modeling and time-to-event algorithms for the biomarker molecular targeted agent (MTA). By simulating subgroup identification in MTA-related time-to-event data, the study aims to develop statistical methodology supporting biomarker-driven trials in oncology. METHODS A total of n patients are selected and assigned to different doses. A dataset is prepared to mimic subgroup identification for the MTA in time-to-event data analysis. The response is measured through the MTA, and the MTA value is also assessed through ROC analysis. Markov Chain Monte Carlo (MCMC) techniques are used to implement the proposed algorithm, and the analysis is carried out with a simulation study. Subset selection is performed through the threshold limit value (TLV) using a Bayesian approach. RESULTS The MTA is observed within the range 12-16, and a marginal level shift of the MTA from pre- to post-treatment is expected. The Cox time-varying model can be adopted further as a causal-effect relation to establish the effect of the MTA on prolonging survival duration. The proposed work contributes statistical methodology to support biomarker-driven trials in oncology research. CONCLUSION This study extends the application of biomarker-driven trials beyond NSCLC, opening possibilities for implementation in other cancer sites. By demonstrating the feasibility and efficacy of utilizing the MTA as a biomarker, the research lays the foundation for refining and validating biomarker use in clinical trials. These advancements aim to enhance the precision and effectiveness of cancer treatments, ultimately benefiting patients.
Affiliation(s)
- Alexander F Pashchenko
- Laboratory of Intellectual Control Systems and Simulation, V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences, Moscow, Russia
11
Binuya MAE, Engelhardt EG, Schats W, Schmidt MK, Steyerberg EW. Methodological guidance for the evaluation and updating of clinical prediction models: a systematic review. BMC Med Res Methodol 2022; 22:316. [PMID: 36510134 PMCID: PMC9742671 DOI: 10.1186/s12874-022-01801-8]
Abstract
BACKGROUND Clinical prediction models are often not evaluated properly in specific settings or updated, for instance, with information from new markers. These key steps are needed such that models are fit for purpose and remain relevant in the long-term. We aimed to present an overview of methodological guidance for the evaluation (i.e., validation and impact assessment) and updating of clinical prediction models. METHODS We systematically searched nine databases from January 2000 to January 2022 for articles in English with methodological recommendations for the post-derivation stages of interest. Qualitative analysis was used to summarize the 70 selected guidance papers. RESULTS Key aspects for validation are the assessment of statistical performance using measures for discrimination (e.g., C-statistic) and calibration (e.g., calibration-in-the-large and calibration slope). For assessing impact or usefulness in clinical decision-making, recent papers advise using decision-analytic measures (e.g., the Net Benefit) over simplistic classification measures that ignore clinical consequences (e.g., accuracy, overall Net Reclassification Index). Commonly recommended methods for model updating are recalibration (i.e., adjustment of intercept or baseline hazard and/or slope), revision (i.e., re-estimation of individual predictor effects), and extension (i.e., addition of new markers). Additional methodological guidance is needed for newer types of updating (e.g., meta-model and dynamic updating) and machine learning-based models. CONCLUSION Substantial guidance was found for model evaluation and more conventional updating of regression-based models. An important development in model evaluation is the introduction of a decision-analytic framework for assessing clinical usefulness. Consensus is emerging on methods for model updating.
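As a minimal illustration of the decision-analytic measures the review recommends, the sketch below computes net benefit for a model across risk thresholds and compares it with treat-all and treat-none strategies, using synthetic, well-calibrated risks; it is not code from the review or from any of the guidance papers it summarizes.

```python
"""Minimal decision-curve sketch: net benefit of a model vs. treat-all and treat-none."""
import numpy as np

def net_benefit_curve(y, risk, thresholds):
    n = len(y)
    out = []
    for t in thresholds:
        treat = risk >= t
        tp = np.sum(treat & (y == 1))
        fp = np.sum(treat & (y == 0))
        nb_model = tp / n - fp / n * t / (1 - t)
        nb_all = y.mean() - (1 - y.mean()) * t / (1 - t)   # treat everyone
        out.append((t, nb_model, nb_all, 0.0))             # 0.0 = treat no one
    return out

rng = np.random.default_rng(4)
risk = rng.beta(2, 8, 5000)
y = rng.binomial(1, risk)                                   # well-calibrated synthetic model
for t, nb_m, nb_a, nb_none in net_benefit_curve(y, risk, [0.05, 0.1, 0.2, 0.3]):
    print(f"t={t:.2f}  model={nb_m:.3f}  treat-all={nb_a:.3f}  treat-none={nb_none:.1f}")
```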
Affiliation(s)
- M. A. E. Binuya
- Division of Molecular Pathology, the Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- E. G. Engelhardt
- Division of Molecular Pathology, the Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
- Division of Psychosocial Research and Epidemiology, the Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
- W. Schats
- Scientific Information Service, The Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands
- M. K. Schmidt
- Division of Molecular Pathology, the Netherlands Cancer Institute – Antoni van Leeuwenhoek Hospital, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- E. W. Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
12
Demirjian S, Bashour CA, Shaw A, Schold JD, Simon J, Anthony D, Soltesz E, Gadegbeku CA. Predictive Accuracy of a Perioperative Laboratory Test-Based Prediction Model for Moderate to Severe Acute Kidney Injury After Cardiac Surgery. JAMA 2022; 327:956-964. [PMID: 35258532 PMCID: PMC8905398 DOI: 10.1001/jama.2022.1751]
Abstract
IMPORTANCE Effective treatment of acute kidney injury (AKI) is predicated on timely diagnosis; however, the lag in the increase in serum creatinine levels after kidney injury may delay therapy initiation. OBJECTIVE To determine the derivation and validation of predictive models for AKI after cardiac surgery. DESIGN, SETTING, AND PARTICIPANTS Multivariable prediction models were derived based on a retrospective observational cohort of adult patients undergoing cardiac surgery between January 2000 and December 2019 from a US academic medical center (n = 58 526) and subsequently validated on an external cohort from 3 US community hospitals (n = 4734). The date of final follow-up was January 15, 2020. EXPOSURES Perioperative change in serum creatinine and postoperative blood urea nitrogen, serum sodium, potassium, bicarbonate, and albumin from the first metabolic panel after cardiac surgery. MAIN OUTCOMES AND MEASURES Area under the receiver-operating characteristic curve (AUC) and calibration measures for moderate to severe AKI, per Kidney Disease: Improving Global Outcomes (KDIGO), and AKI requiring dialysis prediction models within 72 hours and 14 days following surgery. RESULTS In a derivation cohort of 58 526 patients (median [IQR] age, 66 [56-74] years; 39 173 [67%] men; 51 503 [91%] White participants), the rates of moderate to severe AKI and AKI requiring dialysis were 2674 (4.6%) and 868 (1.48%) within 72 hours and 3156 (5.4%) and 1018 (1.74%) within 14 days after surgery. The median (IQR) interval to first metabolic panel from conclusion of the surgical procedure was 10 (7-12) hours. In the derivation cohort, the metabolic panel-based models had excellent predictive discrimination for moderate to severe AKI within 72 hours (AUC, 0.876 [95% CI, 0.869-0.883]) and 14 days (AUC, 0.854 [95% CI, 0.850-0.861]) after the surgical procedure and for AKI requiring dialysis within 72 hours (AUC, 0.916 [95% CI, 0.907-0.926]) and 14 days (AUC, 0.900 [95% CI, 0.889-0.909]) after the surgical procedure. In the validation cohort of 4734 patients (median [IQR] age, 67 [60-74] years; 3361 [71%] men; 3977 [87%] White participants), the models for moderate to severe AKI showed AUCs of 0.860 (95% CI, 0.838-0.882) within 72 hours and 0.842 (95% CI, 0.820-0.865) within 14 days, and the models for AKI requiring dialysis had AUCs of 0.879 (95% CI, 0.840-0.918) within 72 hours and 0.873 (95% CI, 0.836-0.910) within 14 days after the surgical procedure. Calibration assessed by the Spiegelhalter z test showed P > .05, indicating adequate calibration in both the derivation and validation cohorts. CONCLUSIONS AND RELEVANCE Among patients undergoing cardiac surgery, a prediction model based on perioperative basic metabolic panel laboratory values demonstrated good predictive accuracy for moderate to severe acute kidney injury within 72 hours and 14 days after the surgical procedure. Further research is needed to determine whether use of the risk prediction tool improves clinical outcomes.
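The calibration test mentioned in the results is commonly written as the Spiegelhalter z-statistic; a sketch in that commonly cited form is given below on simulated predictions (it is not the authors' code, and the data are synthetic).

```python
"""Sketch of the Spiegelhalter calibration z-test in its commonly cited form:
a large |z| (small p-value) indicates miscalibrated predicted probabilities."""
import numpy as np
from scipy.stats import norm

def spiegelhalter_z(y, p):
    num = np.sum((y - p) * (1 - 2 * p))
    den = np.sqrt(np.sum((1 - 2 * p) ** 2 * p * (1 - p)))
    z = num / den
    return z, 2 * (1 - norm.cdf(abs(z)))               # two-sided p-value

rng = np.random.default_rng(5)
p = rng.beta(2, 20, 5000)
y = rng.binomial(1, p)                                  # outcomes consistent with predictions
print(spiegelhalter_z(y, p))                            # expect a p-value well above 0.05 here
```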
Affiliation(s)
- Sevag Demirjian
- Department of Nephrology and Hypertension, Cleveland Clinic, Cleveland, Ohio
- C. Allen Bashour
- Department of Intensive Care and Resuscitation, Cleveland Clinic, Cleveland, Ohio
- Andrew Shaw
- Department of Intensive Care and Resuscitation, Cleveland Clinic, Cleveland, Ohio
- Jesse D. Schold
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- James Simon
- Department of Nephrology and Hypertension, Cleveland Clinic, Cleveland, Ohio
- David Anthony
- Department of Intensive Care and Resuscitation, Cleveland Clinic, Cleveland, Ohio
- Department of Cardiothoracic Anesthesiology, Cleveland Clinic, Cleveland, Ohio
- Edward Soltesz
- Department of Thoracic and Cardiovascular Surgery, Cleveland Clinic, Cleveland, Ohio
13
Abstract
In research, policy, and practice, continuous variables are often categorized. Statisticians have generally advised against categorization for many reasons, such as loss of information and precision as well as distortion of estimated statistics. Here, a different kind of problem with categorization is considered: the idea that, for a given continuous variable, there is a unique set of cut points that is the objectively correct or best categorization. It is shown that this is unlikely to be the case because categorized variables typically exist in webs of statistical relationships with other variables. The choice of cut points for a categorized variable can influence the values of many statistics relating that variable to others. This essay explores the substantive trade‐offs that can arise between different possible cut points to categorize a continuous variable, making it difficult to say that any particular categorization is objectively best. Limitations of different approaches to selecting cut points are discussed. Contextual trade‐offs may often be an argument against categorization. At the very least, such trade‐offs mean that research inferences, or decisions about policy or practice, that involve categorized variables should be framed and acted upon with flexibility and humility. In practical settings, the choice of cut points for categorizing a continuous variable is likely to entail trade‐offs across multiple statistical relationships between the categorized variable and other variables. These trade‐offs mean that no single categorization is objectively best or correct.
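A small simulation makes the essay's point concrete: with two downstream variables that relate to the continuous variable in different ways, different cut points favor different relationships, so no single cut point is best for both. The variables below are entirely synthetic and not from the essay.

```python
"""Sketch of the cut-point trade-off: the dichotomization that best preserves one
relationship is not the one that best preserves another."""
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=20_000)
y1 = 0.8 * x + rng.normal(size=x.size)                         # outcome driven by the whole range of x
y2 = (x > 1.0).astype(float) + 0.5 * rng.normal(size=x.size)   # outcome driven by a high-x subgroup

for cut in (0.0, 0.5, 1.0):
    xc = (x > cut).astype(float)                       # dichotomize x at this cut point
    r1 = np.corrcoef(xc, y1)[0, 1]
    r2 = np.corrcoef(xc, y2)[0, 1]
    print(f"cut={cut:.1f}  corr with y1={r1:.2f}  corr with y2={r2:.2f}")
```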
Affiliation(s)
- Evan L Busch
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts
14
Samawi H, Chen DG, Ahmed F, Kersey J. Medical diagnostics accuracy measures and cut-point selection: An innovative approach based on relative net benefit. COMMUN STAT-THEOR M 2021. [DOI: 10.1080/03610926.2021.2001016]
Affiliation(s)
- Hani Samawi
- Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, Georgia, USA
- Ding-Geng Chen
- College of Health Solutions, Arizona State University, Phoenix, Arizona, USA
- Ferdous Ahmed
- Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, Georgia, USA
- Jing Kersey
- Department of Biostatistics, Epidemiology and Environmental Health Sciences, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, Georgia, USA
15
Katki HA, Bebu I. A simple framework to identify optimal cost-effective risk thresholds for a single screen: Comparison to Decision Curve Analysis. JOURNAL OF THE ROYAL STATISTICAL SOCIETY. SERIES A, (STATISTICS IN SOCIETY) 2021; 184:887-903. [PMID: 35702631 PMCID: PMC9190212 DOI: 10.1111/rssa.12680]
Abstract
Decision Curve Analysis (DCA) is a popular approach for assessing biomarkers and risk models, but does not require costs and thus cannot identify optimal risk thresholds for actions. Full decision analyses can identify optimal thresholds, but typically used methods are complex and often difficult to understand. We develop a simple framework to calculate the Incremental Net Benefit for a single-time screen as a function of costs (for tests and treatments) and effectiveness (life-years gained). We provide simple expressions for the optimal cost-effective risk-threshold and, equally importantly, for the monetary value of life-years gained associated with the risk-threshold. We consider the controversy over the risk-threshold to screen women for mutations in BRCA1/2. Importantly, most, and sometimes even all, of the thresholds identified by DCA are infeasible based on their associated dollars per life-year gained. Our simple framework facilitates sensitivity analyses to cost and effectiveness parameters. The proposed approach estimates optimal risk thresholds in a simple and transparent manner, provides intuition about which quantities are critical, and may serve as a bridge between DCA and a full decision analysis.
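As a loose illustration of the kind of calculation such a framework enables (not the paper's actual expressions), the sketch below scans risk thresholds for a single screen and computes an incremental net benefit per person from an assumed willingness-to-pay, assumed life-years gained among treated cases, and assumed test and treatment costs; every parameter value and the benefit model are hypothetical.

```python
"""Illustrative sketch: locate a cost-effective risk threshold for a single screen
by maximizing an assumed incremental net benefit per person screened."""
import numpy as np

def incremental_net_benefit(risk, threshold, wtp=100_000, lyg_if_treated=0.5,
                            treatment_cost=2_000, screen_cost=50):
    treat = risk >= threshold
    gain = wtp * lyg_if_treated * (risk * treat).sum()          # monetized life-years gained
    cost = screen_cost * len(risk) + treatment_cost * treat.sum()
    return (gain - cost) / len(risk)                            # INB per person screened

rng = np.random.default_rng(7)
risk = rng.beta(1, 49, 100_000)                                 # roughly 2% average risk
grid = np.linspace(0.01, 0.30, 30)
inb = [incremental_net_benefit(risk, t) for t in grid]
best = grid[int(np.argmax(inb))]
print(f"cost-effective threshold on this grid: {best:.2f}")
```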
Affiliation(s)
- Hormuzd A Katki
- Division of Cancer Epidemiology and Genetics, US National Cancer Institute, NIH/DHHS, Rockville MD, USA
- Ionut Bebu
- Biostatistics Center, George Washington University, Rockville MD, USA
16
Ghosal S, Chen Z. Discriminatory Capacity of Prenatal Ultrasound Measures for Large-for-Gestational-Age Birth: A Bayesian Approach to ROC Analysis Using Placement Values. STATISTICS IN BIOSCIENCES 2021; 14:1-22. [PMID: 35342482 PMCID: PMC8942391 DOI: 10.1007/s12561-021-09311-9]
Abstract
Predicting large fetuses at birth is of great interest to obstetricians. Using an NICHD Scandinavian Study that collected longitudinal ultrasound examination data during pregnancy, we estimate diagnostic accuracy parameters of estimated fetal weight (EFW) at various times during pregnancy in predicting large-for-gestational-age. We adopt a placement value based Bayesian regression model with random effects to estimate ROC curves. The use of placement values allows us to model covariate effects directly on the ROC curves and the adoption of a Bayesian approach accommodates the a priori constraint that an ROC curve of EFW near delivery should dominate another further away. The proposed methodology is shown to perform better than some alternative approaches in simulations and its application to the Scandinavian Study data suggests that diagnostic accuracy of EFW can improve about 65% from week 17 to 37 of gestation.
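The placement-value idea underlying the model can be sketched simply: each case's marker value is re-expressed as the proportion of controls exceeding it, the ROC curve is the distribution of those placement values, and the empirical AUC is one minus their mean. The Bayesian regression model with random effects is not reproduced here; the data below are synthetic, not the Scandinavian Study data.

```python
"""Sketch of placement values and the empirical AUC they imply."""
import numpy as np

def placement_values(cases, controls):
    # proportion of controls above each case's marker value
    return np.array([(controls > c).mean() for c in cases])

rng = np.random.default_rng(8)
controls = rng.normal(0.0, 1.0, 2000)
cases = rng.normal(1.2, 1.0, 500)                     # higher marker values among cases
pv = placement_values(cases, controls)
print("empirical AUC:", round(1 - pv.mean(), 3))
```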
17
Pavlou M, Qu C, Omar RZ, Seaman SR, Steyerberg EW, White IR, Ambler G. Estimation of required sample size for external validation of risk models for binary outcomes. Stat Methods Med Res 2021; 30:2187-2206. [PMID: 33881369 PMCID: PMC8529102 DOI: 10.1177/09622802211007522]
Abstract
Risk-prediction models for health outcomes are used in practice as part of clinical decision-making, and it is essential that their performance be externally validated. An important aspect in the design of a validation study is choosing an adequate sample size. In this paper, we investigate the sample size requirements for validation studies with binary outcomes to estimate measures of predictive performance (C-statistic for discrimination and calibration slope and calibration in the large). We aim for sufficient precision in the estimated measures. In addition, we investigate the sample size to achieve sufficient power to detect a difference from a target value. Under normality assumptions on the distribution of the linear predictor, we obtain simple estimators for sample size calculations based on the measures above. Simulation studies show that the estimators perform well for common values of the C-statistic and outcome prevalence when the linear predictor is marginally Normal. Their performance deteriorates only slightly when the normality assumptions are violated. We also propose estimators which do not require normality assumptions but require specification of the marginal distribution of the linear predictor and require the use of numerical integration. These estimators were also seen to perform very well under marginal normality. Our sample size equations require a specified standard error (SE) and the anticipated C-statistic and outcome prevalence. The sample size requirement varies according to the prognostic strength of the model, outcome prevalence, choice of the performance measure and study objective. For example, to achieve an SE < 0.025 for the C-statistic, 60–170 events are required if the true C-statistic and outcome prevalence are between 0.64–0.85 and 0.05–0.3, respectively. For the calibration slope and calibration in the large, achieving SE < 0.15 would require 40–280 and 50–100 events, respectively. Our estimators may also be used for survival outcomes when the proportion of censored observations is high.
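For a rough feel of how precision targets translate into numbers, the sketch below uses the classical Hanley-McNeil approximation to the standard error of the C-statistic for an anticipated C, prevalence, and validation sample size. This is only a simple cross-check; the paper's estimators are derived under normality assumptions on the linear predictor and differ from this formula, and the inputs shown are arbitrary.

```python
"""Sketch: Hanley-McNeil approximation to the SE of the C-statistic (AUC)."""
import math

def hanley_mcneil_se(auc, n_events, n_nonevents):
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_events - 1) * (q1 - auc ** 2)
           + (n_nonevents - 1) * (q2 - auc ** 2)) / (n_events * n_nonevents)
    return math.sqrt(var)

n, prevalence, auc = 1200, 0.10, 0.75
print(hanley_mcneil_se(auc, int(n * prevalence), int(n * (1 - prevalence))))
```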
Affiliation(s)
- Menelaos Pavlou
- Department of Statistical Science, University College London, UK
- Chen Qu
- Department of Statistical Science, University College London, UK
- Rumana Z Omar
- Department of Statistical Science, University College London, UK
- Shaun R Seaman
- MRC Biostatistics Unit, Institute of Public Health, University of Cambridge, Cambridge, UK
- Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands
- Ian R White
- MRC Clinical Trials Unit, University College London, London, UK
- Gareth Ambler
- Department of Statistical Science, University College London, UK
18
Vistisen D, Andersen GS, Hulman A, McGurnaghan SJ, Colhoun HM, Henriksen JE, Thomsen RW, Persson F, Rossing P, Jørgensen ME. A Validated Prediction Model for End-Stage Kidney Disease in Type 1 Diabetes. Diabetes Care 2021; 44:901-907. [PMID: 33509931 DOI: 10.2337/dc20-2586]
Abstract
OBJECTIVE End-stage kidney disease (ESKD) is a life-threatening complication of diabetes that can be prevented or delayed by intervention. Hence, early detection of people at increased risk is essential. RESEARCH DESIGN AND METHODS From a population-based cohort of 5,460 clinically diagnosed Danish adults with type 1 diabetes followed from 2001 to 2016, we developed a prediction model for ESKD accounting for the competing risk of death. Poisson regression analysis was used to estimate the model on the basis of information routinely collected from clinical examinations. The effect of including an extended set of predictors (lipids, alcohol intake, etc.) was further evaluated, and potential interactions identified in a survival tree analysis were tested. The final model was externally validated in 9,175 adults from Denmark and Scotland. RESULTS During a median follow-up of 10.4 years (interquartile limits 5.1; 14.7), 303 (5.5%) of the participants (mean [SD] age 42.3 [16.5] years) developed ESKD, and 764 (14.0%) died without having developed ESKD. The final ESKD prediction model included age, male sex, diabetes duration, estimated glomerular filtration rate, micro- and macroalbuminuria, systolic blood pressure, hemoglobin A1c, smoking, and previous cardiovascular disease. Discrimination was excellent for 5-year risk of an ESKD event, with a C-statistic of 0.888 (95% CI 0.849; 0.927) in the derivation cohort and confirmed at 0.865 (0.811; 0.919) and 0.961 (0.940; 0.981) in the external validation cohorts from Denmark and Scotland, respectively. CONCLUSIONS We have derived and validated a novel, high-performing ESKD prediction model for risk stratification in the adult type 1 diabetes population. This model may improve clinical decision making and potentially guide early intervention.
Affiliation(s)
- Adam Hulman
- Steno Diabetes Center Aarhus, Aarhus, Denmark
- Peter Rossing
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
- University of Copenhagen, Copenhagen, Denmark
- Marit E Jørgensen
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
- National Institute of Public Health, University of Southern Denmark, Copenhagen, Denmark
19
Rostami S, Rafei A, Damghanian M, Khakbazan Z, Maleki F, Zendehdel K. Discriminatory Accuracy of the Gail Model for Breast Cancer Risk Assessment among Iranian Women. IRANIAN JOURNAL OF PUBLIC HEALTH 2021; 49:2205-2213. [PMID: 33708742 PMCID: PMC7917489 DOI: 10.18502/ijph.v49i11.4739]
Abstract
Background: The Gail model is the most well-known tool for breast cancer risk assessment worldwide. Although it has been validated in various Western populations, inconsistent results have been reported from Asian populations. We used data from a large case-control study and evaluated the discriminatory accuracy of the Gail model for breast cancer risk assessment in the Iranian female population. Methods: We used data from 942 breast cancer patients and 975 healthy controls at the Cancer Institute of Iran, Tehran, Iran, in 2016. We refitted the Gail model to our case-control data (the IR-Gail model). We compared the discriminatory power of the IR-Gail model with the original Gail model using ROC curve analyses and estimation of the area under the ROC curve (AUC). Results: Except for the history of biopsies, which showed an extremely high relative risk (OR=9.1), the observed ORs were similar to the estimates observed in Gail's study. Incidence rates of breast cancer were substantially lower in Iran than in the USA, leading to a lower average absolute risk in the Iranian population (2.78, SD 2.45). The AUC was significantly improved after refitting the model, but it remained modest (0.636 vs. 0.627, ΔAUC = 0.009, bootstrapped P=0.008). The cut-point of 1.67 suggested in the Gail study did not discriminate between breast cancer patients and controls in the Iranian female population. Conclusion: Although the coefficients from the local study improved the discriminatory accuracy of the model, it remained modest. Cohort studies are warranted to evaluate the validity of the model for Iranian women.
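A comparison of AUCs between an original and a refitted model evaluated on the same subjects is often done by bootstrap; the sketch below shows one generic way to do that. The authors' exact procedure is not described in the abstract, and the demo data and score definitions are synthetic.

```python
"""Sketch of a paired bootstrap comparison of two AUCs on the same subjects."""
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_delta_auc(y, score_original, score_refit, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                       # resample subjects with replacement
        if y[idx].min() == y[idx].max():                  # need both classes in the resample
            continue
        deltas.append(roc_auc_score(y[idx], score_refit[idx])
                      - roc_auc_score(y[idx], score_original[idx]))
    deltas = np.array(deltas)
    p = 2 * min((deltas <= 0).mean(), (deltas >= 0).mean())   # crude two-sided bootstrap p
    return deltas.mean(), p

rng = np.random.default_rng(9)
y = rng.binomial(1, 0.3, 800)
s_old = 0.8 * y + rng.normal(size=800)                   # weaker synthetic score
s_new = 1.1 * y + rng.normal(size=800)                   # somewhat stronger synthetic score
print(bootstrap_delta_auc(y, s_old, s_new))
```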
Affiliation(s)
- Sahar Rostami
- Department of Reproductive Health and Midwifery, School of Nursing and Midwifery, Tehran University of Medical Sciences, Tehran, Iran
- Cancer Research Center, Cancer Institute of Iran, Tehran University of Medical Sciences, Tehran, Iran
- Ali Rafei
- Cancer Research Center, Cancer Institute of Iran, Tehran University of Medical Sciences, Tehran, Iran
- Maryam Damghanian
- Nursing and Midwifery Care Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Zohreh Khakbazan
- Nursing and Midwifery Care Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Farzad Maleki
- Cancer Research Center, Cancer Institute of Iran, Tehran University of Medical Sciences, Tehran, Iran
- Social Determinants of Health Research Center, Urmia University of Medical Sciences, Urmia, Iran
- Kazem Zendehdel
- Cancer Research Center, Cancer Institute of Iran, Tehran University of Medical Sciences, Tehran, Iran
- Cancer Biology Research Center, Cancer Institute of Iran, Tehran University of Medical Sciences, Tehran, Iran
- Breast Disease Research Center, Cancer Institute of Iran, Tehran University of Medical Sciences, Tehran, Iran
20
Abstract
Risk prediction models have been developed in many contexts to classify individuals according to a single outcome, such as risk of a disease. Emerging “-omic” biomarkers provide panels of features that can simultaneously predict multiple outcomes from a single biological sample, creating issues of multiplicity reminiscent of exploratory hypothesis testing. Here I propose definitions of some basic criteria for evaluating prediction models of multiple outcomes. I define calibration in the multivariate setting and then distinguish between outcome-wise and individual-wise prediction, and within the latter between joint and panel-wise prediction. I give examples such as screening and early detection in which different senses of prediction may be more appropriate. In each case I propose definitions of sensitivity, specificity, concordance, positive and negative predictive value and relative utility. I link the definitions through a multivariate probit model, showing that the accuracy of a multivariate prediction model can be summarised by its covariance with a liability vector. I illustrate the concepts on a biomarker panel for early detection of eight cancers, and on polygenic risk scores for six common diseases.
Affiliation(s)
- Frank Dudbridge
- Department of Health Sciences, University of Leicester, Leicester LE1 7RH, UK.
21
Pal Choudhury P, Chaturvedi AK, Chatterjee N. Evaluating Discrimination of a Lung Cancer Risk Prediction Model Using Partial Risk-Score in a Two-Phase Study. Cancer Epidemiol Biomarkers Prev 2020; 29:1196-1203. [PMID: 32277002 PMCID: PMC11807412 DOI: 10.1158/1055-9965.epi-19-1574]
Abstract
BACKGROUND Independent validation of risk prediction models in prospective cohorts is required for risk-stratified cancer prevention. Such studies often have a two-phase design, where information on expensive biomarkers are ascertained in a nested substudy of the original cohort. METHODS We propose a simple approach for evaluating model discrimination that accounts for incomplete follow-up and gains efficiency by using data from all individuals in the cohort irrespective of whether they were sampled in the substudy. For evaluating the AUC, we estimated probabilities of risk-scores for cases being larger than those in controls conditional on partial risk-scores, computed using partial covariate information. The proposed method was compared with an inverse probability weighted (IPW) approach that used information only from the subjects in the substudy. We evaluated age-stratified AUC of a model including questionnaire-based risk factors and inflammation biomarkers to predict 10-year risk of lung cancer using data from the Prostate, Lung, Colorectal, and Ovarian Cancer (1993-2009) trial (30,297 ever-smokers, 1,253 patients with lung cancer). RESULTS For estimating age-stratified AUC of the combined lung cancer risk model, the proposed method was 3.8 to 5.3 times more efficient compared with the IPW approach across the different age groups. Extensive simulation studies also demonstrated substantial efficiency gain compared with the IPW approach. CONCLUSIONS Incorporating information from all individuals in a two-phase cohort study can substantially improve precision of discrimination measures of lung cancer risk models. IMPACT Novel, simple, and practically useful methods are proposed for evaluating risk models, a critical step toward risk-stratified cancer prevention.
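The IPW comparator described above can be sketched as a weighted concordance over case-control pairs in the substudy, with weights equal to inverse sampling probabilities; the authors' proposed partial-risk-score estimator is not reproduced here, and the exact form of their IPW estimator may differ from this sketch. All inputs are assumed to be NumPy arrays restricted to substudy members.

```python
"""Sketch of an inverse-probability-weighted (IPW) AUC over substudy case-control pairs."""
import numpy as np

def ipw_auc(y, risk_score, sampling_prob):
    w = 1.0 / sampling_prob                              # inverse sampling weights
    cases, controls = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    num = den = 0.0
    for i in cases:
        pair_w = w[i] * w[controls]
        num += np.sum(pair_w * (risk_score[i] > risk_score[controls]))  # ties ignored
        den += np.sum(pair_w)
    return num / den
```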
Collapse
Affiliation(s)
- Parichoy Pal Choudhury
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
| | - Anil K Chaturvedi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland.
- Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
| |
Collapse
|
22
|
Kim J, Yuan C, Babic A, Bao Y, Clish CB, Pollak MN, Amundadottir LT, Klein AP, Stolzenberg-Solomon RZ, Pandharipande PV, Brais LK, Welch MW, Ng K, Giovannucci EL, Sesso HD, Manson JE, Stampfer MJ, Fuchs CS, Wolpin BM, Kraft P. Genetic and Circulating Biomarker Data Improve Risk Prediction for Pancreatic Cancer in the General Population. Cancer Epidemiol Biomarkers Prev 2020; 29:999-1008. [PMID: 32321713 PMCID: PMC8020898 DOI: 10.1158/1055-9965.epi-19-1389] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2019] [Revised: 01/31/2020] [Accepted: 02/07/2020] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Pancreatic cancer is the third leading cause of cancer death in the United States, and 80% of patients present with advanced, incurable disease. Risk markers for pancreatic cancer have been characterized, but combined models are not used clinically to identify individuals at high risk for the disease. METHODS Within a nested case-control study of 500 pancreatic cancer cases diagnosed after blood collection and 1,091 matched controls enrolled in four U.S. prospective cohorts, we characterized absolute risk models that included clinical factors (e.g., body mass index, history of diabetes), germline genetic polymorphisms, and circulating biomarkers. RESULTS Model discrimination showed an area under ROC curve of 0.62 via cross-validation. Our final integrated model identified 3.7% of men and 2.6% of women who had at least 3 times greater than average risk in the ensuing 10 years. Individuals within the top risk percentile had a 4% risk of developing pancreatic cancer by age 80 years and 2% 10-year risk at age 70 years. CONCLUSIONS Risk models that include established clinical, genetic, and circulating factors improved disease discrimination over models using clinical factors alone. IMPACT Absolute risk models for pancreatic cancer may help identify individuals in the general population appropriate for disease interception.
Collapse
Affiliation(s)
- Jihye Kim
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
| | - Chen Yuan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Ana Babic
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Ying Bao
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Clary B Clish
- Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts
| | - Michael N Pollak
- Cancer Prevention Research Unit, Department of Oncology, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
| | - Laufey T Amundadottir
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| | - Alison P Klein
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Department of Pathology, Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Rachael Z Stolzenberg-Solomon
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland
| | - Pari V Pandharipande
- Department of Radiology and Institute for Technology Assessment, Massachusetts General Hospital, Boston, Massachusetts
| | - Lauren K Brais
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Marisa W Welch
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Kimmie Ng
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Edward L Giovannucci
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
| | - Howard D Sesso
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Division of Prevention Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - JoAnn E Manson
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Division of Prevention Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Meir J Stampfer
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
| | - Charles S Fuchs
- Department of Medical Oncology, Yale Cancer Center, New Haven, Connecticut
- Department of Medicine, Yale School of Medicine, New Haven, Connecticut
- Department of Medical Oncology, Smilow Cancer Hospital, New Haven, Connecticut
| | - Brian M Wolpin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts.
| | - Peter Kraft
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
| |
Collapse
|
23
|
Abstract
Strategies to prevent cancer and diagnose it early, when it is most treatable, are needed to reduce the public health burden from rising disease incidence. Risk assessment is playing an increasingly important role in targeting individuals in need of such interventions. For breast cancer, many individual risk factors have been well understood for a long time, but the development of a fully comprehensive risk model has not been straightforward, in part because there have been limited data from which the joint effects of an extensive set of risk factors may be estimated with precision. In this article we first review the approach taken to develop the IBIS (Tyrer-Cuzick) model and describe recent updates. We then review and develop methods to assess calibration of models such as this one, where the risk of disease, allowing for competing mortality over a long follow-up time or lifetime, is estimated. The breast cancer risk model and the calibration assessment methods are demonstrated using a cohort of 132,139 women attending mammography screening in the State of Washington, USA.
Collapse
Affiliation(s)
- Adam R. Brentnall
- Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Queen Mary University of London, Charterhouse square, London, EC1M 6BQ
| | - Jack Cuzick
- Centre for Cancer Prevention, Wolfson Institute of Preventive Medicine, Queen Mary University of London, Charterhouse square, London, EC1M 6BQ
| |
Collapse
|
24
|
Farooqi MAM, Gerstein H, Yusuf S, Leong DP. Accumulation of Deficits as a Key Risk Factor for Cardiovascular Morbidity and Mortality: A Pooled Analysis of 154 000 Individuals. J Am Heart Assoc 2020; 9:e014686. [PMID: 31986990 PMCID: PMC7033862 DOI: 10.1161/jaha.119.014686] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
Background Frailty is associated with higher mortality in individuals at high cardiovascular disease (CVD) risk. We hypothesize that frailty is a more important prognostic factor than CVD risk factors and aim to determine the prognostic value of a cumulative deficit frailty index in patients with or at high risk for CVD. Methods and Results We conducted an individual‐level pooled analysis of participants with or at risk for CVD, recruited in 14 multicenter clinical trials. The cumulative deficit index was calculated as the proportion of 26 deficits exhibited. Individuals were categorized as nonfrail, prefrail, or frail if they had indexes of ≤0.1, >0.1 to 0.21, or >0.21, respectively. CVD risk was assessed using the Framingham score. Outcomes included CVD event (new or recurrent myocardial infarction, stroke, or heart failure) and mortality. We studied 154 696 patients (mean age, 70.8 years; 63% men) with median follow‐up of 3.2 years. There were 17 535 CVD events and 15 067 deaths. The frail group (n=13 872) had higher risk of a CVD event (incidence rate ratio, 1.97; 95% CI, 1.85–2.08), all‐cause mortality (hazard ratio, 1.91; 95% CI, 1.79–2.03), and CVD mortality (hazard ratio, 1.91; 95% CI, 1.77–2.05) than the nonfrail group (n=101 343). Associations remained unchanged after adjusting for CVD risk factors. The index statistically outperformed the Framingham score in its ability to discriminate CVD events (C‐statistic, 0.60 [95% CI, 0.60–0.61] versus 0.58 [95% CI, 0.57–0.58], respectively; P<0.001). Conclusions In individuals with or at high risk of developing CVD, the cumulative deficit index is associated with increased CVD events and mortality, independent of CVD risk factors, and adds incremental prognostic value.
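A minimal sketch of the cumulative deficit index and the cut points quoted above (nonfrail at an index of 0.1 or less, prefrail above 0.1 up to 0.21, frail above 0.21). The function name and the toy deficit vector are illustrative; the 26 deficit definitions used in the pooled analysis are not reproduced here.

```python
import numpy as np

def frailty_category(deficits):
    """deficits: 0/1 indicators for the 26 candidate deficits of one person.
    Returns the cumulative deficit index (proportion exhibited) and its category."""
    index = float(np.mean(deficits))
    if index <= 0.10:
        return index, "nonfrail"
    elif index <= 0.21:
        return index, "prefrail"
    return index, "frail"

# example: a person exhibiting 6 of the 26 deficits
person = np.zeros(26)
person[:6] = 1
print(frailty_category(person))   # index 0.23 -> 'frail'
```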
Collapse
Affiliation(s)
| | - Hertzel Gerstein
- Department of Medicine McMaster University Hamilton Canada.,Population Health Research Institute McMaster University and Hamilton Health Sciences Hamilton Canada.,Department of Health Research Methods, Evidence, and Impact McMaster University Hamilton Canada
| | - Salim Yusuf
- Department of Medicine McMaster University Hamilton Canada.,Population Health Research Institute McMaster University and Hamilton Health Sciences Hamilton Canada.,Department of Health Research Methods, Evidence, and Impact McMaster University Hamilton Canada
| | - Darryl P Leong
- Department of Medicine McMaster University Hamilton Canada.,Population Health Research Institute McMaster University and Hamilton Health Sciences Hamilton Canada.,Department of Health Research Methods, Evidence, and Impact McMaster University Hamilton Canada
| |
Collapse
|
25
|
Wei C, Li M, Wen Y, Ye C, Lu Q. A multi-locus predictiveness curve and its summary assessment for genetic risk prediction. Stat Methods Med Res 2020; 29:44-56. [PMID: 30612522 PMCID: PMC6612460 DOI: 10.1177/0962280218819202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Genetic association studies using high-throughput genotyping and sequencing technologies have identified a large number of genetic variants associated with complex human diseases. These findings have provided an unprecedented opportunity to identify individuals in the population at high risk for disease who carry causal genetic mutations and hold great promise for early intervention and individualized medicine. While interest is high in building risk prediction models based on recent genetic findings, it is crucial to have appropriate statistical measures to assess the performance of a genetic risk prediction model. Predictiveness curves were recently proposed as a graphic tool for evaluating a risk prediction model on the basis of a single continuous biomarker. The curve evaluates a risk prediction model for classification performance as well as its usefulness when applied to a population. In this article, we extend the predictiveness curve to measure the collective contribution of multiple genetic variants. We further propose a nonparametric, U-statistics-based measure, referred to as the U-Index, to quantify the performance of a multi-locus predictiveness curve. In particular, a global U-Index and a partial U-Index can be used in the general population and a subpopulation of particular clinical interest, respectively. Through simulation studies, we demonstrate that the proposed U-Index has advantages over several existing summary statistics under various disease models. We also show that the partial U-Index offers distinct advantages when rare variants contribute substantially to disease risk. Finally, we use the proposed predictiveness curve and its corresponding U-Index to evaluate the performance of a genetic risk prediction model for nicotine dependence.
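The predictiveness curve itself is simple to draw once predicted risks are available: it plots predicted risk against its own population quantile, with the disease prevalence as a horizontal reference line. The sketch below is a generic illustration with a simulated multi-locus score and a logistic risk conversion; it is not the authors' U-Index machinery, and all names and parameters are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def predictiveness_curve(risk):
    """Return (quantile v, risk at that quantile): the predictiveness curve
    plots predicted risk against its own population quantile."""
    risk = np.sort(np.asarray(risk))
    v = (np.arange(1, len(risk) + 1) - 0.5) / len(risk)
    return v, risk

# toy multi-locus score (10 variants) converted to risk via a logistic model
rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.3, (5000, 10))
score = genotypes @ rng.normal(0.2, 0.1, 10)
risk = 1 / (1 + np.exp(-((score - score.mean()) + np.log(0.1 / 0.9))))

v, r = predictiveness_curve(risk)
plt.plot(v, r)
plt.axhline(risk.mean(), linestyle="--")     # prevalence reference
plt.xlabel("risk quantile v")
plt.ylabel("predicted risk R(v)")
plt.show()
```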
Collapse
Affiliation(s)
- Changshuai Wei
- Core Artificial Intelligence, Amazon.com Inc, Seattle, WA, USA
| | - Ming Li
- Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Bloomington, IN, USA
| | - Yalu Wen
- Department of Statistics, University of Auckland, Auckland, New Zealand
| | - Chengyin Ye
- Department of Health Management, Hangzhou Normal University, Hangzhou, China
| | - Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA
| |
Collapse
|
26
|
Wynants L, van Smeden M, McLernon DJ, Timmerman D, Steyerberg EW, Van Calster B. Three myths about risk thresholds for prediction models. BMC Med 2019; 17:192. [PMID: 31651317 PMCID: PMC6814132 DOI: 10.1186/s12916-019-1425-3] [Citation(s) in RCA: 103] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/26/2019] [Accepted: 09/16/2019] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Clinical prediction models are useful in estimating a patient's risk of having a certain disease or experiencing an event in the future based on their current characteristics. Defining an appropriate risk threshold to recommend intervention is a key challenge in bringing a risk prediction model to clinical application; such risk thresholds are often defined in an ad hoc way. This is problematic because tacitly assumed costs of false positive and false negative classifications may not be clinically sensible. For example, when choosing the risk threshold that maximizes the proportion of patients correctly classified, false positives and false negatives are assumed equally costly. Furthermore, small to moderate sample sizes may lead to unstable optimal thresholds, requiring a particularly cautious interpretation of results. MAIN TEXT We discuss how three common myths about risk thresholds often lead to inappropriate risk stratification of patients. First, we point out the contexts of counseling and shared decision-making in which a continuous risk estimate is more useful than risk stratification. Second, we argue that threshold selection should reflect the consequences of the decisions made following risk stratification. Third, we emphasize that there is usually no universally optimal threshold but rather that a plausible risk threshold depends on the clinical context. Consequently, we recommend presenting results for multiple risk thresholds when developing or validating a prediction model. CONCLUSION Bearing in mind these three considerations can avoid inappropriate allocation (and non-allocation) of interventions. Using discriminating and well-calibrated models will generate better clinical outcomes if context-dependent thresholds are used.
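One practical way to follow the recommendation to report multiple thresholds is to tabulate net benefit over a grid of clinically plausible thresholds and compare the model against default policies. The sketch below does this for a simulated model and a treat-all policy; the data and the threshold grid are illustrative assumptions.

```python
import numpy as np

def net_benefit(risk, y, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold;
    the factor t/(1-t) encodes the implied cost of a false positive."""
    risk, y = np.asarray(risk), np.asarray(y)
    n = len(y)
    tp = np.sum((risk >= threshold) & (y == 1)) / n
    fp = np.sum((risk >= threshold) & (y == 0)) / n
    return tp - fp * threshold / (1 - threshold)

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.2, 5000)
risk = np.clip(0.2 + 0.15 * (y - 0.2) + rng.normal(0, 0.1, 5000), 0.01, 0.99)

for t in (0.05, 0.10, 0.20, 0.30):
    print(f"threshold {t:.2f}: NB(model) = {net_benefit(risk, y, t):+.3f}  "
          f"NB(treat all) = {net_benefit(np.ones_like(risk), y, t):+.3f}")
```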
Collapse
Affiliation(s)
- Laure Wynants
- KU Leuven Department of Development and Regeneration, Leuven, Belgium. .,Department of Epidemiology, CAPHRI Care and Public Health Research Institute, Maastricht University, Maastricht, The Netherlands.
| | - Maarten van Smeden
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands.,Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - David J McLernon
- Medical Statistics Team, Institute of Applied Health Sciences, School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Aberdeen, UK
| | - Dirk Timmerman
- KU Leuven Department of Development and Regeneration, Leuven, Belgium.,Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium
| | - Ewout W Steyerberg
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Ben Van Calster
- KU Leuven Department of Development and Regeneration, Leuven, Belgium.,Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | | |
Collapse
|
27
|
Gail MH, Pfeiffer RM. Breast Cancer Risk Model Requirements for Counseling, Prevention, and Screening. J Natl Cancer Inst 2019; 110:994-1002. [PMID: 29490057 DOI: 10.1093/jnci/djy013] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2017] [Accepted: 01/16/2018] [Indexed: 01/05/2023] Open
Abstract
Background Incorporation of polygenic risk scores and mammographic density into models to predict breast cancer incidence can increase discriminatory accuracy (area under the receiver operating characteristic curve [AUC]) from 0.6 for models based only on epidemiologic factors to 0.7. It is timely to assess what impact these improvements will have on individual counseling and on public health prevention and screening strategies, and to determine what further improvements are needed. Methods We studied various clinical and public health applications using a log-normal distribution of risk. Results Provided they are well calibrated, even risk models with AUCs of 0.6 to 0.7 provide useful perspective for individual counseling and for weighing the harms and benefits of preventive interventions in the clinic. At the population level, they are helpful for designing preventive intervention trials, for assessing reductions in absolute risk from reducing exposure to modifiable risk factors, and for resource allocation (although a higher AUC would be desirable for risk-based allocation). Other public health applications require higher AUCs that can only be achieved with risk predictors 1.6 to 8.8 times as strong as all those yet discovered combined. Such applications include preventing an appreciable proportion of disease in the population under a high-risk prevention strategy and deciding who should be screened for subclinical disease. Conclusions Current and foreseeable risk models are useful for counseling and some prevention activities, but given the daunting challenge of achieving, for example, an AUC of 0.8, considerable effort should be put into finding effective preventive interventions and screening strategies with fewer adverse effects.
Collapse
Affiliation(s)
- Mitchell H Gail
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD
| | - Ruth M Pfeiffer
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD
| |
Collapse
|
28
|
Parast L, Mathews M, Friedberg MW. Dynamic risk prediction for diabetes using biomarker change measurements. BMC Med Res Methodol 2019; 19:175. [PMID: 31412790 PMCID: PMC6694545 DOI: 10.1186/s12874-019-0812-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2019] [Accepted: 07/29/2019] [Indexed: 12/19/2022] Open
Abstract
Background Dynamic risk models, which incorporate disease-free survival and repeated measurements over time, might yield more accurate predictions of future health status compared to static models. The objective of this study was to develop and apply a dynamic prediction model to estimate the risk of developing type 2 diabetes mellitus. Methods Both a static prediction model and a dynamic landmark model were used to provide predictions over a 2-year horizon for diabetes-free survival, updated at 1, 2, and 3 years post-baseline; that is, predicting diabetes-free survival to 2 years from baseline, and to 3, 4, and 5 years post-baseline given that the patient had already survived past 1, 2, and 3 years post-baseline, respectively. Prediction accuracy was evaluated at each time point using robust non-parametric procedures. Data from 2057 participants of the Diabetes Prevention Program (DPP) study (1027 in the metformin arm, 1030 in the placebo arm) were analyzed. Results The dynamic landmark model demonstrated good prediction accuracy with area under the curve (AUC) estimates ranging from 0.645 to 0.752 and Brier Score estimates ranging from 0.088 to 0.135. Relative to a static risk model, the dynamic landmark model did not significantly differ in terms of AUC but had significantly lower (i.e., better) Brier Score estimates for predictions at 1, 2, and 3 years post-baseline (e.g., 0.167 versus 0.099; difference −0.068, 95% CI −0.083 to −0.053, at 3 years in the placebo group). Conclusions Dynamic prediction models based on longitudinal, repeated risk factor measurements have the potential to improve the accuracy of future health status predictions. Electronic supplementary material The online version of this article (10.1186/s12874-019-0812-y) contains supplementary material, which is available to authorized users.
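A simplified landmark scheme can be sketched as follows: at each landmark time, keep the subjects who are still event-free and relate their most recent biomarker values to event status within the fixed horizon. The sketch below uses ordinary logistic regression as a stand-in for the survival machinery used in the paper, ignores censoring within the horizon, and the column names and toy data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def landmark_predictions(df, landmarks, horizon=2.0):
    """At each landmark time s, keep subjects still event-free at s and model
    the chance of an event within (s, s + horizon] from current covariates.
    Censoring within the horizon is ignored in this simplified sketch."""
    models = {}
    for s in landmarks:
        at_risk = df[df["event_time"] > s].copy()
        at_risk["y"] = ((at_risk["event_time"] <= s + horizon)
                        & (at_risk["event"] == 1)).astype(int)
        X = at_risk[["glucose_now", "glucose_change", "bmi", "age"]]
        models[s] = LogisticRegression(max_iter=1000).fit(X, at_risk["y"])
    return models

# toy data with illustrative column names
rng = np.random.default_rng(3)
n = 1500
df = pd.DataFrame({
    "age": rng.uniform(40, 70, n),
    "bmi": rng.uniform(22, 38, n),
    "glucose_now": rng.normal(100, 12, n),
    "glucose_change": rng.normal(2, 5, n),
})
lin = 0.03 * (df["glucose_now"] - 100) + 0.05 * df["glucose_change"]
df["event_time"] = rng.exponential(8.0 / np.exp(lin.to_numpy()))
df["event"] = 1   # no censoring in this toy example
models = landmark_predictions(df, landmarks=[1.0, 2.0, 3.0])
print({s: round(m.coef_[0][0], 3) for s, m in models.items()})
```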
Collapse
Affiliation(s)
- Layla Parast
- RAND Corporation, 1776 Main St, Santa Monica, CA, 90401, USA.
| | - Megan Mathews
- RAND Corporation, 1776 Main St, Santa Monica, CA, 90401, USA
| | | |
Collapse
|
29
|
Pfeiffer RM, Gail MH. Estimating the decision curve and its precision from three study designs. Biom J 2019; 62:764-776. [PMID: 31394013 DOI: 10.1002/bimj.201800240] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2018] [Revised: 06/26/2019] [Accepted: 07/09/2019] [Indexed: 01/16/2023]
Abstract
The decision curve plots the net benefit (NB) of a risk model for making decisions over a range of risk thresholds, corresponding to different ratios of misclassification costs. We discuss three methods to estimate the decision curve, together with corresponding methods of inference and methods to compare two risk models at a given risk threshold. One method uses risks (R) and a binary event indicator (Y) on the entire validation cohort. This method makes no assumptions on how well-calibrated the risk model is nor on the incidence of disease in the population and is comparatively robust to model miscalibration. If one assumes that the model is well-calibrated, one can compute a much more precise estimate of NB based on risks R alone. However, if the risk model is miscalibrated, serious bias can result. Case-control data can also be used to estimate NB if the incidence (or prevalence) of the event (Y = 1) is known. This strategy has comparable efficiency to using the full (R, Y) data, and its efficiency is only modestly less than that for the full (R, Y) data if the incidence is estimated from the mean of Y. We estimate variances using influence functions and propose a bootstrap procedure to obtain simultaneous confidence bands around the decision curve for a range of thresholds. The influence function approach to estimate variances can also be applied to cohorts derived from complex survey samples instead of simple random samples.
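The two cohort-based estimators discussed above can be written in a few lines: one uses risks and outcomes (R, Y), the other uses risks alone and is valid only under calibration. The sketch below contrasts them on simulated, well-calibrated risks; the function names and data are illustrative assumptions, not the authors' variance or survey-weighted machinery.

```python
import numpy as np

def nb_from_risks_and_outcomes(r, y, t):
    """Net benefit at threshold t from risks R and outcomes Y (robust to miscalibration)."""
    treat = r >= t
    return np.mean(treat & (y == 1)) - np.mean(treat & (y == 0)) * t / (1 - t)

def nb_from_risks_only(r, t):
    """Net benefit from risks alone, valid only if the model is well calibrated:
    each unknown outcome is replaced by its predicted probability."""
    treat = r >= t
    return np.mean(r * treat) - np.mean((1 - r) * treat) * t / (1 - t)

rng = np.random.default_rng(4)
r = np.clip(rng.beta(2, 8, 20000), 0.001, 0.999)   # predicted risks
y = rng.binomial(1, r)                             # outcomes drawn from the risks (calibrated)
t = 0.15
print(round(nb_from_risks_and_outcomes(r, y, t), 4), round(nb_from_risks_only(r, t), 4))
```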
Collapse
Affiliation(s)
- Ruth M Pfeiffer
- Biostatistics Branch, National Cancer Institute, Bethesda, MD, USA
| | - Mitchell H Gail
- Biostatistics Branch, National Cancer Institute, Bethesda, MD, USA
| |
Collapse
|
30
|
Brinton JT, Hendrick RE, Ringham BM, Kriege M, Glueck DH. Improving the diagnostic accuracy of a stratified screening strategy by identifying the optimal risk cutoff. Cancer Causes Control 2019; 30:1145-1155. [PMID: 31377875 DOI: 10.1007/s10552-019-01208-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2018] [Accepted: 06/29/2019] [Indexed: 12/29/2022]
Abstract
BACKGROUND The American Cancer Society (ACS) suggests using a stratified strategy for breast cancer screening. The strategy includes assessing risk of breast cancer, screening women at high risk with both MRI and mammography, and screening women at low risk with mammography alone. The ACS chose its cutoff for high risk using expert consensus. METHODS We propose instead an analytic approach that maximizes the diagnostic accuracy (area under the receiver operating characteristic curve, AUC) of a risk-based stratified screening strategy in a population. The inputs are the joint distribution of screening test scores, and the odds of disease, for the given risk score. Using the approach for breast cancer screening, we estimated the optimal risk cutoff for two different risk models: the Breast Cancer Surveillance Consortium (BCSC) model and a hypothetical model with much better discriminatory accuracy. Data on mammography and MRI test score distributions were drawn from the Magnetic Resonance Imaging Screening Study Group. RESULTS A risk model with an excellent discriminatory accuracy (c-statistic [Formula: see text]) yielded a reasonable cutoff where only about 20% of women had dual screening. However, the BCSC risk model (c-statistic [Formula: see text]) lacked the discriminatory accuracy to differentiate between women who needed dual screening and women who needed only mammography. CONCLUSION Our research provides a general approach to optimize the diagnostic accuracy of a stratified screening strategy in a population, and to assess whether risk models are sufficiently accurate to guide stratified screening. For breast cancer, most risk models lack enough discriminatory accuracy to make stratified screening a reasonable recommendation.
Collapse
Affiliation(s)
- John T Brinton
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA. .,Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, USA.
| | - R Edward Hendrick
- Department of Radiology, School of Medicine, University of Colorado Denver, Aurora, CO, USA
| | - Brandy M Ringham
- Lifecourse Epidemiology of Adiposity and Diabetes (LEAD) Center, University of Colorado Denver, Aurora, CO, USA
| | - Mieke Kriege
- Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands
| | - Deborah H Glueck
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, USA
| |
Collapse
|
31
|
Katki HA. Quantifying risk stratification provided by diagnostic tests and risk predictions: Comparison to AUC and decision curve analysis. Stat Med 2019; 38:2943-2955. [PMID: 31037749 DOI: 10.1002/sim.8163] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2018] [Revised: 03/14/2019] [Accepted: 03/22/2019] [Indexed: 01/12/2023]
Abstract
A property of diagnostic tests and risk models deserving more attention is risk stratification, defined as the ability of a test or model to separate those at high absolute risk of disease from those at low absolute risk. Risk stratification fills a gap between measures of classification (ie, area under the curve (AUC)) that do not require absolute risks and decision analysis that requires not only absolute risks but also subjective specification of costs and utilities. We introduce mean risk stratification (MRS) as the average change in risk of disease (posttest-pretest) revealed by a diagnostic test or risk model dichotomized at a risk threshold. Mean risk stratification is particularly valuable for rare conditions, where AUC can be high but MRS can be low, identifying situations that temper overenthusiasm for screening with the new test/model. We apply MRS to the controversy over who should get testing for mutations in BRCA1/2 that cause high risks of breast and ovarian cancers. To reveal different properties of risk thresholds to refer women for BRCA1/2 testing, we propose an eclectic approach considering MRS and other metrics. The value of MRS is to interpret AUC in the context of BRCA1/2 mutation prevalence, providing a range of risk thresholds at which a risk model is "optimally informative," and to provide insight into why net benefit arrives at its conclusion.
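Because MRS is defined as the average change between post-test and pre-test risk, it can be computed directly from predicted risks, outcomes, and a threshold. The sketch below is a plain empirical version for a rare condition, showing how MRS can stay small even when discrimination is good; the prevalence, score model, and threshold are illustrative assumptions, not the BRCA1/2 analysis.

```python
import numpy as np

def mean_risk_stratification(risk, y, threshold):
    """MRS: average absolute change between post-test risk (PPV above the
    threshold, 1 - NPV below it) and the pre-test risk (overall prevalence)."""
    risk, y = np.asarray(risk), np.asarray(y)
    pretest = y.mean()
    pos = risk >= threshold
    post = np.where(pos, y[pos].mean(), y[~pos].mean())
    return np.mean(np.abs(post - pretest))

rng = np.random.default_rng(5)
p = 0.0025                                   # rare condition
y = rng.binomial(1, p, 200000)
score = y * 1.8 + rng.normal(size=200000)    # reasonably discriminating marker
risk = 1 / (1 + np.exp(-(1.2 * score + np.log(p / (1 - p)))))
print(round(mean_risk_stratification(risk, y, threshold=0.01), 5))  # small despite good AUC
```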
Collapse
Affiliation(s)
- Hormuzd A Katki
- US National Cancer Institute, Division of Cancer Epidemiology and Genetics, Rockville, Maryland
| |
Collapse
|
32
|
Blanche P, Gerds TA, Ekstrøm CT. The Wally plot approach to assess the calibration of clinical prediction models. LIFETIME DATA ANALYSIS 2019; 25:150-167. [PMID: 29214550 DOI: 10.1007/s10985-017-9414-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/02/2017] [Accepted: 11/29/2017] [Indexed: 06/07/2023]
Abstract
A prediction model is calibrated if, roughly, for any percentage x we can expect that x subjects out of 100 experience the event among all subjects that have a predicted risk of x%. Typically, the calibration assumption is assessed graphically but in practice it is often challenging to judge whether a "disappointing" calibration plot is the consequence of a departure from the calibration assumption, or alternatively just "bad luck" due to sampling variability. We propose a graphical approach which enables the visualization of how much a calibration plot agrees with the calibration assumption to address this issue. The approach is mainly based on the idea of generating new plots which mimic the available data under the calibration assumption. The method handles the common non-trivial situations in which the data contain censored observations and occurrences of competing events. This is done by building on ideas from constrained non-parametric maximum likelihood estimation methods. Two examples from large cohort data illustrate our proposal. The 'wally' R package is provided to make the methodology easily usable.
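The core idea, generating replicate calibration plots from outcomes re-drawn under the calibration assumption, can be sketched for uncensored binary data as below. This ignores the censoring and competing events that the published method (and the 'wally' R package) handles; the binning scheme, function names, and toy data are assumptions.

```python
import numpy as np

def binned_calibration(risk, y, bins=10):
    """Observed event fraction and mean predicted risk within risk deciles."""
    edges = np.quantile(risk, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.digitize(risk, edges[1:-1]), 0, bins - 1)
    return np.array([(risk[idx == b].mean(), y[idx == b].mean()) for b in range(bins)])

def calibrated_replicates(risk, n_rep=8, seed=0):
    """Replicate calibration summaries generated under the calibration assumption:
    outcomes are re-drawn as Bernoulli(predicted risk), mimicking the observed data."""
    rng = np.random.default_rng(seed)
    return [binned_calibration(risk, rng.binomial(1, risk)) for _ in range(n_rep)]

rng = np.random.default_rng(6)
risk = rng.uniform(0.02, 0.4, 3000)
y = rng.binomial(1, risk)                 # toy data consistent with calibration
print(binned_calibration(risk, y)[:3])    # observed plot (first three deciles)
print(calibrated_replicates(risk, n_rep=1)[0][:3])   # one replicate under the assumption
```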
Collapse
Affiliation(s)
- Paul Blanche
- LMBA, University of South Brittany, Vannes, France.
| | - Thomas A Gerds
- Department of biostatistics, University of Copenhagen, Copenhagen, Denmark
| | - Claus T Ekstrøm
- Department of biostatistics, University of Copenhagen, Copenhagen, Denmark
| |
Collapse
|
33
|
Yuan Y, Zhou QM, Li B, Cai H, Chow EJ, Armstrong GT. A threshold-free summary index of prediction accuracy for censored time to event data. Stat Med 2018; 37:1671-1681. [PMID: 29424000 PMCID: PMC5895543 DOI: 10.1002/sim.7606] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2016] [Revised: 09/24/2017] [Accepted: 12/14/2017] [Indexed: 11/09/2022]
Abstract
Prediction performance of a risk scoring system needs to be carefully assessed before its adoption in clinical practice. Clinical preventive care often uses risk scores to screen asymptomatic populations. The primary clinical interest is to predict the risk of having an event by a prespecified future time t0. Accuracy measures such as positive predictive values have been recommended for evaluating the predictive performance. However, for commonly used continuous or ordinal risk score systems, these measures require a subjective cutoff threshold value that dichotomizes the risk scores. The need for a cutoff value has created barriers for practitioners and researchers. In this paper, we propose a threshold-free summary index of positive predictive values that accommodates time-dependent event status and competing risks. We develop a nonparametric estimator and provide an inference procedure for comparing this summary measure between 2 risk scores for censored time to event data. We conduct a simulation study to examine the finite-sample performance of the proposed estimation and inference procedures. Lastly, we illustrate the use of this measure on a real data example, comparing 2 risk score systems for predicting heart failure in childhood cancer survivors.
Collapse
Affiliation(s)
- Yan Yuan
- School of Public Health, University of Alberta, Edmonton, AB T6G1C9, Canada
| | - Qian M. Zhou
- Department of Mathematics and Statistics, Mississippi State University, Starkville, Mississippi 39762, USA
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, B.C. V5A1S6, Canada
| | - Bingying Li
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, B.C. V5A1S6, Canada
| | - Hengrui Cai
- School of Public Health, University of Alberta, Edmonton, AB T6G1C9, Canada
| | - Eric J. Chow
- Fred Hutchinson Cancer Research Center, Seattle Children's Hospital, University of Washington, Seattle, Washington, USA
| | - Gregory T. Armstrong
- Department of Epidemiology and Cancer Control, Division of Neuro-Oncology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, MS 735, Memphis, TN 38105, USA
| |
Collapse
|
34
|
AKI biomarkers are poor discriminants for subsequent need for renal replacement therapy, but do not disqualify them yet. Intensive Care Med 2018; 44:1156-1158. [PMID: 29651499 DOI: 10.1007/s00134-018-5151-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2018] [Accepted: 03/22/2018] [Indexed: 12/14/2022]
|
35
|
Van Der Pas S, Nelissen R, Fiocco M. Different competing risks models for different questions may give similar results in arthroplasty registers in the presence of few events. Acta Orthop 2018; 89:145-151. [PMID: 29388452 PMCID: PMC5901510 DOI: 10.1080/17453674.2018.1427314] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 02/08/2023] Open
Abstract
Background and purpose - In arthroplasty registry studies, the analysis of time to revision is complicated by the competing risk of death. There are no clear guidelines for the choice between the 2 main adjusted analysis methods, cause-specific Cox and Fine-Gray regression, for orthopedic data. We investigated whether there are benefits, such as insight into different aspects of progression to revision, to using either 1 or both regression methods in arthroplasty registry studies in general, and specifically when the length of follow-up is short relative to the expected survival of the implants. Patients and methods - Cause-specific Cox regression and Fine-Gray regression were performed on total hip (138,234 hips, 124,560 patients) and knee (139,070 knees, 125,213 patients) replacement data from the Dutch Arthroplasty Register (median follow-up 3.1 years, maximum 8 years), with sex, age, ASA score, diagnosis, and type of fixation as explanatory variables. The similarity of the resulting hazard ratios and confidence intervals was assessed visually and by computing the relative differences of the resulting subdistribution and cause-specific hazard ratios. Results - The outcomes of the cause-specific Cox and Fine-Gray regressions were numerically very close. The largest relative difference between the hazard ratios was 3.5%. Interpretation - The most likely explanation for the similarity is that there are relatively few events (revisions and deaths), due to the short follow-up compared with the expected failure-free survival of the hip and knee prostheses. Despite the similarity, we recommend always performing both cause-specific Cox and Fine-Gray regression. In this way, both etiology and prediction can be investigated.
Collapse
Affiliation(s)
- Stéphanie Van Der Pas
- Mathematical Institute, Leiden University, Leiden, The Netherlands,Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands,Correspondence:
| | - Rob Nelissen
- Department of Orthopaedics, Leiden University Medical Center, Leiden, The Netherlands
| | - Marta Fiocco
- Mathematical Institute, Leiden University, Leiden, The Netherlands,Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
| |
Collapse
|
36
|
Palmerini L, Chiari L, Palumbo P. A Probabilistic Model to Investigate the Properties of Prognostic Tools for Falls. Methods Inf Med 2018; 54:189-97. [DOI: 10.3414/me13-01-0127] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2013] [Accepted: 06/25/2014] [Indexed: 11/09/2022]
Abstract
Background: Falls are a prevalent and burdensome problem in the elderly. Tools for the assessment of fall risk are fundamental for fall prevention. Clinical studies for the development and evaluation of prognostic tools for falls show high heterogeneity in the settings and in the reported results. Newly developed tools are susceptible to over-optimism. Objectives: This study proposes a probabilistic model to address critical issues about fall prediction through the analysis of the properties of an ideal prognostic tool for falls. Methods: The model assumes that falls occur within a population according to the Greenwood and Yule scheme for accident-proneness. Parameters for the fall rate distribution are estimated from counts of falls from four different epidemiological studies. Results: We obtained analytic formulas and quantitative estimates for the predictive and discriminative properties of the ideal prognostic tool. The area under the receiver operating characteristic curve (AUC) ranges between about 0.80 and 0.89 when prediction of any fall is made within a follow-up of one year. Predicting multiple falls results in higher AUC. Conclusions: The discriminative ability of current validated prognostic tools for falls is appreciably lower than what the proposed ideal tool achieves. A sensitivity analysis of the predictive and discriminative properties of the tool with respect to study settings and fall rate distribution identifies major factors that can account for the high heterogeneity of results observed in the literature.
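The Greenwood and Yule scheme corresponds to gamma-distributed individual fall rates with Poisson counts given the rate, so the AUC of an ideal tool that ranks people by their true rate can be checked by simulation. The sketch below does this for a one-year follow-up; the gamma parameters are illustrative assumptions, not the values estimated from the four epidemiological studies.

```python
import numpy as np

def ideal_tool_auc(shape, scale, follow_up=1.0, n=200000, seed=0):
    """Fall rates are gamma-distributed across people (Greenwood-Yule scheme);
    fall counts are Poisson given the rate. The ideal tool ranks people by their
    true rate, so its AUC for 'any fall during follow-up' is the probability that
    a faller's rate exceeds a non-faller's rate (Mann-Whitney estimate)."""
    rng = np.random.default_rng(seed)
    rate = rng.gamma(shape, scale, n)
    faller = rng.poisson(rate * follow_up) > 0
    r_f, r_n = rate[faller], rate[~faller]
    order = np.argsort(np.concatenate([r_f, r_n]))
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    u = ranks[:len(r_f)].sum() - len(r_f) * (len(r_f) + 1) / 2
    return u / (len(r_f) * len(r_n))

print(round(ideal_tool_auc(shape=1.0, scale=0.8), 3))   # illustrative parameters only
```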
Collapse
|
37
|
Baker SG. Simple Decision-Analytic Functions of the AUC for Ruling Out a Risk Prediction Model and an Added Predictor. Med Decis Making 2017; 38:225-234. [PMID: 29025299 DOI: 10.1177/0272989x17732994] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
BACKGROUND When using risk prediction models, an important consideration is weighing performance against the cost (monetary and harms) of ascertaining predictors. METHODS The minimum test tradeoff (MTT) for ruling out a model is the minimum number of all-predictor ascertainments per correct prediction to yield a positive overall expected utility. The MTT for ruling out an added predictor is the minimum number of added-predictor ascertainments per correct prediction to yield a positive overall expected utility. RESULTS An approximation to the MTT for ruling out a model is 1/[P · H(AUCmodel)], where H(AUC) = AUC − {½(1 − AUC)}^½, AUC is the area under the receiver operating characteristic (ROC) curve, and P is the probability of the predicted event in the target population. An approximation to the MTT for ruling out an added predictor is 1/[P · {H(AUCModel 2) − H(AUCModel 1)}], where Model 2 includes an added predictor relative to Model 1. LIMITATION The latter approximation requires the Tangent Condition that the true positive rate at the point on the ROC curve with a slope of 1 is larger for Model 2 than Model 1. CONCLUSION These approximations are suitable for back-of-the-envelope calculations. For example, in a study predicting the risk of invasive breast cancer, Model 2 adds to the predictors in Model 1 a set of 7 single nucleotide polymorphisms (SNPs). Based on the AUCs and the Tangent Condition, an MTT of 7200 was computed, which indicates that 7200 sets of SNPs are needed for every correct prediction of breast cancer to yield a positive overall expected utility. If ascertaining the SNPs costs $500, this MTT suggests that SNP ascertainment is not likely worthwhile for this risk prediction.
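The back-of-the-envelope formulas quoted above translate directly into code. The sketch below implements them as stated in the abstract, including the exponent on ½(1 − AUC) reconstructed from the garbled text; the numerical inputs are illustrative assumptions and are not the breast cancer example.

```python
def h(auc):
    """H(AUC) = AUC - {0.5 * (1 - AUC)} ** 0.5, as quoted in the abstract."""
    return auc - (0.5 * (1 - auc)) ** 0.5

def mtt_rule_out_model(p, auc):
    """Approximate minimum test tradeoff for ruling out a risk model."""
    return 1.0 / (p * h(auc))

def mtt_rule_out_added_predictor(p, auc_model2, auc_model1):
    """Approximate MTT for an added predictor (assumes the Tangent Condition holds)."""
    return 1.0 / (p * (h(auc_model2) - h(auc_model1)))

# illustrative numbers only
print(round(mtt_rule_out_model(p=0.02, auc=0.64)))
print(round(mtt_rule_out_added_predictor(p=0.02, auc_model2=0.64, auc_model1=0.61)))
```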
Collapse
Affiliation(s)
- Stuart G Baker
- Division of Cancer Prevention, National Cancer Institute, Bethesda, USA (SGB)
| |
Collapse
|
38
|
Devlin SM, Satagopan JM. Statistical Interactions from a Growth Curve Perspective. Hum Hered 2017; 82:21-36. [PMID: 28743105 DOI: 10.1159/000477125] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2016] [Accepted: 04/27/2017] [Indexed: 01/01/2023] Open
Abstract
Logistic regression is widely used to evaluate the association between risk factors and a binary outcome. The logistic curve is symmetric around its point of inflection. Alternative families of curves, such as the additive Gompertz or Guerrero-Johnson models, have been proposed in various scenarios due to their asymmetry: disease risk may initially increase rapidly and be followed by a longer period where the rate of growth slowly decreases. When modeling binary outcomes in relation to risk factors, an additive logistic model may not provide a good fit to the data. Suppose the outcome and an additive function of the risk factors are indeed related through an asymmetric function, but we model the relationship using a logistic function. We illustrate - both from a mathematical framework and through a simulation-based evaluation - that higher-order terms, such as pairwise interactions and quadratic terms, may be required in a logistic regression model to obtain a good fit to the data. Importantly, as significant higher-order terms may be a manifestation of model misspecification, these terms should be cautiously interpreted; a more pragmatic approach is to develop contrasts of disease risk coming from a good fitting model. We illustrate these concepts in 2 cohort studies examining early death for late-stage colorectal and pancreatic cancer cases, and 2 case-control studies investigating NAT2 acetylation, smoking, and advanced colorectal adenoma and bladder cancer.
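A small simulation illustrates the point: when outcomes are generated from an additive Gompertz-type risk curve but analyzed with an additive logistic model, higher-order terms tend to improve the fit and can appear "significant" even though no true interaction exists. The data-generating values and sample size below are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 20000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
eta = -0.5 + 0.8 * x1 + 0.8 * x2
p_true = np.exp(-np.exp(-eta))           # additive Gompertz-type risk curve
y = rng.binomial(1, p_true)

X_add = sm.add_constant(np.column_stack([x1, x2]))                       # additive logistic
X_ext = sm.add_constant(np.column_stack([x1, x2, x1 * x2, x1**2, x2**2]))  # + higher-order terms
fit_add = sm.Logit(y, X_add).fit(disp=0)
fit_ext = sm.Logit(y, X_ext).fit(disp=0)
print(round(fit_add.aic, 1), round(fit_ext.aic, 1))   # extended model usually fits better
print(fit_ext.pvalues[3:])                            # higher-order terms may look "significant"
```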
Collapse
Affiliation(s)
- Sean M Devlin
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | | |
Collapse
|
39
|
Hodgson LE, Dimitrov BD, Roderick PJ, Venn R, Forni LG. Predicting AKI in emergency admissions: an external validation study of the acute kidney injury prediction score (APS). BMJ Open 2017; 7:e013511. [PMID: 28274964 PMCID: PMC5353262 DOI: 10.1136/bmjopen-2016-013511] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/29/2023] Open
Abstract
OBJECTIVES Hospital-acquired acute kidney injury (HA-AKI) is associated with a high risk of mortality. Prediction models or rules may identify those most at risk of HA-AKI. This study externally validated one of the few clinical prediction rules (CPRs) derived in a general medicine cohort using clinical information and data from an acute hospital's electronic system on admission: the acute kidney injury prediction score (APS). DESIGN, SETTING AND PARTICIPANTS External validation in a single UK non-specialist acute hospital (2013-2015, 12 554 episodes); four cohorts: adult medical and general surgical populations, with and without a known preadmission baseline serum creatinine (SCr). METHODS Performance assessed by discrimination using area under the receiver operating characteristic curves (AUCROC) and calibration. RESULTS HA-AKI incidence within 7 days (Kidney Disease: Improving Global Outcomes [KDIGO] change in SCr) was 8.1% (n=409) in medical patients with known baseline SCr, 6.6% (n=141) in those without a baseline, 4.9% (n=204) in surgical patients with baseline and 4% (n=49) in those without. Across the four cohorts, AUCROCs were: medical with known baseline 0.65 (95% CIs 0.62 to 0.67) and no baseline 0.71 (0.67 to 0.75), surgical with baseline 0.66 (0.62 to 0.70) and no baseline 0.68 (0.58 to 0.75). For calibration, in medical and surgical cohorts with baseline SCr, Hosmer-Lemeshow p values were non-significant, suggesting acceptable calibration. In the medical cohort, at a cut-off of five points on the APS to predict HA-AKI, positive predictive value was 16% (13-18%) and negative predictive value 94% (93-94%). Of medical patients with HA-AKI, those with an APS ≥5 had a significantly increased risk of death (28% vs 18%, OR 1.8 (95% CI 1.1 to 2.9), p=0.015). CONCLUSIONS On external validation the APS on admission shows moderate discrimination and acceptable calibration to predict HA-AKI and may be useful as a severity marker when HA-AKI occurs. Harnessing linked data from primary care may be one way to achieve more accurate risk prediction.
Collapse
Affiliation(s)
- L E Hodgson
- Academic Unit of Primary Care and Population Sciences, Faculty of Medicine, Southampton General Hospital, University of Southampton, Southampton, UK
- Anaesthetics Department, Western Sussex Hospitals NHS Foundation Trust, Worthing, UK
| | - B D Dimitrov
- Academic Unit of Primary Care and Population Sciences, Faculty of Medicine, Southampton General Hospital, University of Southampton, Southampton, UK
| | - P J Roderick
- Academic Unit of Primary Care and Population Sciences, Faculty of Medicine, Southampton General Hospital, University of Southampton, Southampton, UK
| | - R Venn
- Anaesthetics Department, Western Sussex Hospitals NHS Foundation Trust, Worthing, UK
| | - L G Forni
- The Royal Surrey County Hospital NHS Foundation Trust, Guildford, UK
- Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
| |
Collapse
|
40
|
Ghosh D. A modified risk set approach to biomarker evaluation studies. STATISTICS IN BIOSCIENCES 2016; 8:395-406. [PMID: 28989545 PMCID: PMC5627622 DOI: 10.1007/s12561-016-9166-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2013] [Revised: 07/01/2016] [Accepted: 08/12/2016] [Indexed: 10/21/2022]
Abstract
There is tremendous scientific and medical interest in the use of biomarkers to better facilitate medical decision making. In this article, we present a simple framework for assessing the predictive ability of a biomarker. The methodology requires use of techniques from a subfield of survival analysis termed semicompeting risks; results are presented to make the article self-contained. As we show in the article, one natural interpretation of the semicompeting risks model is as a modification of the classical risk set approach to survival analysis that is more germane to medical decision making. A crucial parameter for evaluating biomarkers is the predictive hazard ratio, which is different from the usual hazard ratio from Cox regression models for right-censored data. This quantity will be defined; its estimation, inference and adjustment for covariates will be discussed. Aspects of causal inference related to these procedures will also be described. The methodology is illustrated with an evaluation of serum albumin in terms of predicting death in patients with primary biliary cirrhosis.
Collapse
Affiliation(s)
- Debashis Ghosh
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, U.S.A
| |
Collapse
|
41
|
Abstract
Comparing diagnostic tests on accuracy alone can be inconclusive. For example, a test may have better sensitivity than another test yet worse specificity. Comparing tests on benefit-risk may be more conclusive because clinical consequences of diagnostic error are considered. For benefit-risk evaluation, we propose diagnostic yield, the expected distribution of subjects with true positive, false positive, true negative, and false negative test results in a hypothetical population. We construct a table of diagnostic yield that includes the number of false positive subjects experiencing adverse consequences from unnecessary work-up. We then develop a decision theory for evaluating tests. The theory provides additional interpretation to quantities in the diagnostic yield table. It also indicates that the expected utility of a test relative to a perfect test is a weighted accuracy measure: the average of sensitivity and specificity, weighted by prevalence and by the relative importance of false positive and false negative testing errors, a weight that is also interpretable as the cost-benefit ratio of treating non-diseased versus diseased subjects. We propose plots of diagnostic yield, weighted accuracy, and relative net benefit of tests as functions of prevalence or cost-benefit ratio. Concepts are illustrated with hypothetical screening tests for colorectal cancer, with test-positive subjects referred to colonoscopy.
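A diagnostic yield table and a weighted accuracy of the kind described above can be tabulated from sensitivity, specificity, prevalence, and the relative importance of errors. The sketch below is one way to write them under that reading of the abstract; the screening-test operating points and the error weight r are illustrative assumptions.

```python
def diagnostic_yield(sens, spec, prevalence, n=1000):
    """Expected TP/FP/FN/TN counts per n tested subjects."""
    return {
        "TP": n * prevalence * sens,
        "FN": n * prevalence * (1 - sens),
        "FP": n * (1 - prevalence) * (1 - spec),
        "TN": n * (1 - prevalence) * spec,
    }

def weighted_accuracy(sens, spec, prevalence, r):
    """Average of sensitivity and specificity weighted by prevalence and by r,
    the relative importance (cost-benefit ratio) of false positive vs false
    negative errors; one reading of the measure described in the abstract."""
    w_case = prevalence
    w_ctrl = (1 - prevalence) * r
    return (w_case * sens + w_ctrl * spec) / (w_case + w_ctrl)

# two hypothetical colorectal-cancer screening tests at 1% prevalence
print(diagnostic_yield(sens=0.92, spec=0.87, prevalence=0.01))
print(weighted_accuracy(0.92, 0.87, prevalence=0.01, r=0.05),
      weighted_accuracy(0.75, 0.97, prevalence=0.01, r=0.05))
```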
Collapse
Affiliation(s)
- Gene Pennello
- a Center for Devices and Radiological Health , Food and Drug Administration , Silver Spring , Maryland , USA
| | - Norberto Pantoja-Galicia
- a Center for Devices and Radiological Health , Food and Drug Administration , Silver Spring , Maryland , USA
| | - Scott Evans
- b Center for Biostatistics in AIDS Research and the Department of Biostatistics , Harvard T. H. Chan School of Public Health , Boston , Massachusetts , USA
| |
Collapse
|
42
|
Kim Y, Kong L. Improving Classification Accuracy by Combining Longitudinal Biomarker Measurements Subject to Detection Limits. Stat Biopharm Res 2016. [DOI: 10.1080/19466315.2016.1142889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
43
|
Evans SR, Pennello G, Pantoja-Galicia N, Jiang H, Hujer AM, Hujer KM, Manca C, Hill C, Jacobs MR, Chen L, Patel R, Kreiswirth BN, Bonomo RA. Benefit-risk Evaluation for Diagnostics: A Framework (BED-FRAME). Clin Infect Dis 2016; 63:812-7. [PMID: 27193750 DOI: 10.1093/cid/ciw329] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2016] [Accepted: 05/12/2016] [Indexed: 11/14/2022] Open
Abstract
The medical community needs systematic and pragmatic approaches for evaluating the benefit-risk trade-offs of diagnostics that assist in medical decision making. Benefit-Risk Evaluation of Diagnostics: A Framework (BED-FRAME) is a strategy for pragmatic evaluation of diagnostics designed to supplement traditional approaches. BED-FRAME evaluates diagnostic yield and addresses 2 key issues: (1) that diagnostic yield depends on prevalence, and (2) that different diagnostic errors carry different clinical consequences. As such, evaluating and comparing diagnostics depends on prevalence and the relative importance of potential errors. BED-FRAME provides a tool for communicating the expected clinical impact of diagnostic application and the expected trade-offs of diagnostic alternatives. BED-FRAME is a useful fundamental supplement to the standard analysis of diagnostic studies that will aid in clinical decision making.
Collapse
Affiliation(s)
- Scott R Evans
- Department of Biostatistics Center for Biostatistics in AIDS Research, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
| | - Gene Pennello
- Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland
| | - Norberto Pantoja-Galicia
- Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland
| | - Hongyu Jiang
- Center for Biostatistics in AIDS Research, Harvard T. H. Chan School of Public Health, Boston, Massachusetts
| | - Andrea M Hujer
- Louis Stokes Cleveland Veterans Affairs Medical Center, Case Western Reserve University School of Medicine, Ohio
| | - Kristine M Hujer
- Louis Stokes Cleveland Veterans Affairs Medical Center, Case Western Reserve University School of Medicine, Ohio
| | - Claudia Manca
- Public Health Research Institute, New Jersey Medical School, Rutgers University, Newark
| | - Carol Hill
- Duke Clinical Research Institute, Duke University, Durham, North Carolina
| | - Michael R Jacobs
- Louis Stokes Cleveland Veterans Affairs Medical Center, Case Western Reserve University School of Medicine, Ohio
| | - Liang Chen
- Public Health Research Institute, New Jersey Medical School, Rutgers University, Newark
| | | | - Barry N Kreiswirth
- Public Health Research Institute, New Jersey Medical School, Rutgers University, Newark
| | - Robert A Bonomo
- Louis Stokes Cleveland Veterans Affairs Medical Center, Case Western Reserve University School of Medicine, Ohio
| | | |
Collapse
|
44
|
Helmus L, Thornton D. The MATS-1 Risk Assessment Scale: Summary of Methodological Concerns and an Empirical Validation. SEXUAL ABUSE : A JOURNAL OF RESEARCH AND TREATMENT 2016; 28:160-186. [PMID: 24743657 DOI: 10.1177/1079063214529801] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Recently, Wollert and colleagues proposed that sex offender recidivism estimates should be stratified by age and they developed an age-stratified scale called the MATS-1 (Multisample Age-Stratified Table of Sexual Recidivism). The purpose of this article is to highlight concerns with the development of the MATS-1 and to validate the scale using 3,510 sex offenders from 14 unique samples. Concerns with the scale's development fall into three categories: approximations leading to considerable loss of precision, absence of appropriate statistical tests, and the use of inappropriate statistical techniques. The predictive accuracy of the MATS-1 (Area Under the Curve [AUC] = .663) was significantly lower than Static-99R (AUC = .708). The MATS-1 also significantly underestimated recidivism for some offenders. Both the relative and absolute predictive properties of the MATS-1 were not stable across samples. We conclude that the MATS-1 is not appropriate to use for applied risk assessment. Proposals are made for alternate ways to develop risk scales using the age-stratification method.
Collapse
|
45
|
Tang R, Pennello G. Validation of Prognostic Marker Tests: Statistical Lessons Learned From Regulatory Experience. Ther Innov Regul Sci 2016; 50:241-252. [DOI: 10.1177/2168479015601721] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
46
|
Vistisen D, Andersen GS, Hansen CS, Hulman A, Henriksen JE, Bech-Nielsen H, Jørgensen ME. Prediction of First Cardiovascular Disease Event in Type 1 Diabetes Mellitus: The Steno Type 1 Risk Engine. Circulation 2016; 133:1058-66. [PMID: 26888765 DOI: 10.1161/circulationaha.115.018844] [Citation(s) in RCA: 140] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/04/2015] [Accepted: 01/21/2016] [Indexed: 02/06/2023]
Abstract
BACKGROUND Patients with type 1 diabetes mellitus are at increased risk of developing cardiovascular disease (CVD), but they are currently undertreated. There are no risk scores used on a regular basis in clinical practice for assessing the risk of CVD in type 1 diabetes mellitus. METHODS AND RESULTS From 4306 clinically diagnosed adult patients with type 1 diabetes mellitus, we developed a prediction model for estimating the risk of first fatal or nonfatal CVD event (ischemic heart disease, ischemic stroke, heart failure, and peripheral artery disease). Detailed clinical data including lifestyle factors were linked to event data from validated national registers. The risk prediction model was developed by using a 2-stage approach. First, a nonparametric, data-driven approach was used to identify potentially informative risk factors and interactions (random forest and survival tree analysis). Second, based on results from the first step, Poisson regression analysis was used to derive the final model. The final CVD prediction model was externally validated in a different population of 2119 patients with type 1 diabetes mellitus. During a median follow-up of 6.8 years (interquartile range, 2.9-10.9) a total of 793 (18.4%) patients developed CVD. The final prediction model included age, sex, diabetes duration, systolic blood pressure, low-density lipoprotein cholesterol, hemoglobin A1c, albuminuria, glomerular filtration rate, smoking, and exercise. Discrimination was excellent for a 5-year CVD event with a C-statistic of 0.826 (95% confidence interval, 0.807-0.845) in the derivation data and a C-statistic of 0.803 (95% confidence interval, 0.767-0.839) in the validation data. The Hosmer-Lemeshow test showed good calibration (P>0.05) in both cohorts. CONCLUSIONS This high-performing CVD risk model allows for the implementation of decision rules in a clinical setting.
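The two-stage strategy described in this abstract (a nonparametric, data-driven screen of candidate predictors followed by a Poisson regression on the retained variables) can be sketched as follows. This is a hedged, minimal illustration on synthetic data; the variable names, importance threshold, and use of a random-forest classifier for the screening step are assumptions, not the Steno authors' implementation.

```python
# Minimal sketch of a two-stage risk-model workflow: (1) data-driven screen of
# candidate predictors, (2) Poisson regression on the retained predictors with
# log follow-up time as an offset. Synthetic data; all names are assumptions.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 4000
df = pd.DataFrame({
    "age": rng.normal(45, 12, n),
    "sbp": rng.normal(130, 15, n),
    "hba1c": rng.normal(8.0, 1.2, n),
    "ldl": rng.normal(2.8, 0.8, n),
    "smoking": rng.integers(0, 2, n),
    "followup_years": rng.uniform(1, 10, n),
})
# Synthetic event indicator whose rate depends on a few of the predictors.
lin = -6 + 0.03 * df["age"] + 0.25 * df["hba1c"] + 0.4 * df["smoking"]
df["event"] = rng.poisson(np.exp(lin) * df["followup_years"]).clip(0, 1)

# Stage 1: nonparametric screen of candidate predictors via forest importances.
candidates = ["age", "sbp", "hba1c", "ldl", "smoking"]
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(df[candidates], df["event"])
keep = [c for c, imp in zip(candidates, rf.feature_importances_) if imp > 0.10]

# Stage 2: Poisson regression on the retained predictors.
X = sm.add_constant(df[keep])
model = sm.GLM(df["event"], X, family=sm.families.Poisson(),
               offset=np.log(df["followup_years"])).fit()
print(model.summary())
```

Validation of such a model would then proceed, as in the abstract, by computing discrimination (a C-statistic) and a calibration test on an external cohort.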
Collapse
Affiliation(s)
- Dorte Vistisen
- From Steno Diabetes Center, Gentofte, Denmark (D.V., G.S.A., C.S.H., M.E.J.); Department of Public Health, Aarhus University, Denmark (A.H.); Danish Diabetes Academy, Odense, Denmark (A.H., H.B.-N.); Odense University Hospital, Denmark (J.E.H., H.B.-N.); and University of Southern Denmark, Copenhagen (H.B.-N.).
| | - Gregers Stig Andersen
- From Steno Diabetes Center, Gentofte, Denmark (D.V., G.S.A., C.S.H., M.E.J.); Department of Public Health, Aarhus University, Denmark (A.H.); Danish Diabetes Academy, Odense, Denmark (A.H., H.B.-N.); Odense University Hospital, Denmark (J.E.H., H.B.-N.); and University of Southern Denmark, Copenhagen (H.B.-N.)
| | - Christian Stevns Hansen
- From Steno Diabetes Center, Gentofte, Denmark (D.V., G.S.A., C.S.H., M.E.J.); Department of Public Health, Aarhus University, Denmark (A.H.); Danish Diabetes Academy, Odense, Denmark (A.H., H.B.-N.); Odense University Hospital, Denmark (J.E.H., H.B.-N.); and University of Southern Denmark, Copenhagen (H.B.-N.)
| | - Adam Hulman
- From Steno Diabetes Center, Gentofte, Denmark (D.V., G.S.A., C.S.H., M.E.J.); Department of Public Health, Aarhus University, Denmark (A.H.); Danish Diabetes Academy, Odense, Denmark (A.H., H.B.-N.); Odense University Hospital, Denmark (J.E.H., H.B.-N.); and University of Southern Denmark, Copenhagen (H.B.-N.)
| | - Jan Erik Henriksen
- From Steno Diabetes Center, Gentofte, Denmark (D.V., G.S.A., C.S.H., M.E.J.); Department of Public Health, Aarhus University, Denmark (A.H.); Danish Diabetes Academy, Odense, Denmark (A.H., H.B.-N.); Odense University Hospital, Denmark (J.E.H., H.B.-N.); and University of Southern Denmark, Copenhagen (H.B.-N.)
- Henning Beck-Nielsen
- From Steno Diabetes Center, Gentofte, Denmark (D.V., G.S.A., C.S.H., M.E.J.); Department of Public Health, Aarhus University, Denmark (A.H.); Danish Diabetes Academy, Odense, Denmark (A.H., H.B.-N.); Odense University Hospital, Denmark (J.E.H., H.B.-N.); and University of Southern Denmark, Copenhagen (H.B.-N.)
| | - Marit Eika Jørgensen
- From Steno Diabetes Center, Gentofte, Denmark (D.V., G.S.A., C.S.H., M.E.J.); Department of Public Health, Aarhus University, Denmark (A.H.); Danish Diabetes Academy, Odense, Denmark (A.H., H.B.-N.); Odense University Hospital, Denmark (J.E.H., H.B.-N.); and University of Southern Denmark, Copenhagen (H.B.-N.)
| |
Collapse
|
47
|
Kim S, Albert PS. A class of joint models for multivariate longitudinal measurements and a binary event. Biometrics 2016; 72:917-25. [PMID: 26753988 DOI: 10.1111/biom.12463] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2015] [Revised: 11/01/2015] [Accepted: 11/01/2015] [Indexed: 11/27/2022]
Abstract
Predicting binary events such as newborns with large birthweight is important for obstetricians in their attempt to reduce both maternal and fetal morbidity and mortality. Such predictions have been a challenge in obstetric practice, where longitudinal ultrasound measurements taken at multiple gestational times during pregnancy may be useful for predicting various poor pregnancy outcomes. The focus of this article is on developing a flexible class of joint models for the multivariate longitudinal ultrasound measurements that can be used for predicting a binary event at birth. A skewed multivariate random effects model is proposed for the ultrasound measurements, and the skewed generalized t-link is assumed for the link function relating the binary event and the underlying longitudinal processes. We consider a shared random effect to link the two processes together. Markov chain Monte Carlo sampling is used to carry out Bayesian posterior computation. Several variations of the proposed model are considered and compared via the deviance information criterion, the logarithm of pseudomarginal likelihood, and with a training-test set prediction paradigm. The proposed methodology is illustrated with data from the NICHD Successive Small-for-Gestational-Age Births study, a large prospective fetal growth cohort conducted in Norway and Sweden.
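To make the shared-random-effect idea concrete, the following minimal simulation generates longitudinal measurements and a binary event that are both driven by the same subject-level random effect. It uses Gaussian effects and a logit link purely for simplicity; the authors' skewed multivariate random effects, skewed generalized t-link, and MCMC-based Bayesian fitting are not reproduced here.

```python
# Didactic sketch of a shared-random-effect joint model: each subject's
# longitudinal trajectory and binary event probability depend on the same
# random effect. All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_visits = 500, 4

b = rng.normal(0.0, 1.0, n_subjects)                            # shared subject random effect
t = np.tile(np.linspace(0.2, 0.9, n_visits), (n_subjects, 1))   # visit times (fraction of gestation)

# Longitudinal process: measurement grows with time, shifted by the random effect.
y_long = 1.0 + 2.0 * t + b[:, None] + rng.normal(0, 0.3, (n_subjects, n_visits))

# Binary event at birth: probability depends on the same random effect.
p_event = 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * b)))
event = rng.binomial(1, p_event)

# Subjects with larger random effects have both higher trajectories and more events.
high = b > np.median(b)
print("event rate, high-trajectory subjects:", event[high].mean())
print("event rate, low-trajectory subjects :", event[~high].mean())
```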
Collapse
Affiliation(s)
- Sungduk Kim
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, Maryland, U.S.A.
| | - Paul S Albert
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, Maryland, U.S.A
| |
Collapse
|
48
|
Ng KH, Lau S. Vision 20/20: Mammographic breast density and its clinical applications. Med Phys 2015; 42:7059-77. [PMID: 26632060 DOI: 10.1118/1.4935141] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Affiliation(s)
- Kwan-Hoong Ng
- Department of Biomedical Imaging and University of Malaya Research Imaging Centre, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
| | - Susie Lau
- Department of Biomedical Imaging and University of Malaya Research Imaging Centre, Faculty of Medicine, University of Malaya, 50603 Kuala Lumpur, Malaysia
| |
Collapse
|
49
|
Gail MH, Pfeiffer RM. Is the Benign Breast Disease Breast Cancer Model Well Calibrated? J Clin Oncol 2015. [DOI: 10.1200/jco.2015.61.6177] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
|
50
|
Baker SG, Kramer BS. Evaluating surrogate endpoints, prognostic markers, and predictive markers: Some simple themes. Clin Trials 2015; 12:299-308. [PMID: 25385934 PMCID: PMC4451440 DOI: 10.1177/1740774514557725] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
BACKGROUND A surrogate endpoint is an endpoint observed earlier than the true endpoint (a health outcome) that is used to draw conclusions about the effect of treatment on the unobserved true endpoint. A prognostic marker is a marker for predicting the risk of an event given a control treatment; it informs treatment decisions when there is information on the anticipated benefits and harms of a new treatment applied to persons at high risk. A predictive marker is a marker for predicting the effect of treatment on outcome in a subgroup of patients or study participants; it provides more rigorous information for treatment selection than a prognostic marker when it is based on estimated treatment effects in a randomized trial. METHODS We organized our discussion around a different theme for each topic. RESULTS "Fundamentally an extrapolation" refers to the non-statistical considerations and assumptions needed when using surrogate endpoints to evaluate a new treatment. "Decision analysis to the rescue" refers to the use of decision analysis to evaluate an additional prognostic marker, because it is not possible to choose between purely statistical measures of marker performance. "The appeal of simplicity" refers to a straightforward and efficient use of a single randomized trial to evaluate the overall treatment effect and treatment effects within subgroups defined by predictive markers. CONCLUSION These simple themes provide a general guideline for the evaluation of surrogate endpoints, prognostic markers, and predictive markers.
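To illustrate "the appeal of simplicity" for predictive markers, the sketch below estimates the treatment effect (risk difference) overall and within marker-defined subgroups from a single simulated randomized trial. The data, the marker, and the effect sizes are synthetic assumptions used only to show the subgroup calculation, not the authors' analysis.

```python
# Hedged illustration: subgroup treatment effects from one randomized trial,
# using a candidate predictive marker. Data and effect sizes are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 5000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "marker_pos": rng.integers(0, 2, n),
})
# In this synthetic setup, treatment only benefits marker-positive participants.
risk = 0.30 - 0.15 * df["treated"] * df["marker_pos"]
df["event"] = rng.binomial(1, risk)

def risk_difference(d):
    # Risk in control arm minus risk in treated arm (positive = benefit).
    return d.loc[d.treated == 0, "event"].mean() - d.loc[d.treated == 1, "event"].mean()

print("overall benefit:        ", round(risk_difference(df), 3))
print("marker-positive benefit:", round(risk_difference(df[df.marker_pos == 1]), 3))
print("marker-negative benefit:", round(risk_difference(df[df.marker_pos == 0]), 3))
```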
Collapse
Affiliation(s)
- Stuart G Baker
- Division of Cancer Prevention, National Cancer Institute, Bethesda MD, USA
| | - Barnett S Kramer
- Division of Cancer Prevention, National Cancer Institute, Bethesda MD, USA
| |
Collapse
|