Retrospective Cohort Study
Copyright ©The Author(s) 2019. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Crit Care Med. Nov 19, 2019; 8(7): 120-126
Published online Nov 19, 2019. doi: 10.5492/wjccm.v8.i7.120
Machine learning in data abstraction: A computable phenotype for sepsis and septic shock diagnosis in the intensive care unit
Prabij Dhungana, Laura Piccolo Serafim, Arnaldo Lopez Ruiz, Danette Bruns, Timothy J Weister, Nathan Jerome Smischney, Rahul Kashyap
Prabij Dhungana, Nathan Jerome Smischney, Rahul Kashyap, Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, MN 55905, United States
Prabij Dhungana, Laura Piccolo Serafim, Arnaldo Lopez Ruiz, Nathan Jerome Smischney, Rahul Kashyap, Multidisciplinary Epidemiology and Translational Research in Intensive Care, Mayo Clinic, Rochester, MN 55905, United States
Laura Piccolo Serafim, Arnaldo Lopez Ruiz, Department of Medicine, Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN 55905, United States
Danette Bruns, Timothy J Weister, Anesthesia Clinical Research Unit, Mayo Clinic, Rochester, MN 55905, United States
Author contributions: All listed authors provided intellectual contributions and made critical revisions of this paper; Kashyap R, Lopez Ruiz A and Smischney NJ contributed to study conception and design; Dhungana P, Piccolo Serafim L, Bruns D and Weister TJ contributed to data acquisition; Dhungana P, Piccolo Serafim L, Smischney NJ and Kashyap R contributed to data analysis; all authors approved the final version of the manuscript.
Institutional review board statement: The study was reviewed and approved by the Mayo Clinic Institutional Review Board.
Informed consent statement: This retrospective study was exempt from the need for informed consent.
Conflict-of-interest statement: The authors declare no conflicts of interest for this article.
STROBE statement: The authors have read the STROBE Statement-checklist of items, and the manuscript was prepared and revised according to the STROBE Statement-checklist of items.
Open-Access: This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Corresponding author: Rahul Kashyap, MBBS, MBA, Assistant Professor, Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States. kashyap.rahul@mayo.edu
Telephone: +1-507-2557196
Received: April 23, 2019
Peer-review started: May 8, 2019
First decision: August 2, 2019
Revised: August 21, 2019
Accepted: October 27, 2019
Article in press: October 27, 2019
Published online: November 19, 2019
Abstract
BACKGROUND

With the recent change in the definitions of sepsis and septic shock (Sepsis-3), an electronic search algorithm was needed to identify cases for automated data abstraction. A supervised machine learning method can help screen large volumes of electronic medical records (EMRs) efficiently for research purposes.

AIM

To develop and validate a computable phenotype, using a supervised machine learning method, for retrospectively identifying sepsis and septic shock in critical care patients.

METHODS

We developed a supervised machine learning method based on culture orders, Sequential Organ Failure Assessment (SOFA) scores, serum lactate levels, and vasopressor use in the intensive care units (ICUs). The computable phenotype was derived from a retrospective analysis of a random cohort of 100 patients admitted to the medical ICU and then validated in an independent cohort of 100 patients. We compared the results of the computable phenotype with a gold standard established by manual EMR review by 2 blinded reviewers; disagreements were resolved by a critical care clinician. The algorithm identified patients with a SOFA score ≥ 2 during the ICU stay and a culture obtained within 72 h before or after the time of admission. Sepsis-V1 was defined as a blood culture with a SOFA score ≥ 2, and Sepsis-V2 as any culture with a SOFA score ≥ 2. Septic Shock-V1 and Septic Shock-V2 were defined as Sepsis-V1 and Sepsis-V2, respectively, together with vasopressor use and a serum lactate level ≥ 2 mmol/L at any time from 24 h before admission through the end of the ICU stay.
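The case definitions described above reduce to a handful of threshold rules. The sketch below encodes them in Python as a minimal illustration, not the authors' implementation. It assumes a hypothetical per-patient record (the `Patient` and `Culture` classes, their fields, and `classify` are invented for illustration) in which all times are hours relative to admission. It also assumes that the highest SOFA score during the ICU stay is used and that the ±72 h culture window applies to both sepsis versions; the abstract does not state how the original algorithm handled timing or data extraction.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record layout (assumption): times are hours relative to
# admission, negative values meaning before admission.
@dataclass
class Culture:
    kind: str      # e.g. "blood", "urine", "sputum"
    hours: float   # culture order time relative to admission

@dataclass
class Patient:
    max_sofa_icu: int                                   # highest SOFA score in the ICU
    cultures: List[Culture] = field(default_factory=list)
    lactates: List[Tuple[float, float]] = field(default_factory=list)  # (hours, mmol/L)
    icu_los_hours: float = 0.0                          # ICU length of stay
    on_vasopressor: bool = False                        # any vasopressor use

# Thresholds as stated in the abstract.
SOFA_THRESHOLD = 2
CULTURE_WINDOW_H = 72.0    # culture within 72 h before or after admission
LACTATE_LOOKBACK_H = 24.0  # lactate counted from 24 h before admission
LACTATE_THRESHOLD = 2.0    # mmol/L

def _culture_in_window(p: Patient, blood_only: bool) -> bool:
    """True if a qualifying culture was ordered within the +/-72 h window."""
    return any(
        abs(c.hours) <= CULTURE_WINDOW_H and (not blood_only or c.kind == "blood")
        for c in p.cultures
    )

def _high_lactate(p: Patient) -> bool:
    """True if any lactate >= 2 mmol/L falls between -24 h and ICU discharge."""
    return any(
        -LACTATE_LOOKBACK_H <= t <= p.icu_los_hours and value >= LACTATE_THRESHOLD
        for t, value in p.lactates
    )

def classify(p: Patient) -> dict:
    """Apply the four case definitions to one patient record."""
    sofa_ok = p.max_sofa_icu >= SOFA_THRESHOLD
    sepsis_v1 = sofa_ok and _culture_in_window(p, blood_only=True)
    sepsis_v2 = sofa_ok and _culture_in_window(p, blood_only=False)
    shock = p.on_vasopressor and _high_lactate(p)
    return {
        "Sepsis-V1": sepsis_v1,
        "Sepsis-V2": sepsis_v2,
        "Septic Shock-V1": sepsis_v1 and shock,
        "Septic Shock-V2": sepsis_v2 and shock,
    }

# A patient with only a urine culture meets the V2 definitions but not V1.
pt = Patient(max_sofa_icu=4, cultures=[Culture("urine", -10.0)],
             lactates=[(-5.0, 3.1)], icu_los_hours=48.0, on_vasopressor=True)
print(classify(pt))
# {'Sepsis-V1': False, 'Sepsis-V2': True, 'Septic Shock-V1': False, 'Septic Shock-V2': True}
```

Keeping each threshold as a named constant makes it straightforward to test alternative windows or cut-offs against a reviewer gold standard.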

RESULTS

In the derivation cohort of 100 random patients, the final machine learning strategy achieved a sensitivity and specificity of 100% and 84% for Sepsis-V1, 100% and 95% for Sepsis-V2, 78% and 80% for Septic Shock-V1, and 80% and 90% for Septic Shock-V2. Agreement between the two blinded reviewers yielded κ = 0.86 for Sepsis-V2 and κ = 0.90 for Septic Shock-V2. In the validation cohort of 100 separate random patients, sensitivity and specificity were both 100% for all 4 diagnoses.
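The reported metrics follow standard definitions: sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP), computed against the manual-review gold standard, and κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The abstract does not name the kappa statistic used; the sketch below assumes Cohen's kappa for two raters with binary labels. The function names are hypothetical and the inputs are toy data, not study data.

```python
from typing import Sequence, Tuple

def sens_spec(predicted: Sequence[bool], reference: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity of algorithm labels against a gold standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters labelling the same patients as yes/no."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n              # share labelled positive by each rater
    expected = pa * pb + (1 - pa) * (1 - pb)     # chance agreement
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)

# Toy example: 10 patients, algorithm vs. gold standard.
algorithm = [True] * 5 + [False] * 5
gold = [True] * 4 + [False] * 6
print(sens_spec(algorithm, gold))   # sensitivity 1.0, specificity 5/6

# Toy example: two reviewers disagree on one of 10 patients.
reviewer_1 = [True] * 3 + [False] * 7
reviewer_2 = [True] * 2 + [False] * 8
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))
```

For the reviewer example, observed agreement is 0.9 but chance agreement is already 0.62, so κ is about 0.74; this is why κ is reported rather than raw percent agreement.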

CONCLUSION

Supervised machine learning for the identification of sepsis and septic shock is a reliable and efficient alternative to manual chart review.

Keywords: Machine learning, Computable phenotype, Critical care, Sepsis, Septic shock

Core tip: This study presents and validates a supervised machine learning model that identifies sepsis and septic shock cases from electronic medical records as an alternative to manual chart review. The method proved to be an efficient, fast, and reliable option for retrospective data abstraction and could potentially be applied to other clinical conditions.