Retrospective Study Open Access
Copyright ©The Author(s) 2020. Published by Baishideng Publishing Group Inc. All rights reserved.
Artif Intell Gastroenterol. Jul 28, 2020; 1(1): 30-36
Published online Jul 28, 2020. doi: 10.35712/aig.v1.i1.30
Machine learning better predicts colonoscopy duration
Alexander Joseph Podboy, Division of Gastroenterology and Hepatology, Stanford University School of Medicine, Stanford, CA 94305, United States
David Scheinker, Department of Management Science and Engineering, Stanford University School of Engineering, Stanford, CA 94305, United States
David Scheinker, Department of Preoperative Services, Lucile Packard Children's Hospital Stanford, Stanford, CA 94304, United States
ORCID number: Alexander Joseph Podboy (0000-0001-9353-4965); David Scheinker (0000-0001-5885-8024).
Author contributions: All authors contributed equally to this paper with regard to conception and design of the study, literature review and analysis, drafting, critical revision and editing, and approval of the final version.
Institutional review board statement: This research was approved by the institutional review board at Stanford University.
Informed consent statement: The informed consent was waived.
Conflict-of-interest statement: Scheinker D serves as an advisor to Carta Healthcare - a healthcare analytics company. No other potential conflicts of interest or financial support to disclose.
Data sharing statement: No additional data are available.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Corresponding author: Alexander Joseph Podboy, MD, Academic Fellow, Doctor, Division of Gastroenterology and Hepatology, Stanford University School of Medicine, 300 Pasteur Drive, Stanford, CA 94305, United States. alexander.podboy@gmail.com
Received: April 23, 2020
Peer-review started: April 23, 2020
First decision: June 4, 2020
Revised: June 15, 2020
Accepted: June 17, 2020
Article in press: June 17, 2020
Published online: July 28, 2020

Abstract
BACKGROUND

The use of machine learning (ML) to predict colonoscopy procedure duration has not been examined.

AIM

To assess if ML and data available at the time a colonoscopy procedure is scheduled could be used to estimate procedure duration more accurately than the current practice.

METHODS

A total of 40168 colonoscopies from the Clinical Outcomes Research Initiative database were collected. ML models predicting procedure duration were developed using data available at the time of scheduling. The top performing model was compared against historical practice. Models were evaluated based on accuracy, defined as the percentage of predictions falling within 5, 10, and 15 min of the actual procedure time.

RESULTS

ML outperformed historical practice, with accuracies of 77.1% vs 68.9%, 87.3% vs 79.6%, and 92.1% vs 86.8% at the 5, 10, and 15 min thresholds, respectively.

CONCLUSION

The use of ML to estimate colonoscopy procedure duration may lead to more accurate scheduling.

Key Words: Machine Learning, Colonoscopy, Endoscopy, Artificial intelligence, Practice outcomes, Operations

Core tip: Machine learning has been utilized to predict surgical procedure duration and enhance operating room proficiency; however, its usefulness for predicting colonoscopy procedure duration has not been examined. Procedure duration predictions from a machine learning algorithm trained on data from the Clinical Outcomes Research Initiative database outperformed historical practice.



INTRODUCTION

Current colonoscopy scheduling models utilize either historical averages or predetermined time allotments (usually 30-45 min). Scheduling has not evolved to incorporate patient information, case complexity, procedure environment, or operator proficiency. Failure to account for these variables can lead to significant misjudgments of procedural duration. These errors can result in both under- and overutilization of endoscopy room time, leading to increased cost, misappropriation of endoscopy resources, delays, and decreased patient and provider satisfaction[1]. Machine learning (ML) has been utilized to predict surgical procedure duration and enhance operating room proficiency; however, its usefulness for predicting colonoscopy procedure duration has not been examined[2,3].

Our aim was to assess if ML and data available at the time a colonoscopy procedure is scheduled could be used to estimate procedure duration more accurately than the current practice.

MATERIALS AND METHODS

The Clinical Outcomes Research Initiative (CORIv.4) database was queried for all colonoscopies with complete procedural duration times from 2008-2014 following approval from our institutional review board.

The CORI database is a national central repository of endoscopic procedures from a physician network of academic, community, and Veterans Administration hospitals and practices. The details of the repository can be found in previous publications[4]. ML models were trained only on variables that had < 20% missing values and were available prior to the procedure. Procedures with a duration of < 5 min or > 280 min were excluded. All statistical analyses were performed in RStudio version 3.5.3 (Boston, Massachusetts). Eighty percent of the cases were used as training data, and the remaining 20% were used to compare the performance of the models. To reduce skew in the data, the target variable (procedural duration) was logarithmically transformed, in line with previous publications[3,5].
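The preprocessing described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study. The function names, the random seed, the choice of the natural logarithm, and the toy durations are all assumptions made for the example. Only the exclusion bounds (< 5 min and > 280 min), the 80/20 split, and the use of a logarithmic transform come from the text.

```python
import math
import random

MIN_DURATION = 5    # minutes; procedures shorter than this are excluded
MAX_DURATION = 280  # minutes; procedures longer than this are excluded


def filter_durations(durations):
    """Keep procedures lasting between 5 and 280 min, inclusive."""
    return [d for d in durations if MIN_DURATION <= d <= MAX_DURATION]


def log_transform(durations):
    """Log-transform the target to reduce right skew (natural log is an assumption)."""
    return [math.log(d) for d in durations]


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into training and held-out sets.

    The seed is a hypothetical choice used only to make the example repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy durations in minutes (illustrative values, not CORI data).
raw = [3, 12, 18, 22, 25, 31, 40, 55, 90, 300]
kept = filter_durations(raw)
train, test = train_test_split(log_transform(kept))
print(len(kept), len(train), len(test))
```

Predictions made on the logarithmic scale would need to be converted back with `math.exp` before being compared with actual durations in minutes.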

Following established methodology[3,5,6], several models were tuned to predict procedure-time duration using cross-validation. The models included random forest, gradient boosting machine, least absolute shrinkage and selection operator (LASSO), and extreme gradient boosting (xgboost) models. The best performing model was selected based on the lowest root mean squared error and was trained using historical data (2008-2013) to predict “current” data (2014). Predictions derived from the best performing model were compared with the current standard of using historical means. Models were evaluated based on accuracy, defined as the percentage of procedures for which the difference between the predicted and actual time fell within thresholds of 5, 10, and 15 min; these thresholds were chosen to account for operational considerations.
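The selection and evaluation steps described above can be illustrated with a short sketch. It is written in Python for illustration only and is not the study's implementation. The candidate names, the toy numbers, and the use of a single overall historical mean as the baseline are assumptions; the text does not state how the historical means were grouped.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def within_threshold(predicted, actual, threshold_min):
    """Fraction of cases with |prediction - actual| <= threshold_min (minutes)."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= threshold_min)
    return hits / len(actual)


def historical_mean(durations):
    """Baseline forecast: the mean of past durations (one overall mean, by assumption)."""
    return sum(durations) / len(durations)


# Toy data in minutes (illustrative values, not CORI data).
history = [20, 25, 30, 22, 28]   # stands in for 2008-2013 durations
actual = [18, 24, 33, 41]        # stands in for 2014 durations

# Hypothetical predictions from two candidate models.
candidates = {
    "model_a": [20, 25, 30, 32],
    "model_b": [25, 25, 25, 25],
}

# Pick the candidate with the lowest root mean squared error.
best_name = min(candidates, key=lambda name: rmse(candidates[name], actual))
best_pred = candidates[best_name]

# The baseline forecasts the same historical mean for every case.
baseline_pred = [historical_mean(history)] * len(actual)

for t in (5, 10, 15):
    ml_acc = within_threshold(best_pred, actual, t)
    base_acc = within_threshold(baseline_pred, actual, t)
    print(f"within {t} min: model {ml_acc:.2f}, historical mean {base_acc:.2f}")
```

Reporting accuracy at fixed minute thresholds, rather than only a squared-error summary, ties the comparison to the slack a schedule can absorb.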

RESULTS

A total of 40168 colonoscopies with procedural duration information, performed at 75 different sites from 2008 to 2014, were obtained. Of these, 32136 (80%) were used for training the algorithm, and the remaining 8032 (20%) were used to compare the performance of the models. A total of five patient variables (age, gender, race, ASA class, pediatric status), eight provider variables (endoscopist ID, degree of performing provider, degree year of performing provider, specialty of provider, gender and race/ethnicity of the provider, fellow involvement), and twelve procedure-specific variables [procedure year, procedure order, site ID, site type (University vs Community), location of procedure/facility type, duration of procedure, primary indication of procedure, depth intended of the procedure, sedation type used, state, and region] were selected for model analysis and training.

Table 1 shows the background characteristics of the final cohort. The best performing machine learning algorithm was the xgboost model. Figure 1 depicts the final model's accuracy. The percentages of procedures for which the xgboost and the historical models generated forecasts within the 5, 10, and 15 min thresholds were 77.1% vs 68.9%, 87.3% vs 79.6%, and 92.1% vs 86.8%, respectively (P < 0.001). The most important features of the model were patient age, procedure year, and the degree year of the provider (Figure 2).

Table 1 Cohort background characteristics.

Demographic information
Total patients: 40168
Mean age: 58.95
Sex
  Female: 17682
  Male: 22485 (56.0%)
ASA Class
  I: 7071
  II: 27699
  III: 5237
  IV: 158
  V: 3
Race
  Caucasian: 32031
  Hispanic: 2219
  Black: 2193
  Asian: 1140
  Native American: 679
  Other: 1906

Procedural information
Median procedure year: 2012 (2008-2014)
Total No. of sites: 75
Fellow involved: 3575
Indication for procedure
  Average risk screening: 12687
  Surveillance of adenomatous polyps: 8213
  Hematochezia: 3795
  High risk screening: 3272
  Anemia: 1508
  Diarrhea: 1469
  Other: 9224
Procedure order
  1st: 37864
  2nd: 2056
  Other: 248
Mean duration of procedure: 23.4 min
Depth intended
  Cecum: 31745
  Terminal ileum: 6798
  Ascending colon: 570
  Ileum: 424
  Anastomosis site: 447
  Other: 163
Location of the procedure
  Hospital endoscopy suite: 15589
  Ambulatory surgery center: 14730
  Unknown: 5739
  Office: 2501
  Endoscopy suite: 1450
  ICU: 88
Region
  North Central: 3490
  Northeast: 11156
  Northwest: 12329
  South Central: 776
  South East: 1466
  South West: 10947
Site type
  Community: 25133
  HMO: 1000
  University: 5676
  VA: 8359
Sedation
  None: 241
  Moderate/conscious sedation: 28009
  “Deep” sedation: 7289
  General anesthesia: 2510
  Anxiolytic sedation: 78

Provider information
Gender of provider
  Female: 9881
  Male: 30287
Median degree year of provider: 1989 (1962-2009)
Degree of performing provider
  DO: 1253
  MD: 38851
  PA: 64
Provider specialty
  Gastroenterology: 33059
  Surgery: 2976
  Colorectal surgery: 995
  Internal medicine: 1589
  Family medicine: 581
  Other: 968
Ethnicity of provider
  Hispanic: 419
  Non-Hispanic: 37148

Figure 1 Accuracy of machine learning model vs historical average.
Figure 2 Feature importance of machine learning model.

DISCUSSION

We demonstrated that machine learning predicts colonoscopy procedure duration more accurately than the currently accepted standard practice, and that the improvement was greater as the tolerance for error decreased.
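The second point can be checked directly against the accuracies reported in the Results. The short Python sketch below uses only those reported percentages; the variable names are illustrative.

```python
# Reported accuracy (%) by threshold in minutes: (ML model, historical means).
reported = {5: (77.1, 68.9), 10: (87.3, 79.6), 15: (92.1, 86.8)}

# Absolute gain of the ML model over historical means, in percentage points.
gains = {t: round(ml - hist, 1) for t, (ml, hist) in reported.items()}

for threshold, gain in gains.items():
    print(f"{threshold} min threshold: +{gain} percentage points")
```

The gain shrinks from 8.2 percentage points at the 5 min threshold to 7.7 at 10 min and 5.3 at 15 min, so the tighter the tolerance, the larger the advantage of the ML model.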

Our results mirror similar applications of machine learning algorithms. Bartek et al[6] compared predictions derived from a machine learning model with two standard practices: average historical procedure duration and surgeon estimates of procedural duration. Using a 10% accuracy threshold, the machine learning algorithm outperformed both traditional practices (39% ML vs 32% surgeon derived and 30% historical means). In an analysis of feature importance, the authors noted that fundamental case information, such as mean duration of the last ten procedures, was the most important predictive feature, with patient health metrics having a smaller total impact. Our results, however, suggest that patient-specific factors may play a greater role in determining colonoscopy procedure duration. While provider and procedural factors again demonstrated high importance, patient-specific factors (such as age and female sex) factored substantially into our model’s final predictions.

There are several strengths to our analysis. A large number of colonoscopies from a national repository of endoscopic procedures composed of a wide array of procedures, patients, and providers from an assortment of practice environments were analyzed. Inclusion of a national database increases generalizability by limiting regional or practice related biases.

However, there are several limitations to our analysis. Procedure reporting to the CORI database is voluntary and there may be an inherent selection bias in which easier colonoscopies were more likely to be reported to the database. This is supported by the relatively short overall procedural duration in our cohort. While the effects of a longer average procedure duration on our model are unknown, we anticipate more resiliency to increased error in the ML model compared to historical means, further enhancing the overall accuracy of the model compared to traditional practice.

While the algorithm was successful, it largely represents a rudimentary proof of concept. Several variables that have been associated with difficult or lengthy colonoscopies in previous reports[7] were either not available or too incomplete in the current data set to allow inclusion in our analysis. Adding variables associated with difficult colonoscopies, including body mass index, previous abdominal or pelvic surgeries, bowel habits, weight, and height, would potentially improve the model's accuracy.

The use of an algorithm trained on prospectively collected data with greater provider, environmental, patient, and procedural information may lead to improvements in colonoscopy procedure scheduling. Such improvements may contribute to improved efficiency, patient and provider satisfaction, and reduced costs. Further study is necessary to examine the implications of the deployment of such a model in a clinical setting, and assess if such models can be used in other gastrointestinal procedures.

ARTICLE HIGHLIGHTS
Research background

The usefulness of machine learning (ML) for predicting colonoscopy procedure duration has not been examined.

Research motivation

A ML algorithm trained on endoscopic data derived from the Clinical Outcomes Research Initiative database predicted colonoscopy procedure duration more accurately than the currently accepted standard practice and the improvement was greater as the tolerance for error decreased.

Research objectives

The aim of this study was to assess if ML and data available at the time a colonoscopy procedure is scheduled could be used to estimate procedure duration more accurately than the current practice.

Research methods

A total of 40168 colonoscopies were collected. ML models predicting procedure duration were developed using data available at the time of scheduling. The top performing model was compared against historical practice.

Research results

ML outperformed historical practice, with accuracies of 77.1% vs 68.9%, 87.3% vs 79.6%, and 92.1% vs 86.8% at the 5, 10, and 15 min thresholds, respectively. The most important features of the model were patient age, procedure year, and the degree year of the provider.

Research conclusions

The use of ML to estimate colonoscopy procedure duration may lead to more accurate scheduling.

Research perspectives

Further study is necessary to examine the implications of the deployment of such a model in a clinical setting, and assess if such models can be used in other gastrointestinal procedures.

Footnotes

Manuscript source: Unsolicited manuscript

Specialty type: Gastroenterology and Hepatology

Country/Territory of origin: United States

Peer-review report’s scientific quality classification

Grade A (Excellent): 0

Grade B (Very good): 0

Grade C (Good): C

Grade D (Fair): 0

Grade E (Poor): 0

P-Reviewer: Mohamed SY S-Editor: Wang JL L-Editor: A E-Editor: Ma YJ

References
1. Almeida R, Paterson WG, Craig N, Hookey L. A Patient Flow Analysis: Identification of Process Inefficiencies and Workflow Metrics at an Ambulatory Endoscopy Unit. Can J Gastroenterol Hepatol. 2016;2016:2574076.
2. Stepaniak PS, Heij C, Mannaerts GH, de Quelerij M, de Vries G. Modeling procedure and surgical times for current procedural terminology-anesthesia-surgeon combinations and evaluation in terms of case-duration prediction and operating room efficiency: a multicenter study. Anesth Analg. 2009;109:1232-1245.
3. Master N, Zhou Z, Miller D, Scheinker D, Bambos N, Glynn P. Improving predictions of pediatric surgical durations with supervised learning. Int J Data Sci Anal. 2017;4:33-52.
4. Holub JL, Morris C, Fagnan LJ, Logan JR, Michaels LC, Lieberman DA. Quality of Colonoscopy Performed in Rural Practice: Experience From the Clinical Outcomes Research Initiative and the Oregon Rural Practice-Based Research Network. J Rural Health. 2018;34 Suppl 1:s75-s83.
5. Scheinker D, Valencia A, Rodriguez F. Identification of Factors Associated With Variation in US County-Level Obesity Prevalence Rates Using Epidemiologic vs Machine Learning Models. JAMA Netw Open. 2019;2:e192884.
6. Bartek MA, Saxena RC, Solomon S, Fong CT, Behara LD, Venigandla R, Velagapudi K, Lang JD, Nair BG. Improving Operating Room Efficiency: Machine Learning Approach to Predict Case-Time Duration. J Am Coll Surg. 2019;229:346-354.e3.
7. Anderson JC, Messina CR, Cohn W, Gottfried E, Ingber S, Bernstein G, Coman E, Polito J. Factors predictive of difficult colonoscopy. Gastrointest Endosc. 2001;54:558-562.