Observational Study
Copyright ©The Author(s) 2024. Published by Baishideng Publishing Group Inc. All rights reserved.
Artif Intell Gastroenterol. Apr 30, 2024; 5(1): 90503
Published online Apr 30, 2024. doi: 10.35712/aig.v5.i1.90503
Evaluating the accuracy and reproducibility of ChatGPT-4 in answering patient questions related to small intestinal bacterial overgrowth
Lauren Schlussel, Jamil S Samaan, Yin Chan, Bianca Chang, Yee Hui Yeo, Wee Han Ng, Ali Rezaie
Lauren Schlussel, Jamil S Samaan, Yin Chan, Bianca Chang, Yee Hui Yeo, Ali Rezaie, Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
Wee Han Ng, Bristol Medical School, University of Bristol, Bristol BS8 1TH, United Kingdom
Ali Rezaie, Medically Associated Science and Technology Program, Cedars-Sinai Medical Center, Los Angeles, CA 90048, United States
Author contributions: Rezaie A was the guarantor, participated in the acquisition, analysis, and interpretation of the data, and revised the article for critically important intellectual content; Schlussel L drafted the initial manuscript and participated in the acquisition, analysis, and interpretation of the data; Samaan J designed the study and revised the article for critically important intellectual content; Chan Y and Chang B participated in the acquisition, analysis, and interpretation of the data, and revised the article for critically important intellectual content; Yeo YH revised the article for critically important intellectual content; Ng WH participated in the acquisition, analysis, and interpretation of the data.
Institutional review board statement: Our study did not require IRB approval, as the research does not involve human subjects.
Informed consent statement: Our research does not involve human subjects; therefore, no signed informed consent documents were required or obtained.
Conflict-of-interest statement: All the authors declare that they have no conflict of interest.
Data sharing statement: No additional data are available.
STROBE statement: The authors have read the STROBE Statement – checklist of items, and the manuscript was prepared and revised according to the STROBE Statement – checklist of items.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Ali Rezaie, MD, MSc, FRCPC, Medical Director, Medically Associated Science and Technology Program, Cedars-Sinai Medical Center, Cedars-Sinai, 8730 Alden Drive, Thalians Bldg, #E240, Los Angeles, CA 90048, United States. Ali.rezaie@cshs.org
Received: December 7, 2023
Revised: March 27, 2024
Accepted: April 16, 2024
Published online: April 30, 2024
Abstract
BACKGROUND

Small intestinal bacterial overgrowth (SIBO) poses diagnostic and treatment challenges due to its complex management and evolving guidelines. Patients often seek online information related to their health, prompting interest in large language models, like GPT-4, as potential sources of patient education.

AIM

To investigate ChatGPT-4's accuracy and reproducibility in responding to patient questions related to SIBO.

METHODS

A total of 27 patient questions related to SIBO were curated from professional societies, Facebook groups, and Reddit threads. Each question was entered into GPT-4 twice, on separate days, to evaluate the reproducibility of response accuracy. The generated responses were independently graded for accuracy and reproducibility by two motility fellowship-trained gastroenterologists, with disagreements resolved by a third, senior fellowship-trained gastroenterologist. Accuracy was graded using the following scale: (1) Comprehensive; (2) Correct but inadequate; (3) Some correct and some incorrect; or (4) Completely incorrect.

RESULTS

In evaluating GPT-4's effectiveness at answering SIBO-related questions, the model provided correct information in response to 18/27 (66.7%) questions, with 16/27 (59.3%) responses graded as comprehensive and 2/27 (7.4%) graded as correct but inadequate. It provided incorrect information in response to 9/27 (33.3%) questions, with 4/27 (14.8%) responses graded as completely incorrect and 5/27 (18.5%) graded as some correct and some incorrect. Accuracy varied by question category: questions related to "basic knowledge" achieved the highest proportion of comprehensive responses (90%) with no incorrect responses, whereas "treatment"-related questions yielded the lowest proportion of comprehensive responses (33.3%) and the highest proportion of completely incorrect responses (33.3%). A total of 77.8% of questions yielded reproducible responses.

CONCLUSION

Though GPT-4 shows promise as a supplementary tool for SIBO-related patient education, the model requires further refinement and validation in subsequent iterations prior to its integration into patient care.

Keywords: Small intestinal bacterial overgrowth, Motility, Artificial intelligence, ChatGPT, Large language models, Patient education

Core Tip: ChatGPT-4 demonstrates promise in enhancing patient understanding of basic concepts related to small intestinal bacterial overgrowth (SIBO). However, it exhibits limitations in accurately addressing questions about the diagnosis and treatment of SIBO, areas where up-to-date medical guidance is crucial. As such, artificial intelligence can be beneficial for general patient education but should not replace professional medical advice, especially for conditions with complex care protocols. Continuous refinement and updating of ChatGPT's knowledge are essential for its safe and effective application in healthcare. Rigorous scrutiny of artificial intelligence-generated content is imperative to prevent the dissemination of potentially harmful misinformation.