Systematic Reviews Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Jun 21, 2025; 31(23): 106836
Published online Jun 21, 2025. doi: 10.3748/wjg.v31.i23.106836
Diagnostic accuracy and quality of artificial intelligence models in irritable bowel syndrome: A systematic review
Akshaya Srikanth Bhagavathula, Department of Public Health, College of Health and Human Sciences, North Dakota State University, Fargo, ND 58102, United States
Ahmed Mourtada Al Qady, Division of Gastroenterology, Hepatology and Nutrition, University of Florida, Gainesville, FL 32607, United States
Wafa A Aldhaleei, Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN 55905, United States
ORCID number: Akshaya Srikanth Bhagavathula (0000-0002-0581-7808); Ahmed Mourtada Al Qady (0000-0002-7354-0150); Wafa A Aldhaleei (0000-0003-3967-9658).
Author contributions: Bhagavathula AS contributed to conceptualization and validation; Bhagavathula AS and Aldhaleei WA contributed to methodology, data curation, and reviewing and editing the manuscript; Aldhaleei WA contributed to supervision; Al Qady AM and Aldhaleei WA contributed to writing the original draft. All authors contributed to investigation and approved the final manuscript.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
PRISMA 2009 Checklist statement: The authors have read the PRISMA 2009 Checklist, and the manuscript was prepared and revised according to the PRISMA 2009 Checklist.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Akshaya Srikanth Bhagavathula, PhD, Associate Professor, Department of Public Health, College of Health and Human Sciences, North Dakota State University, No. 1455 14th Avenue North, Fargo, ND 58102, United States. akshaya.bhagavathula@ndsu.edu
Received: March 12, 2025
Revised: April 21, 2025
Accepted: May 30, 2025
Published online: June 21, 2025
Processing time: 100 Days and 16.1 Hours

Abstract
BACKGROUND

Irritable bowel syndrome (IBS) affects approximately 9%-12% of the global population, presenting substantial diagnostic challenges due to symptom subjectivity and lack of definitive biomarkers.

AIM

To systematically examine the diagnostic accuracy of artificial intelligence (AI) models applied to various biomarkers in IBS diagnosis.

METHODS

A comprehensive search of six databases identified 18053 articles published up to May 31, 2024. After screening against the eligibility criteria, six observational studies involving 1366 participants from the United Kingdom, China, and Japan were included. Risk of bias and reporting quality were assessed using the quality assessment of diagnostic accuracy studies-2, prediction model risk of bias assessment tool-AI, and transparent reporting of a multivariable prediction model for individual prognosis or diagnosis-AI tools. Key metrics included sensitivity, specificity, accuracy, and area under the curve (AUC).

RESULTS

The included studies applied AI models such as random forests, support vector machines, and neural networks to biomarkers such as fecal microbiome composition, gas chromatography data, neuroimaging features, and protease activity. Diagnostic accuracy ranged from 54% to 98% (AUC: 0.61-0.99). Models using fecal microbiome data achieved the highest performance, with one study reporting 98% accuracy and specificity (AUC = 0.99). While most studies demonstrated high methodological quality, substantial variability in datasets, biomarkers, and validation methods precluded meta-analysis and limited generalizability.

CONCLUSION

AI models show potential to improve IBS diagnostic accuracy by integrating complex biomarkers, which could aid the development of algorithms to guide treatment strategies. However, methodological inconsistencies and limited population diversity underscore the need for standardized protocols and external validation to ensure clinical applicability.

Key Words: Artificial intelligence; Machine learning; Irritable bowel syndrome; Diagnosis; Systematic review

Core Tip: This study highlights the transformative potential of artificial intelligence (AI) in irritable bowel syndrome diagnosis by leveraging complex biomarkers such as fecal microbiome composition and neuroimaging features. By systematically evaluating the performance of various AI models, it reveals both their strengths and limitations, with some achieving near-perfect accuracy. However, significant variability in study methodologies and dataset heterogeneity pose challenges to clinical implementation. The findings emphasize the need for standardized validation protocols to enhance reproducibility and real-world applicability. As AI continues to evolve, its integration into irritable bowel syndrome diagnostics could refine precision medicine approaches, offering a data-driven alternative to current symptom-based diagnostic criteria.



INTRODUCTION

Irritable bowel syndrome (IBS) is a common functional gastrointestinal disorder characterized by chronic abdominal pain, bloating, and altered bowel habits, affecting an estimated 9%-12% of the global population[1-3]. Its substantial symptom burden and impact on quality of life make effective management essential; however, diagnosis is complicated by the absence of a universal biomarker and by the complex, multifactorial pathophysiology underpinning the disorder[4]. The current diagnostic standard, the Rome IV criteria, relies on symptom-based frameworks that are inherently subjective[5], leading to frequent misdiagnoses and delayed treatment initiation.

The lack of universal biomarkers is particularly problematic when distinguishing IBS from other gastrointestinal disorders, as overlapping symptoms frequently result in misdiagnosis and impose significant socioeconomic burdens on healthcare systems[6]. IBS subtypes, such as diarrhea-predominant and constipation-predominant IBS, present additional diagnostic challenges that further complicate the clinical approach to this disorder. Novel approaches, including artificial intelligence (AI), are emerging to address these challenges by leveraging complex datasets from diverse biomarkers.

AI technologies have demonstrated significant potential to advance diagnostics across various medical fields, including radiology, pathology, and genomics[7]. By training models on large and complex datasets, AI can identify subtle patterns and relationships that may not be evident through traditional approaches. For IBS, AI models have been applied to diverse biomarkers such as fecal microbiome composition, gas chromatography, neuroimaging, and protease activity. For instance, AI has shown utility in analyzing fecal microbiome profiles to distinguish IBS subtypes with high accuracy, with some models achieving diagnostic accuracies as high as 98%[8-10]. Phonoenterography, which applies AI to bowel sound analysis, is also being explored as a non-invasive diagnostic method[11]. Additionally, AI-based tools that integrate food preferences and brain activity have been developed to support the diagnosis of functional gut disorders such as functional dyspepsia[6].

Despite these promising advances, the implementation of AI in IBS diagnosis faces challenges. Variability in biomarker types, model training processes, and evaluation metrics complicates efforts to draw definitive conclusions. Recent reviews emphasize the need for standardized protocols, as inconsistent methodologies hinder reliable comparisons across studies and limit reproducibility[12]. Furthermore, limited population diversity in training datasets raises concerns about the generalizability of findings, particularly for underserved groups[13].

This systematic review, the first to focus on AI applications in IBS diagnosis, aims to evaluate the diagnostic accuracy of AI models applied to IBS, assess the quality and robustness of the methodologies used, and identify areas for further research and standardization. By summarizing current findings, it offers insights for future AI applications in gastroenterology and contributes to the growing body of evidence supporting a more objective diagnostic approach to IBS.

MATERIALS AND METHODS
Study design and protocol

This systematic review was conducted in accordance with the updated PRISMA guidelines[14], as illustrated in Figure 1. Observational studies evaluating the diagnostic accuracy of AI and machine learning models in diagnosing IBS were included. The key metrics of interest were sensitivity, specificity, accuracy, and area under the curve (AUC).
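To make the four metrics concrete, the following is a minimal Python sketch (not code from this review or from any included study). It assumes binary labels with 1 for IBS and 0 for control, thresholded predictions for sensitivity, specificity, and accuracy, and continuous model scores for the AUC, which is computed here with its rank-based interpretation: the probability that a randomly chosen IBS case scores higher than a randomly chosen control. Function names are illustrative.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating 1 as IBS and 0 as control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity, and accuracy from hard (0/1) predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of IBS cases correctly flagged
        "specificity": tn / (tn + fp),  # share of controls correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # share of all participants classified correctly
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score of a random case > score of a random control); ties count 1/2."""
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: 4 IBS cases followed by 4 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]
model_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.4, 0.6]
print(diagnostic_metrics(labels, predictions))  # all three metrics equal 0.75
print(auc(labels, model_scores))  # 0.875
```

The pairwise loop is quadratic in sample size, which is acceptable for illustration; a sort-based rank computation gives the same AUC more efficiently for large cohorts.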

Figure 1  PRISMA flow chart of the included studies.
Literature search strategy

A comprehensive literature search was conducted across six databases: MEDLINE, Embase, Cochrane Central Register of Controlled Trials, Scopus, Web of Science, and Google Scholar. The search spanned from database inception to May 31, 2024, using combinations of keywords and medical subject headings related to “irritable bowel syndrome”, “IBS”, “artificial intelligence”, “machine learning”, “diagnosis”, and “diagnostic accuracy”. The reference lists of published reviews and of the included studies were hand-searched to identify additional eligible studies. Detailed search terms and strategy are provided in Supplementary Table 1.

To streamline the study selection process, we utilized Covidence, an online systematic review management tool. Covidence facilitated the deduplication of 18053 identified records and automated the screening of titles and abstracts. A total of 17807 records were excluded based on predefined criteria, including duplicates, non-relevant study types (e.g., conference abstracts, case series), and studies unrelated to IBS diagnostics. Two independent reviewers used Covidence to further screen the remaining studies for eligibility. Full-text articles were reviewed, and discrepancies in inclusion decisions were resolved by a third independent reviewer.
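The record flow implied by these counts can be checked with simple arithmetic. In the Python sketch below, the figures 18053, 17807, 56, and 6 come from the text (the last two from the Results); the intermediate figure of 246 records passed on to reviewer screening is derived here rather than reported, and the stage labels are descriptive, not Covidence terminology.

```python
# Counts reported in the review; stage names are descriptive labels only.
IDENTIFIED = 18053
EXCLUDED_BY_PREDEFINED_CRITERIA = 17807  # includes duplicates and non-relevant records
ASSESSED_IN_DETAIL = 56
INCLUDED = 6


def remaining_after_exclusion(total: int, excluded: int) -> int:
    """Records left after an exclusion step; rejects impossible counts."""
    if not 0 <= excluded <= total:
        raise ValueError("excluded count must lie between 0 and the total")
    return total - excluded


to_reviewers = remaining_after_exclusion(IDENTIFIED, EXCLUDED_BY_PREDEFINED_CRITERIA)
stages = [IDENTIFIED, to_reviewers, ASSESSED_IN_DETAIL, INCLUDED]

# A valid flow diagram never gains records from one stage to the next.
is_monotone = all(earlier >= later for earlier, later in zip(stages, stages[1:]))
print(to_reviewers, is_monotone)  # 246 True
```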

Inclusion and exclusion criteria

Studies were included if they met the following criteria: (1) Study type: Observational studies evaluating the diagnostic accuracy of AI for IBS; (2) Population: Adult participants (≥ 18 years) with suspected or confirmed IBS; and (3) Outcomes: Studies reporting at least one diagnostic accuracy metric (sensitivity, specificity, accuracy, or AUC). The following exclusion criteria were applied: (1) Studies focused on pediatric populations or non-human subjects; (2) Reviews, editorials, case series, or conference abstracts; and (3) Studies without sufficient diagnostic data or that did not evaluate AI models.
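Read as a screening rule, these criteria combine into a single yes/no decision per record. The Python sketch below is a hypothetical encoding for illustration only: the Record fields (design, publication_type, min_age, ibs_population, and so on) are assumptions and do not describe how screening decisions were actually recorded in Covidence or by the reviewers.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical vocabularies mirroring the stated criteria.
ACCURACY_METRICS = {"sensitivity", "specificity", "accuracy", "auc"}
EXCLUDED_PUBLICATION_TYPES = {"review", "editorial", "case series", "conference abstract"}


@dataclass
class Record:
    """One screened record; all field names are illustrative assumptions."""
    design: str              # e.g., "observational"
    publication_type: str    # e.g., "article", "review", "conference abstract"
    min_age: int             # youngest participant age, in years
    ibs_population: bool     # suspected or confirmed IBS
    human: bool              # False for animal or in vitro studies
    evaluates_ai: bool       # True if an AI/ML model is evaluated
    reported_metrics: set[str] = field(default_factory=set)


def is_eligible(r: Record) -> bool:
    """Apply inclusion criteria (1)-(3), then exclusion criteria (1)-(3)."""
    meets_inclusion = (
        r.design == "observational"                       # inclusion (1): study type
        and r.min_age >= 18 and r.ibs_population           # inclusion (2): adults with IBS
        and bool(r.reported_metrics & ACCURACY_METRICS)   # inclusion (3): at least one metric
    )
    meets_exclusion = (
        r.min_age < 18 or not r.human                           # exclusion (1)
        or r.publication_type in EXCLUDED_PUBLICATION_TYPES     # exclusion (2)
        or not r.evaluates_ai                                   # exclusion (3)
    )
    return meets_inclusion and not meets_exclusion


candidate = Record(design="observational", publication_type="article", min_age=18,
                   ibs_population=True, human=True, evaluates_ai=True,
                   reported_metrics={"auc"})
print(is_eligible(candidate))  # True
```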

Data extraction

Two reviewers independently extracted data, including study characteristics (author, year, country), participant demographics, sample size, biomarkers evaluated, AI model types, and diagnostic performance metrics (e.g., sensitivity, specificity, accuracy, AUC). Additional data on AI model training, validation methods, and test types were collected to aid in quality assessment. Biomarkers were categorized into four primary groups: (1) Fecal microbiome composition; (2) Gas chromatography data; (3) Neuroimaging features; and (4) Protease activity. Discrepancies during data extraction were resolved through consensus to minimize potential bias.

Quality assessment

The methodological quality and risk of bias in included studies were evaluated using the following tools: (1) Quality assessment of diagnostic accuracy studies-2 (QUADAS-2): Assessed domains such as patient selection, index test, reference standard, and diagnostic timing[15]; (2) Prediction model risk of bias assessment tool-AI (PROBAST-AI): Evaluated the validity of predictors, clarity of outcomes, and analytical robustness[16]; and (3) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis-AI (TRIPOD-AI): Focused on data processing, model specification, and transparency in reporting training and validation protocols[16].

Variability in the included studies’ quality was noted, with differences in sample sizes, biomarker selection, and validation methods. Studies employing larger datasets and external validation generally achieved higher AUCs and accuracy, underscoring the importance of robust methodology for AI-based diagnostic performance. Each of the 12 criteria was assigned a score of 1 if met (indicating low risk of bias or high methodological rigor) and 0 if unmet, and the total score indicated each study’s overall methodological rigor.
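Under this scheme, a study's total is simply the number of criteria it meets. The Python sketch below transcribes the 0/1 scores from Table 2 and sums them overall and per tool, grouping the columns as described above (four QUADAS-2 domains, three PROBAST-AI domains, five TRIPOD-AI domains). The per-study totals are computed here for illustration and are not reported in the text; the equal weighting of all 12 criteria is an assumption of the simple sum.

```python
from __future__ import annotations

# 0/1 scores transcribed from Table 2. Column order:
#   QUADAS-2:   patient selection, index test, reference standard, flow & timing
#   PROBAST-AI: predictors, outcomes, analysis
#   TRIPOD-AI:  data processing, model specification, training/validation,
#               performance metrics, transparency
TABLE_2_SCORES = {
    "Shepherd et al [17]": [0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0],
    "Aggio et al [18]": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "Mao et al [19]": [0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0],
    "Fukui et al [8]": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1],
    "Su et al [9]": [1] * 12,
    "Tanaka et al [20]": [1] * 12,
}

TOOL_COLUMNS = {
    "QUADAS-2": slice(0, 4),
    "PROBAST-AI": slice(4, 7),
    "TRIPOD-AI": slice(7, 12),
}


def score_breakdown(scores: list[int]) -> dict[str, int]:
    """Sum met criteria per tool and overall (maximum 12)."""
    if len(scores) != 12 or any(s not in (0, 1) for s in scores):
        raise ValueError("expected twelve 0/1 scores")
    breakdown = {tool: sum(scores[cols]) for tool, cols in TOOL_COLUMNS.items()}
    breakdown["total"] = sum(scores)
    return breakdown


for study, scores in TABLE_2_SCORES.items():
    print(study, score_breakdown(scores))
```

On these transcribed values, totals range from 8/12 (Mao et al[19]) to 12/12 (Su et al[9] and Tanaka et al[20]).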

Data synthesis and analysis

Given the heterogeneity across study designs, biomarkers, and AI models, a meta-analysis was not conducted. Instead, a narrative synthesis is provided, grouping findings by AI model type [e.g., neural networks, support vector machines (SVMs)] and biomarker category (e.g., gas chromatography, fecal microbiome, neuroimaging, protease activity). Diagnostic metrics are summarized for comparison across studies.
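The grouping step of this narrative synthesis can be sketched in Python as follows. The values are the headline figures given in Table 1 and the Results (combined-feature results for Mao et al[19] and the all-probe accuracy for Tanaka et al[20]); subset results are left out, and a metric a study did not report is omitted rather than imputed, which illustrates why the metrics are not directly comparable across studies. The data layout and names are illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from enum import Enum


class Biomarker(Enum):
    FECAL_MICROBIOME = "fecal microbiome"
    GAS_CHROMATOGRAPHY = "gas chromatography"
    NEUROIMAGING = "neuroimaging"
    PROTEASE_ACTIVITY = "protease activity"


# (study, biomarker, model family, reported headline metrics); unreported metrics are absent.
STUDIES = [
    ("Shepherd 2014", Biomarker.GAS_CHROMATOGRAPHY, "neural network", {"accuracy": 0.54}),
    ("Aggio 2017", Biomarker.GAS_CHROMATOGRAPHY, "SVM", {"accuracy": 0.91}),
    ("Mao 2020", Biomarker.NEUROIMAGING, "SVM", {"accuracy": 0.715, "auc": 0.776}),
    ("Fukui 2020", Biomarker.FECAL_MICROBIOME, "random forest", {"auc": 0.846}),
    ("Su 2022", Biomarker.FECAL_MICROBIOME, "multiple", {"accuracy": 0.98, "auc": 0.99}),
    ("Tanaka 2023", Biomarker.PROTEASE_ACTIVITY, "random forest", {"accuracy": 0.92}),
]


def summarize(studies, metric: str, key: int = 1) -> dict:
    """Group by field `key` (1 = biomarker, 2 = model) and return (n, min, max) of `metric`."""
    groups = defaultdict(list)
    for record in studies:
        metrics = record[3]
        if metric in metrics:  # skip studies that did not report this metric
            groups[record[key]].append(metrics[metric])
    return {group: (len(vals), min(vals), max(vals)) for group, vals in groups.items()}


for biomarker, (n, low, high) in summarize(STUDIES, "accuracy").items():
    print(f"{biomarker.value}: n={n}, accuracy {low}-{high}")
```

Across all four groups, the accuracy range is 0.54 to 0.98, matching the 54%-98% span reported in the Abstract.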

RESULTS
Study selection and characteristics

A total of 18053 records were identified through database searches. After duplicates were removed and titles, abstracts, and full texts were screened, 56 studies remained eligible for detailed assessment, of which six met the inclusion criteria. These six case-control studies, published between 2014 and 2023, involved a total of 1366 participants from the United Kingdom, China, and Japan[8,9,17-20]. The included studies evaluated the diagnostic accuracy of AI models applied to biomarkers such as fecal microbiome composition[8,9], gas chromatography[17,18], neuroimaging[19], and protease activity[20]. The AI algorithms comprised an artificial neural network[17], SVMs[18,19], random forest models[8,20], and a combination of random forest, k-nearest neighbors, multi-layer perceptron, and SVM models[9]. Further details are provided in Tables 1 and 2.

Table 1 Characteristics of the included studies.
Ref. | Journal | Country | Sample size | Sex (male:female) | IBS | Control | Biomarker | AI used | Specificity | Sensitivity | Accuracy | AUC
Shepherd et al[17], 2014 | J Breath Res | United Kingdom | 80 | - | 34 | 46 | Gas chromatography | Artificial neural network analyses | - | - | 0.54 | -
Aggio et al[18], 2017 | Aliment Pharmacol Ther | United Kingdom | 69 | 26:43 | 28 | 41 | Gas chromatography | ML (SVM) | - | - | 0.91 | -
Mao et al[19], 2020 | Hum Brain Mapp | China | 68 | 34:34 | 34 | 34 | Neuroimaging, HF | ML (SVM) | HF: 0.671, 0.806, 0.529; all: 0.750 | HF: 0.626, 0.438, 0.668; all: 0.679 | HF: 0.649, 0.619, 0.599; all: 0.715 | HF: 0.708, 0.659, 0.61; all: 0.776
Fukui et al[8], 2020 | J Clin Med | Japan | 111 | 46:65 | 85 | 26 | Gut microbiome | ML (random forest model) | > 0.90 | > 0.80 | - | 0.846
Su et al[9], 2022 | Nat Commun | China | 1038 | - | 145 | 893 | Gut microbiome | ML (random forests, k-nearest neighbors, SVM multi-layer perceptron and SVM) | 0.98 | 0.94 | 0.98 | 0.99
Tanaka et al[20], 2023 | Front Microbiol | Japan | 70 | 70:0 | 35 | 35 | Protease activity, C-terminal residue of K, R, S, or G, all probes, microbiome, and metabolome | ML (random forest model) | Protease activity: 0.727; all probes: 0.909 | Protease activity: 0.81; all probes: 0.905 | Protease activity: 0.83; all probes: 0.92; microbiome: 0.58; metabolome: 0.67 | -
-: Not reported. AI: Artificial intelligence; AUC: Area under the curve; IBS: Irritable bowel syndrome; ML: Machine learning; SVM: Support vector machine.
Table 2 Quality assessment of the included studies.
Ref. | Patient selection | Index test | Reference standard | Flow & timing | Predictors | Outcomes | Analysis | Data processing | Model specification | Training/validation | Performance metrics | Transparency
Shepherd et al[17] | 0 (clinic-based sample) | 1 (ANN model) | 1 (Rome II) | 0 (no external validation) | 1 (ANN) | 1 (Rome II) | 1 (cross-validation used) | 1 (time binning and normalization) | 1 (ANN model with hidden layers) | 1 (4-fold cross-validation) | 1 (sensitivity, specificity calculated) | 0 (limited code transparency)
Aggio et al[18] | 1 (diverse control) | 1 (SVM and PLS pipeline) | 1 (CRP and WCC levels) | 1 (partial external validation) | 1 (SVM) | 1 (definition) | 1 (multiple CV methods for robustness) | 1 (normalized gas values) | 1 (SVM with PLS setup) | 1 (Monte Carlo and 10-fold cross-validation) | 1 (ROC, sensitivity) | 0 (no full code access)
Mao et al[19] | 0 (specific IBS subtypes) | 1 (multi-class SVM based on ROIs) | 1 (Rome IV) | 0 (no external validation) | 1 (SVM) | 1 (Rome IV) | 0 (limited test sets) | 1 (SPM preprocessing for ROIs) | 1 (SVM for IBS classification) | 1 (10-fold cross-validation) | 1 (AUC, sensitivity, specificity) | 0 (limited data sharing)
Fukui et al[8] | 1 (multicenter approach) | 1 (RF and KNN models) | 1 (Rome IV and histological standards) | 0 (no external validation) | 1 (adjusted predictors) | 1 (Rome IV and histological standards) | 0 (no external testing) | 1 (batch effect adjustment) | 1 (RF and KNN classifiers) | 1 (nested CV) | 1 (AUC and AUPR) | 1 (full settings provided, partial sharing)
Su et al[9] | 1 (matched control) | 1 (RF model) | 1 (standard enzyme-linked diagnosis) | 1 (robust cross-validation) | 1 (enzyme activity focus) | 1 (enzyme-based diagnosis) | 1 (5-fold cross-validation) | 1 (normalization for enzyme analysis) | 1 (RF model with grid search) | 1 (5-fold cross-validation) | 1 (comprehensive ROC analysis) | 1 (standard software in R)
Tanaka et al[20] | 1 (broad sample selection) | 1 (RF validated with Bray-Curtis) | 1 (Rome IV and microbial standards) | 1 (rigorous cross-validation) | 1 (RF) | 1 (Rome IV for microbial analysis) | 1 (external validation) | 1 (Bray-Curtis dissimilarity for microbiome) | 1 (RF validated externally) | 1 (nested CV with external testing) | 1 (AUROC and AUPR) | 1 (code and dataset on GitHub)
Patient selection, index test, reference standard, and flow & timing are diagnostic accuracy domains assessed with QUADAS-2; predictors, outcomes, and analysis are machine learning domains assessed with PROBAST-AI; data processing, model specification, training/validation, performance metrics, and transparency are machine learning domains assessed with TRIPOD-AI. 1: Criterion met; 0: Criterion not met. ANN: Artificial neural network; AUC: Area under the curve; AUPR: Area under the precision-recall curve; AUROC: Area under the receiver operating characteristic curve; CRP: C-reactive protein; CV: Cross-validation; KNN: K-nearest neighbors; PLS: Partial least squares; RF: Random forest; ROC: Receiver operating characteristic; ROI: Region of interest; SPM: Statistical parametric mapping; SVM: Support vector machine; WCC: White cell count.
Diagnostic accuracy of AI models

Results are grouped by four biomarker categories: (1) Gas chromatography: Shepherd et al[17] reported an accuracy of 54% for diagnosing IBS using artificial neural networks, indicating limited clinical utility, whereas Aggio et al[18], using SVMs, achieved a higher accuracy of 91%, suggesting that performance may improve with more advanced algorithms; (2) Fecal microbiome analysis: Fukui et al[8] reported sensitivity and specificity exceeding 80% and 90%, respectively, using random forest models, with an AUC of 0.846. Su et al[9] achieved the highest diagnostic performance, with accuracy and specificity of 98% and an AUC of 0.99, using multiple AI models, including SVMs and random forests; (3) Neuroimaging features: Mao et al[19] evaluated habenular connectivity using SVMs and reported moderate diagnostic performance, with specificity ranging from 0.529 to 0.806, sensitivity from 0.438 to 0.679, and AUC from 0.61 to 0.776 depending on the dataset used, reflecting variability in neuroimaging-based biomarkers; and (4) Protease activity: Tanaka et al[20] employed random forest models to analyze protease activity and associated biomarkers. The diagnostic accuracy for protease activity alone was 83%, and including all probes increased accuracy to 92%, with an AUC of 0.92.

Heterogeneity and generalizability

The studies displayed substantial heterogeneity in sample size, biomarker types, and validation methods. Larger datasets, such as that of Su et al[9], showed superior diagnostic performance, suggesting that larger datasets may improve model reliability. Conversely, smaller studies with limited validation, such as Shepherd et al[17], showed lower diagnostic accuracy. Population diversity was limited: all studies were conducted in the United Kingdom, China, or Japan, raising concerns about the generalizability of findings to broader populations. Additionally, variability in biomarker selection (e.g., microbiome vs neuroimaging) contributed to the range of reported diagnostic accuracies, highlighting the need for standardized protocols.

Quality assessment

The methodological quality of included studies was evaluated using QUADAS-2, PROBAST-AI, and TRIPOD-AI tools, allowing a thorough examination of study design, predictor validity, outcome clarity, and transparency. Each tool addressed different aspects of quality, highlighting both the strengths and limitations across the included studies. QUADAS-2 assessed patient selection, index test, reference standard, and consistency in flow and timing. Four studies demonstrated low risk of bias in patient selection, enrolling representative populations to enhance generalizability[8,9,18,20]. However, two studies exhibited moderate bias due to restrictive inclusion criteria or unrepresentative sampling[17,19]. Most studies met the index test criteria by providing clear descriptions of AI model training and validation, though one study lacked sufficient tuning parameter details, affecting reproducibility[17]. All studies applied consistent reference standards, using established criteria like Rome IV. Flow and timing between the index test and reference standard were consistent across studies.

PROBAST-AI focused on predictor validity, outcome clarity, and analysis robustness. Four studies achieved high-quality predictor validation, with careful data preprocessing that ensured the relevance and consistency of biomarkers used in AI models[8,9,18,20]. In contrast, two studies faced moderate risk due to limited validation, particularly concerning the reproducibility of biomarker analyses across different populations[17,19]. Outcome clarity was consistently high across all studies, with standardized criteria for IBS diagnosis enhancing clinical relevance. Analytical robustness varied; while four studies performed sensitivity analyses to confirm model stability, two provided limited cross-validation details, impacting reliability[17,19].

TRIPOD-AI evaluated data processing, model specification, and validation protocols, focusing on transparency and reproducibility. Four studies provided thorough documentation of data handling, from collection through preprocessing, ensuring transparency[8,9,18,20]. One study, however, had moderate bias due to insufficient data cleaning information[17]. Model specifications, including parameter tuning and performance metrics, were well-documented across most studies, with three providing model code or datasets for reproducibility[8,9,20]. Two studies partially described model architecture, creating moderate concerns regarding reproducibility[17,19]. Validation practices were generally robust, with four studies employing external datasets or cross-validation, supporting broader applicability[8,9,18,20]. Two studies, however, lacked external validation, limiting generalizability[17,19].
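Table 2 shows that the included studies relied mainly on internal resampling (4-fold, 5-fold, 10-fold, Monte Carlo, or nested cross-validation). As a minimal, standard-library Python sketch of the most common variant, the code below assigns participants to stratified k folds so that each fold keeps roughly the case:control ratio of the full cohort. It borrows the 34 IBS cases and 46 controls of Shepherd et al[17] only as example labels and does not reproduce any study's pipeline. Because every fold is drawn from the same cohort, this is internal validation and is no substitute for the external validation discussed above.

```python
from __future__ import annotations

import random
from typing import Iterator


def stratified_k_fold(labels: list[int], k: int, seed: int = 0) -> list[list[int]]:
    """Split indices into k test folds, dealing each class round-robin to preserve class balance."""
    if k < 2 or k > len(labels):
        raise ValueError("k must be between 2 and the number of samples")
    rng = random.Random(seed)  # fixed seed for reproducible folds
    folds: list[list[int]] = [[] for _ in range(k)]
    position = 0  # continues across classes so overall fold sizes stay balanced
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:
            folds[position % k].append(i)
            position += 1
    return [sorted(f) for f in folds]


def train_test_splits(labels: list[int], k: int, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; each fold serves once as the held-out test set."""
    all_indices = set(range(len(labels)))
    for test in stratified_k_fold(labels, k, seed):
        yield sorted(all_indices - set(test)), test


labels = [1] * 34 + [0] * 46  # 1 = IBS, 0 = control
for train, test in train_test_splits(labels, k=4):
    # each fold: 60 training and 20 test participants, with 8 or 9 IBS cases in the test set
    print(len(train), len(test), sum(labels[i] for i in test))
```

In practice, a model would be fitted on each training split and the metrics averaged across the held-out folds; nested cross-validation adds an inner loop of the same kind for hyperparameter tuning.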

DISCUSSION

This systematic review highlights the promise of AI in improving the diagnostic accuracy of IBS, an area of clinical need given the limitations of existing symptom-based diagnostic criteria such as Rome IV. Diagnostic accuracy varied considerably across biomarkers, with models based on fecal microbiome analysis achieving up to 98% accuracy. This finding reflects the work of Su et al[9], who demonstrated superior diagnostic performance (AUC of 0.99) using multiple AI models trained on a large dataset[21,22]. The biological relevance of the gut microbiome, the relatively large sample size, and robust study quality may explain this high AUC compared with other studies. These results suggest that AI could help overcome the subjectivity of traditional diagnostic approaches, especially by capturing the complex gut microbiome interactions implicated in IBS pathophysiology[21,22].

Notably, Xie et al[23] used SVMs with whole-brain functional connectivity features from resting-state functional magnetic resonance imaging to differentiate IBS patients from healthy controls. This supports the potential of AI models to address the limitations of the Rome IV criteria by incorporating objective neuroimaging biomarkers. Similarly, Katsumata et al[6] integrated brain activity and food preferences into an artificial neural network-based diagnostic system, illustrating the role of multi-modal AI approaches in capturing gut-brain axis interactions. Such approaches demonstrate AI's capacity to incorporate diverse biomarkers, allowing a tailored and nuanced approach to IBS diagnostics.

The diagnostic models included in this review employed various AI algorithms, including SVMs, random forests, and neural networks, across different biomarkers. Models that used fecal microbiome analysis consistently showed high diagnostic performance, aligning with growing evidence of gut dysbiosis as a distinguishing feature of IBS. In contrast, models based on gas chromatography and neuroimaging data showed more variable accuracy, possibly due to methodological inconsistencies or the complex, multifactorial nature of IBS; as noted by Tabata et al[24], such variability highlights the need for more robust validation techniques to improve diagnostic consistency. Nonetheless, the SVM models based on resting-state functional connectivity reported by Xie et al[23] suggest that neuroimaging biomarkers may play a complementary role in IBS diagnosis, particularly for identifying gut-brain axis disturbances.

Historically, attempts to identify biomarkers such as anti-cytolethal distending toxin subunit B and anti-vinculin antibodies offered insights for diagnosing diarrhea-predominant IBS[21]. However, later studies questioned the effectiveness of these markers across IBS subtypes, particularly for constipation-predominant IBS[25]. These findings emphasize the need for AI models that can integrate multi-modal biomarkers to enhance diagnostic precision across all IBS subtypes. Studies such as those by Ruffle et al[26] have further emphasized the importance of population diversity and deep phenotyping to improve the generalizability of AI models for functional bowel disorders. The importance of gut-brain interactions and neuroimaging biomarkers in AI applications suggests a path forward for more specific and subtype-sensitive diagnostic tools.

Despite promising diagnostic accuracy, several challenges remain for integrating AI into routine IBS diagnostics. Methodological heterogeneity across studies, such as varying sample sizes, biomarker selections, and model training techniques, limits direct comparisons and hampers the development of standardized AI-based diagnostic tools. For example, Tabata et al[24] highlighted the potential of AI in endoscopic imaging to detect subtle colonic changes associated with IBS that are often missed by human observers; however, this approach requires standardization to ensure reproducibility across clinical settings.

Another significant limitation is the moderate quality of evidence in some studies, as assessed with the QUADAS-2, PROBAST-AI, and TRIPOD-AI tools in this review. Certain studies used limited validation techniques or unrepresentative samples, which could introduce bias and reduce the reliability of AI applications in IBS diagnostics. Kordi et al[12] specifically noted the socioeconomic burden of misdiagnosis in IBS, further highlighting the importance of rigorous validation to ensure that AI tools provide reliable diagnostic insights. The lack of standardization in reporting methodologies, as highlighted by Xie et al[23], further underscores the need for consistent protocols. Addressing these limitations will require larger, multicenter trials that include diverse patient populations and establish consistent reporting standards. Beyond diagnostic accuracy, integration into routine clinical practice requires addressing practical challenges, including the infrastructure needed to process high-dimensional data (e.g., microbiome, neuroimaging), implementation costs, and clinician training. Future deployment should prioritize user-friendly, interoperable platforms that align with existing diagnostic workflows to facilitate adoption in real-world gastroenterology settings.

The integration of AI into IBS diagnostics offers several benefits for clinical practice. These models could provide non-invasive, objective diagnostic options that potentially reduce reliance on extensive testing, aligning with a shift toward value-based care. Moreover, AI could significantly reduce healthcare costs associated with IBS, which have been estimated at over one billion dollars annually in the United States due to misdiagnosis and repeated evaluations[27]. This potential for economic savings is critical as healthcare systems increasingly focus on cost-effective approaches to managing chronic conditions. Future research should prioritize multi-biomarker integration, rigorous external validation, and a focus on underserved populations to further enhance the clinical utility of AI models[10,26].

Strengths and limitations

Our study has several strengths, particularly its clinical relevance and its status as the first systematic review of AI applications in diagnosing IBS. By systematically evaluating the diagnostic accuracy of AI models across various biomarkers, this review highlights the potential of these technologies to enhance diagnostic precision and to support the development of algorithms that direct treatment strategies for a complex, multifactorial condition. However, the review also has limitations. First, although AI-related research has grown substantially since 2021, this review includes only two studies published between 2021 and May 2024. This discrepancy may reflect the inclusion criteria, database coverage, or limitations of the search strategy, and some recent advances in the field may have been overlooked. Second, non-English-language studies were excluded. Third, the overall quality of the included studies was moderate, raising concerns about the robustness of the findings. Methodological heterogeneity across studies, such as variations in sample sizes, AI models, and diagnostic criteria, precluded a comprehensive meta-analysis. Fourth, the lack of external validation limited the generalizability of the results, highlighting the need for multicenter and multinational studies. Finally, the lack of standardization in AI implementation and reporting poses challenges for translating these findings into clinical practice. These limitations highlight the need for high-quality, multicenter research that includes diverse populations, applies consistent methodologies, and adheres to standardized protocols. Such efforts are essential for the reliable integration of AI into IBS diagnostics and for advancing its clinical utility.

CONCLUSION

This systematic review highlights the transformative potential of AI models in improving diagnostic accuracy for IBS, addressing the limitations of traditional symptom-based criteria like Rome IV. By leveraging complex biological data such as fecal microbiome composition, neuroimaging features, and protease activity, AI models can provide a deeper understanding of IBS pathophysiology and achieve diagnostic accuracies of up to 98%. Despite their promise, methodological heterogeneity, moderate study quality, and a lack of standardization across AI applications hinder their integration into clinical practice. Future research should focus on multi-center trials, large and diverse populations, and standardized protocols, paving the way for more objective, reliable, and personalized diagnostic approaches.

Footnotes

Provenance and peer review: Invited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Gastroenterology and hepatology

Country of origin: United States

Peer-review report’s classification

Scientific Quality: Grade B, Grade B

Novelty: Grade B, Grade B

Creativity or Innovation: Grade B, Grade B

Scientific Significance: Grade B, Grade B

P-Reviewer: Oviedo RJ; Torun M; S-Editor: Wu S; L-Editor: A; P-Editor: Zhao S

References
1. Longstreth GF, Thompson WG, Chey WD, Houghton LA, Mearin F, Spiller RC. Functional bowel disorders. Gastroenterology. 2006;130:1480-1491.
2. Oka P, Parr H, Barberio B, Black CJ, Savarino EV, Ford AC. Global prevalence of irritable bowel syndrome according to Rome III or IV criteria: a systematic review and meta-analysis. Lancet Gastroenterol Hepatol. 2020;5:908-917.
3. Lacy BE, Patel NK. Rome Criteria and a Diagnostic Approach to Irritable Bowel Syndrome. J Clin Med. 2017;6:99.
4. Sood R, Law GR, Ford AC. Diagnosis of IBS: symptoms, symptom-based criteria, biomarkers or 'psychomarkers'? Nat Rev Gastroenterol Hepatol. 2014;11:683-691.
5. Mearin F, Lacy BE, Chang L, Chey WD, Lembo AJ, Simren M, Spiller R. Bowel Disorders. Gastroenterology. 2016;S0016-5085(16)00222.
6. Katsumata R, Hosokawa T, Kamada T. Artificial Intelligence-Based Diagnostic Support System for Functional Dyspepsia Based on Brain Activity and Food Preference. Cureus. 2023;15:e49877.
7. Kaul V, Enslin S, Gross SA. History of artificial intelligence in medicine. Gastrointest Endosc. 2020;92:807-812.
8. Fukui H, Nishida A, Matsuda S, Kira F, Watanabe S, Kuriyama M, Kawakami K, Aikawa Y, Oda N, Arai K, Matsunaga A, Nonaka M, Nakai K, Shinmura W, Matsumoto M, Morishita S, Takeda AK, Miwa H. Usefulness of Machine Learning-Based Gut Microbiome Analysis for Identifying Patients with Irritable Bowels Syndrome. J Clin Med. 2020;9:2403.
9. Su Q, Liu Q, Lau RI, Zhang J, Xu Z, Yeoh YK, Leung TWH, Tang W, Zhang L, Liang JQY, Yau YK, Zheng J, Liu C, Zhang M, Cheung CP, Ching JYL, Tun HM, Yu J, Chan FKL, Ng SC. Faecal microbiome-based machine learning for multi-class disease diagnosis. Nat Commun. 2022;13:6818.
10. Vulpoi RA, Luca M, Ciobanu A, Olteanu A, Bărboi O, Iov DE, Nichita L, Ciortescu I, Cijevschi Prelipcean C, Ștefănescu G, Mihai C, Drug VL. The Potential Use of Artificial Intelligence in Irritable Bowel Syndrome Management. Diagnostics (Basel). 2023;13:3336.
11. Redij R, Kaur A, Muddaloor P, Sethi AK, Aedma K, Rajagopal A, Gopalakrishnan K, Yadav A, Damani DN, Chedid VG, Wang XJ, Aakre CA, Ryu AJ, Arunachalam SP. Practicing Digital Gastroenterology through Phonoenterography Leveraging Artificial Intelligence: Future Perspectives Using Microwave Systems. Sensors (Basel). 2023;23:2302.
12. Kordi M, Dehghan MJ, Shayesteh AA, Azizi A. The impact of artificial intelligence algorithms on management of patients with irritable bowel syndrome: A systematic review. Inform Med Unlocked. 2022;29:100891.
13. Abdulrazak B, Mostafa Ahmed H, Aloulou H, Mokhtari M, Blanchet FG. IoT in medical diagnosis: detecting excretory functional disorders for Older adults via bathroom activity change using unobtrusive IoT technology. Front Public Health. 2023;11:1161943.
14. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, Chou R, Glanville J, Grimshaw JM, Hróbjartsson A, Lalu MM, Li T, Loder EW, Mayo-Wilson E, McDonald S, McGuinness LA, Stewart LA, Thomas J, Tricco AC, Welch VA, Whiting P, Moher D. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst Rev. 2021;10:89.
15. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, Leeflang MM, Sterne JA, Bossuyt PM; QUADAS-2 Group. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155:529-536.
16. Collins GS, Dhiman P, Andaur Navarro CL, Ma J, Hooft L, Reitsma JB, Logullo P, Beam AL, Peng L, Van Calster B, van Smeden M, Riley RD, Moons KG. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11:e048008.
17. Shepherd SF, McGuire ND, de Lacy Costello BP, Ewen RJ, Jayasena DH, Vaughan K, Ahmed I, Probert CS, Ratcliffe NM. The use of a gas chromatograph coupled to a metal oxide sensor for rapid assessment of stool samples from irritable bowel syndrome and inflammatory bowel disease patients. J Breath Res. 2014;8:026001.
18. Aggio RB, White P, Jayasena H, de Lacy Costello B, Ratcliffe NM, Probert CS. Irritable bowel syndrome and active inflammatory bowel disease diagnosed by faecal gas analysis. Aliment Pharmacol Ther. 2017;45:82-90.
19. Mao CP, Chen FR, Huo JH, Zhang L, Zhang GR, Zhang B, Zhou XQ. Altered resting-state functional connectivity and effective connectivity of the habenula in irritable bowel syndrome: A cross-sectional and machine learning study. Hum Brain Mapp. 2020;41:3655-3666.
20. Tanaka K, Tanigawa N, Song I, Komatsu T, Kuriki Y, Tanaka Y, Fukudo S, Urano Y, Fukuda S. A protease activity-based machine-learning approach as a complementary tool for conventional diagnosis of diarrhea-predominant irritable bowel syndrome. Front Microbiol. 2023;14:1179534.
21. Pimentel M, Morales W, Rezaie A, Marsh E, Lembo A, Mirocha J, Leffler DA, Marsh Z, Weitsman S, Chua KS, Barlow GM, Bortey E, Forbes W, Yu A, Chang C. Development and validation of a biomarker for diarrhea-predominant irritable bowel syndrome in human subjects. PLoS One. 2015;10:e0126438.
22. Pittayanon R, Lau JT, Yuan Y, Leontiadis GI, Tse F, Surette M, Moayyedi P. Gut Microbiota in Patients With Irritable Bowel Syndrome-A Systematic Review. Gastroenterology. 2019;157:97-108.
23. Xie L, Zhuang Z, Lin X, Shi X, Zheng Y, Wu K, Ma S. Support vector machine classification of irritable bowel syndrome patients based on whole-brain resting-state functional connectivity features. Quant Imaging Med Surg. 2024;14:7279-7290.
24. Tabata K, Mihara H, Nanjo S, Motoo I, Ando T, Teramoto A, Fujinami H, Yasuda I. Artificial intelligence model for analyzing colonic endoscopy images to detect changes associated with irritable bowel syndrome. PLOS Digit Health. 2023;2:e0000058.
25. Rezaie A, Park SC, Morales W, Marsh E, Lembo A, Kim JH, Weitsman S, Chua KS, Barlow GM, Pimentel M. Assessment of Anti-vinculin and Anti-cytolethal Distending Toxin B Antibodies in Subtypes of Irritable Bowel Syndrome. Dig Dis Sci. 2017;62:1480-1485.
26. Ruffle J, Henderson M, Ng CE, Liddle T, Nelson A, Nachev P, Knowles C, Yiannakou Y. O62 The lived experience of functional bowel disorders: a machine learning approach. Gut. 2024;73:A38-A39.
27. Everhart JE, Ruhl CE. Burden of digestive diseases in the United States part I: overall and upper gastrointestinal diseases. Gastroenterology. 2009;136:376-386.