Clinical and Translational Research
Copyright ©The Author(s) 2022. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Dec 14, 2022; 28(46): 6551-6563
Published online Dec 14, 2022. doi: 10.3748/wjg.v28.i46.6551
Hybrid XGBoost model with hyperparameter tuning for prediction of liver disease with better accuracy
Surjeet Dalal, Edeh Michael Onyema, Amit Malik
Surjeet Dalal, Department of CSE, Amity University, Gurugram 122413, Haryana, India
Edeh Michael Onyema, Department of Mathematics and Computer Science, Coal City University, Enugu 400102, Nigeria
Amit Malik, Department of CSE, SRM University, Delhi-NCR, Sonipat 131001, Haryana, India
Author contributions: Onyema EM contributed to the introduction, background, results, and analysis; Dalal S contributed to the design, methods, conclusion, and background; Malik A contributed to the discussion, data collection, and review of the final draft.
Institutional review board statement: No ethical approval was required.
Clinical trial registration statement: This is to confirm that the results of this study were generated from open-access data and that the study did not involve any clinical trial.
Informed consent statement: Informed consent was not required for this study because the dataset is publicly available on the open-access Kaggle website.
Conflict-of-interest statement: All the authors report having no relevant conflicts of interest for this article.
Data sharing statement: The supporting data may be provided by the corresponding author upon reasonable request.
CONSORT 2010 statement: The authors have read the CONSORT 2010 Statement, and the manuscript was prepared and revised according to the CONSORT 2010 Statement.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Edeh Michael Onyema, Lecturer, Head of Department, Mathematics and Computer Science, Coal City University, Emene, Enugu 400102, Nigeria. michael.edeh@ccu.edu.ng
Received: June 30, 2022
Peer-review started: June 30, 2022
First decision: July 13, 2022
Revised: July 27, 2022
Accepted: November 21, 2022
Article in press: November 21, 2022
Published online: December 14, 2022
Abstract
BACKGROUND

Liver disease refers to any pathology that can damage or destroy the liver or prevent it from functioning normally. The global community has recently witnessed an increase in mortality due to liver disease. This could be attributed to many factors, including lifestyle habits, lack of awareness, poor healthcare, and late detection. To curb the growing threat of liver disease, early detection is critical to reduce risks and improve treatment outcomes. Emerging technologies such as machine learning, as shown in this study, could be deployed to enhance its prediction and treatment.

AIM

To present a more efficient system for the timely prediction of liver disease using a hybrid eXtreme Gradient Boosting (XGBoost) model with hyperparameter tuning, with a view to assisting in early detection and diagnosis and to reducing the risks and mortality associated with the disease.
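For reference, the standard formulation of XGBoost builds an additive ensemble of regression trees, and the hyperparameters that a tuning step searches over appear directly in it. Each round $t$ fits a new tree $f_t$ by minimizing a regularized objective and then adds it to the ensemble with shrinkage $\eta$ (the learning rate). This is the generic formulation of the algorithm, not necessarily the authors' exact configuration:

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, f_t(\mathbf{x}_i)$$

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\!\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}$$

Here $l$ is the training loss (log loss for a binary label), $T$ is the number of leaves in the tree, $w_j$ are its leaf weights, and $\gamma$ and $\lambda$ penalize tree size and leaf-weight magnitude, respectively. Tuning typically searches over $\eta$, $\gamma$, $\lambda$, the maximum tree depth, and the number of boosting rounds.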

METHODS

The dataset used in this study consisted of 583 records: 416 people with liver problems and 167 with no such history. The data were collected in the state of Andhra Pradesh, India, and obtained through https://www.kaggle.com/datasets/uciml/indian-liver-patient-records. The population was divided into two sets according to each patient's disease state, and this binary information was recorded in the attribute "is_patient".
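The class split described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it rebuilds only the label column from the stated counts, assumes a 0/1 encoding of "is_patient" (1 = liver patient), and uses a hypothetical `stratified_split` helper with an assumed 80/20 train/test ratio, since the abstract does not state how the data were partitioned. Stratifying keeps the roughly 71:29 class ratio in both subsets.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Hypothetical helper: return (train_idx, test_idx) preserving the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)  # per-class test size
        test_idx.extend(shuffled[:n_test])
        train_idx.extend(shuffled[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Labels reproducing only the class counts stated in the abstract
# (assumed encoding: 1 = liver patient, 0 = no liver disease).
is_patient = [1] * 416 + [0] * 167

counts = Counter(is_patient)
train_idx, test_idx = stratified_split(is_patient, test_fraction=0.2)
print(counts[1], counts[0], len(is_patient))  # 416 167 583
print(len(train_idx), len(test_idx))          # 467 116
```

With the real data, the same index lists would select the corresponding feature rows from the Kaggle CSV.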

RESULTS

The results indicated that the chi-square automated interaction detection (CHAID) and classification and regression trees (CART) models achieved accuracies of 71.36% and 73.24%, respectively, which were much better than the conventional method. The proposed solution would assist patients and physicians in tackling liver disease by ensuring that cases are detected early, preventing progression to cirrhosis (scarring) and improving patient survival. The study demonstrated the potential of machine learning in health care, particularly for disease prediction and monitoring.
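The two tree models named above rest on different split criteria: CART typically chooses splits that reduce Gini impurity, while CHAID merges and splits categories using chi-square tests of independence. The hypothetical helpers below compute both quantities with the standard library, together with plain accuracy. As an extra reference point, a majority-class rule that labels every record as a patient is correct on 416 of the 583 records (about 71.36%) of the full dataset; the contingency table in the example is illustrative, not taken from the study.

```python
from collections import Counter


def gini_impurity(labels):
    """CART-style impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def chi_square_statistic(table):
    """Pearson chi-square for a contingency table (rows = branches, columns = classes)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Class counts from the abstract (assumed encoding: 1 = liver patient, 0 = no liver disease).
labels = [1] * 416 + [0] * 167
baseline = accuracy(labels, [1] * len(labels))  # always predict "patient"
print(f"majority-class baseline: {baseline:.2%}")  # majority-class baseline: 71.36%

# Illustrative 2x2 table for a candidate split (not from the study).
print(chi_square_statistic([[30, 10], [10, 30]]))  # 20.0
print(gini_impurity([1, 1, 0, 0]))                 # 0.5
```

Comparing reported accuracies against such a baseline helps show how much a model gains over simply predicting the majority class.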

CONCLUSION

This study contributed to knowledge of machine learning applications in health and to efforts to combat liver disease. However, relevant authorities should invest more in machine learning research and other health technologies to maximize their potential.

Keywords: Liver infection, Machine learning, Chi-square automated interaction detection, Classification and regression trees, Decision tree, XGBoost, Hyperparameter tuning

Core Tip: This article proposed a hybrid eXtreme Gradient Boosting (XGBoost) model for the prediction of liver disease, with its hyperparameters tuned using Bayesian optimization. On their own, the classification and regression trees and chi-square automated interaction detection models are not sufficiently accurate in predicting liver disease among Indian patients. The proposed model used several clinical indicators, namely the levels of bilirubin, direct bilirubin, alkaline phosphatase, alanine aminotransferase, aspartate aminotransferase, total proteins, albumin, and globulin, to predict liver disease. This work aimed to design a more accurate machine learning model for liver disease prediction.
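The hyperparameter tuning step can be sketched as a search loop. The study used Bayesian optimization; to stay dependency-free, this sketch swaps in plain random search, which evaluates sampled configurations independently instead of using earlier results to choose the next one. The search-space ranges, the `random_search` helper, and `toy_objective` are hypothetical; only the parameter names (`max_depth`, `learning_rate`, `n_estimators`, `subsample`) match real `xgboost.XGBClassifier` arguments.

```python
import random

# Hypothetical search space: real XGBClassifier parameter names,
# illustrative ranges (not the authors' settings).
SEARCH_SPACE = {
    "max_depth": (2, 8),           # integer range
    "learning_rate": (0.01, 0.3),  # float range
    "n_estimators": (50, 500),     # integer range
    "subsample": (0.5, 1.0),       # float range
}


def sample_params(rng, space):
    """Draw one configuration: integers for int bounds, floats otherwise."""
    params = {}
    for name, (low, high) in space.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(objective, space, n_trials=50, seed=0):
    """Return the best (params, score) found by maximizing `objective`."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng, space)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for cross-validated accuracy; peaks at depth 4, rate 0.1."""
    return -(params["max_depth"] - 4) ** 2 - 100 * (params["learning_rate"] - 0.1) ** 2


best_params, best_score = random_search(toy_objective, SEARCH_SPACE, n_trials=100)
print(best_params, round(best_score, 4))
```

In a real pipeline, the objective would fit `xgboost.XGBClassifier(**params)` on the eight clinical features and return, for example, the mean of `sklearn.model_selection.cross_val_score`; a Bayesian optimizer (for instance, Optuna's TPE sampler) would replace `sample_params` with proposals guided by the scores of earlier trials.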