1
Badr S, Tahri M, Maanan M, Kašpar J, Yousfi N. An intelligent decision-making system for embryo transfer in reproductive technology: a machine learning-based approach. Syst Biol Reprod Med 2025; 71:13-28. [PMID: 39873464 DOI: 10.1080/19396368.2024.2445831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 11/04/2024] [Accepted: 12/15/2024] [Indexed: 01/30/2025]
Abstract
Infertility has emerged as a significant public health concern, with assisted reproductive technology (ART) serving as a last-resort treatment option. However, ART's efficacy is limited by significant financial cost and physical discomfort. The aim of this study is to build machine learning (ML) decision-support models to predict the optimal range of embryo numbers to transfer, using data from infertile couples identified through literature reviews. Binary classification models were developed to classify cases into two groups: those transferring two or fewer embryos and those transferring three or four. Four popular ML algorithms were used, namely random forest (RF), logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN), considering seven criteria: the woman's age, sperm origin, the developmental qualities of four potential embryos, infertility duration, assessment of the woman, morphological qualities of the four best embryos on the day of transfer, and number of oocytes extracted. The stratified 3-fold cross-validation results show that the SVM model obtained the highest average accuracy (95.83%) and demonstrated the best overall performance, closely followed by the ANN and LR models, each with an average accuracy of 91.67%. The RF model achieved a slightly lower average accuracy (88.89%) but demonstrated the lowest variability. Testing on a new dataset revealed that all models performed well: the ANN and SVM models classified all test set instances correctly, while the RF and LR models achieved 91.68% accuracy. These results highlight the superior generalization and effectiveness of the ANN and SVM models in guiding ART decisions.
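To make the "stratified 3-fold cross-validation" step concrete, here is a minimal pure-Python sketch of how stratified folds can be formed. The label list and the `stratified_folds` helper are hypothetical and are not taken from the study; the sketch also skips the shuffling that a library implementation would normally offer.

```python
from collections import defaultdict


def stratified_folds(labels, k=3):
    """Split sample indices into k folds that keep class proportions similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical labels: 0 = two or fewer embryos, 1 = three or four embryos.
labels = [0] * 9 + [1] * 6
folds = stratified_folds(labels, k=3)

for fold_number, test_idx in enumerate(folds, start=1):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    test_classes = sorted(labels[i] for i in test_idx)
    print(f"fold {fold_number}: train={len(train_idx)} "
          f"test={len(test_idx)} test labels={test_classes}")
```

Each fold serves once as the held-out set; averaging the per-fold accuracies would give a figure comparable in kind to the average accuracies quoted in the abstract.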
Affiliation(s)
- Sanaa Badr
  - Department of Mathematics and Computer Science, Laboratory of Analysis, Modeling and Simulation, Faculty of Sciences Ben M'sik, Hassan II University of Casablanca, Casablanca, Morocco
- Meryem Tahri
  - Faculty of Forestry and Wood Sciences, Czech University of Life Sciences Prague (CZU), Praha-Suchdol, Czech Republic
- Mohamed Maanan
  - Laboratory of Littoral, Environment, Remote Sensing and Geomatic (LETG) - UMR6554, Université de Nantes, Nantes, France
- Jan Kašpar
  - Faculty of Forestry and Wood Sciences, Czech University of Life Sciences Prague (CZU), Praha-Suchdol, Czech Republic
- Noura Yousfi
  - Department of Mathematics and Computer Science, Laboratory of Analysis, Modeling and Simulation, Faculty of Sciences Ben M'sik, Hassan II University of Casablanca, Casablanca, Morocco
2
Dokare I, Gupta S. Brain-region specific epileptic seizure detection through EEG dynamics: integrating spectral features, SMOTE and long short-term memory networks. Cogn Neurodyn 2025; 19:67. [PMID: 40330716 PMCID: PMC12049356 DOI: 10.1007/s11571-025-10250-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2024] [Accepted: 04/01/2025] [Indexed: 05/08/2025] Open
Abstract
Investigating neural dynamics through EEG signals offers valuable insights into brain activity, especially for automated seizure detection. The identification of epileptogenic zones is crucial for effective epilepsy treatment, particularly in surgical planning. This work introduces a novel method for seizure detection using EEG signals, designed to benefit clinicians by integrating spectral features with Long Short-Term Memory (LSTM) networks, enhanced by brain region-specific analysis. The method captures critical frequency-domain characteristics by extracting pivotal spectral features from EEG data, thereby improving the signal representation for LSTM networks. Additionally, the Synthetic Minority Over-sampling Technique (SMOTE) is employed to handle the class imbalance problem. Furthermore, a comprehensive spatial analysis of EEG signals is performed to evaluate performance variations across distinct brain regions, enabling targeted region-wise analysis. This strategy reduces the number of channels required, avoiding the need to process all 22 channels specified in the CHB-MIT dataset, and thus significantly decreases computational complexity while preserving high seizure detection performance. For the brain region providing the best seizure discrimination, the method obtained a mean accuracy of 95.43%, precision of 95.46%, sensitivity of 95.59%, F1-score of 95.48%, and specificity of 95.25%. The results demonstrate that integrating spectral features and LSTM, augmented by spatial insights, enhances seizure detection performance and hence assists in identifying epileptogenic regions. This tool enhances clinical applications by improving diagnostic precision, supporting personalized treatment strategies, and enabling precise surgical planning for epilepsy, ensuring safer resection and better outcomes.
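For readers less familiar with the five metrics reported above, the following sketch shows how each is computed from binary confusion-matrix counts. The counts are hypothetical and unrelated to the study's data, and the helper assumes every denominator is nonzero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts: seizure segments are the positive class.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```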
Affiliation(s)
- Indu Dokare
  - Department of Electronics Engineering, K. J. Somaiya School of Engineering (Formerly K. J. Somaiya College of Engineering), Somaiya Vidyavihar University, Mumbai, Maharashtra 400077, India
  - Department of Computer Engineering, Vivekanand Education Society’s Institute of Technology, Mumbai, Maharashtra 400074, India
- Sudha Gupta
  - Department of Electronics Engineering, K. J. Somaiya School of Engineering (Formerly K. J. Somaiya College of Engineering), Somaiya Vidyavihar University, Mumbai, Maharashtra 400077, India
3
Li C, Yan Y, Lin W, Zhang Y. Enhancing cancer subtype classification through convolutional neural networks: a deepinsight analysis of TCGA gene expression data. Health Inf Sci Syst 2025; 13:33. [PMID: 40309134 PMCID: PMC12037455 DOI: 10.1007/s13755-025-00349-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Accepted: 03/31/2025] [Indexed: 05/02/2025] Open
Abstract
Purpose: This study investigates the adaptation of DeepInsight for cancer subtype classification using high-dimensional gene expression data. Originally designed for non-image data, DeepInsight has been adapted for cancer classification. Methods: We evaluated DeepInsight's performance against several models, including support vector machines, LightGBM, neural networks, and decision trees, with and without the application of the Synthetic Minority Oversampling Technique. The study utilized gene expression data from breast, lung, and colon cancers. A novel multi-class feature selection method was introduced, using modified aggregated class activation maps to identify key genes across different cancer subtypes. These critical genes were further analyzed through Gene Ontology to explore their roles in significant biological processes. Results: DeepInsight consistently outperformed traditional models in terms of F1 score across breast, lung, and colon cancer datasets, effectively addressing multi-class classification challenges. Notably, several top genes were identified as significant across multiple methods. Furthermore, we conducted a Gene Ontology analysis on the critical genes, including the top genes identified by DeepInsight and the common genes recognized through multiple methods. Conclusion: The adaptation of DeepInsight provides an approach to cancer subtype classification by transforming high-dimensional gene expression data into image representations. Utilizing aggregated class activation maps, it effectively identifies critical pixels within these images, enabling the discovery of distinct genes that may not be highlighted by other methods. DeepInsight demonstrates potential as a valuable tool for classifying cancer subtypes and identifying critical genes.
Affiliation(s)
- Changda Li
  - Mathematics and Statistics, Thompson Rivers University, 805 TRU Way, Kamloops, BC V2C 0C8, Canada
- Yan Yan
  - School of Computer Science, University of Guelph, 50 Stone Road East, Guelph, ON N1G 2W1, Canada
- Wenjun Lin
  - School of Computer Science & Technology, Algoma University, 1520 Queen Street East, Sault Ste. Marie, ON P6A 2G4, Canada
- Yue Zhang
  - Mathematics and Statistics, Thompson Rivers University, 805 TRU Way, Kamloops, BC V2C 0C8, Canada
4
Chen H, Liu P, Zhou G, Lu ML, Yu D. Computer vision and tactile glove: A multimodal model in lifting task risk assessment. Appl Ergon 2025; 127:104513. [PMID: 40174433 DOI: 10.1016/j.apergo.2025.104513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 03/20/2025] [Accepted: 03/21/2025] [Indexed: 04/04/2025]
Abstract
Work-related injuries from overexertion, particularly lifting, are a major concern in occupational safety. Traditional assessment tools, such as the Revised NIOSH Lifting Equation (RNLE), require significant training and practice for deployment. This study presents an approach that integrates tactile gloves with computer vision (CV) to enhance the assessment of lifting-related injury risks, addressing the limitations of existing single-modality methods. Thirty-one participants performed 2747 lifting tasks across three lifting risk categories (LI < 1, 1 ≤ LI ≤ 2, LI > 2). Features including hand pressure measured by tactile gloves during each lift and 3D body poses estimated using CV algorithms from video recordings were combined and used to develop prediction models. The Convolutional Neural Network (CNN) model achieved an overall accuracy of 89 % in predicting the three lifting risk categories. The results highlight the potential for a real-time, non-intrusive risk assessment tool to assist ergonomic practitioners in mitigating musculoskeletal injury risks in workplace environments.
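The three risk categories above are defined by the lifting index (LI), which in the RNLE is the load weight divided by the recommended weight limit (RWL). The sketch below maps an LI value to the category labels used in the abstract; the function names and the example weights are illustrative assumptions, and computing the RWL itself from the task multipliers is out of scope here.

```python
def lifting_index(load_kg, rwl_kg):
    """Lifting index: load weight divided by the recommended weight limit."""
    return load_kg / rwl_kg


def lifting_risk_category(li):
    """Map a lifting index to the three categories named in the abstract."""
    if li < 1:
        return "LI < 1"
    if li <= 2:
        return "1 <= LI <= 2"
    return "LI > 2"


# Hypothetical lifts, each given as (load in kg, recommended weight limit in kg).
for load_kg, rwl_kg in [(10, 20), (15, 15), (30, 15), (25, 10)]:
    li = lifting_index(load_kg, rwl_kg)
    print(f"load={load_kg} kg, RWL={rwl_kg} kg -> LI={li:.2f} ({lifting_risk_category(li)})")
```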
Affiliation(s)
- Peiran Liu
  - Purdue University, West Lafayette, IN, USA
- Ming-Lun Lu
  - National Institute for Occupational Safety and Health, Cincinnati, OH, USA
- Denny Yu
  - Purdue University, West Lafayette, IN, USA
5
Hassan NH, Chong NS, Yoon TL, Wong YF. Identification and adulteration detection of Heterotrigona itama and Apis dorsata honey using differential scanning calorimetry and convolutional neural networks with data augmentation. Food Chem 2025; 485:144398. [PMID: 40311574 DOI: 10.1016/j.foodchem.2025.144398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Revised: 04/13/2025] [Accepted: 04/16/2025] [Indexed: 05/03/2025]
Abstract
This study presents a simple approach for detecting honey adulteration by integrating calorimetric data from differential scanning calorimetry (DSC) with machine learning classification (MLC) techniques, specifically a convolutional neural network (CNN) model used alongside the Synthetic Minority Over-sampling TEchnique (SMOTE) for data augmentation. The thermal profiles of different honey varieties, sugar adulterants, and adulterated samples were acquired using DSC. Shifts in glass transition temperatures were observed in adulterated honey. The DSC data were analyzed using principal component analysis and an MLC workflow. The CNN model applied to the original dataset reported an accuracy of 24-67 %. However, integrating the CNN model with the SMOTE algorithm resulted in a significant accuracy improvement to 60-91 %. The integration of DSC with MLC provides a rapid and accurate method for detecting honey adulteration, demonstrating strong generalization capability. The proposed approach could facilitate the development of a framework to detect fraudulent practices, safeguarding the honey industry and consumers from sugar-based adulterations.
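The core idea behind SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified pure-Python illustration of that idea only; it is not the study's pipeline and not the imbalanced-learn implementation, and the `smote_like` helper and the two-feature sample points are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # The k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples described by two features each.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.4), (1.2, 2.9)]
synthetic = smote_like(minority, n_new=6)
for point in synthetic:
    print(tuple(round(value, 3) for value in point))
```

Because every synthetic point lies on a segment between two real minority samples, the augmented class stays within the region the original samples occupy.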
Affiliation(s)
- Norfarizah Hanim Hassan
  - Centre for Research on Multidimensional Separation Science, School of Chemical Sciences, Universiti Sains Malaysia, 11800 Penang, Malaysia
- Ngee Sing Chong
  - Department of Chemistry, Middle Tennessee State University, Murfreesboro, TN 37132, United States
- Tiem Leong Yoon
  - School of Physics, Universiti Sains Malaysia, 11800 Penang, Malaysia
- Yong Foo Wong
  - Centre for Research on Multidimensional Separation Science, School of Chemical Sciences, Universiti Sains Malaysia, 11800 Penang, Malaysia
6
Sacco P, Jeong J. Assessing the risk of problem gambling among lottery loyalty program members: A machine learning approach. Addict Behav 2025; 168:108372. [PMID: 40367680 DOI: 10.1016/j.addbeh.2025.108372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2024] [Revised: 04/09/2025] [Accepted: 04/23/2025] [Indexed: 05/16/2025]
Abstract
BACKGROUND AND AIMS: Lottery gambling is a relatively benign form of gambling. Nonetheless, individuals with gambling problems may engage in lottery play and/or play the lottery exclusively. Lottery loyalty programs have data that could be used to screen for problem gambling, as they collect information on demographics and ticket purchases from players who sign up to receive incentives. The current study evaluates the feasibility of machine learning to identify individuals who have gambling problems using data collected from a state lottery loyalty program. METHODS: Data from ticket uploads were merged with an online survey sent to loyalty program participants (N = 5903). The Problem Gambling Severity Index (PGSI) was used to screen for problem gambling, with a score of five or greater denoting problem gambling (n = 809; 14%). Other survey items queried frequency of other gambling (e.g., casino slot machine) as well as amounts spent. Random forests analysis, a predictive modeling technique, was used to predict individuals who have gambling problems. DISCUSSION AND CONCLUSIONS: Problem gambling was more common among loyalty program players than is typical in population samples. The random forest algorithm performed fairly well overall, but sensitivity was poor, indicating that the model did not identify individuals with problem gambling effectively. Lottery loyalty programs may be a promising setting for screening and secondary prevention efforts because of the relatively high prevalence of problem gambling, but random forests may not be the best approach for detecting those at risk.
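A short calculation shows why a model can look good "overall" while still having poor sensitivity at this prevalence. The sketch uses only the sample size, case count, and PGSI cutoff reported above; the "label everyone as negative" baseline is an illustrative assumption and is not a result from the study.

```python
N_TOTAL = 5903     # survey respondents reported in the abstract
N_PROBLEM = 809    # respondents with a PGSI score of five or greater
PGSI_CUTOFF = 5


def is_problem_gambling(pgsi_score):
    """Apply the screening cutoff described in the abstract."""
    return pgsi_score >= PGSI_CUTOFF


prevalence = N_PROBLEM / N_TOTAL

# Illustrative baseline: a classifier that never flags anyone.
true_positives = 0
true_negatives = N_TOTAL - N_PROBLEM
baseline_accuracy = (true_positives + true_negatives) / N_TOTAL
baseline_sensitivity = true_positives / N_PROBLEM

print(f"prevalence: {prevalence:.3f}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline sensitivity: {baseline_sensitivity:.3f}")
```

With roughly 14% positives, accuracy alone is dominated by the majority class, which is why sensitivity is the more informative figure for a screening task.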
Affiliation(s)
- Paul Sacco
  - University of Maryland, 525 West Redwood Street, Baltimore, MD 21201, United States
- Jihyeong Jeong
  - University of Maryland, 525 West Redwood Street, Baltimore, MD 21201, United States
7
Muratov V, Jagiello K, Mikolajczyk A, Danielsen PH, Halappanavar S, Vogel U, Puzyn T. The role of machine learning in predicting titanium dioxide nanoparticles induced pulmonary pathology using transcriptomic biomarkers. J Hazard Mater 2025; 493:138240. [PMID: 40262316 DOI: 10.1016/j.jhazmat.2025.138240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2024] [Revised: 04/07/2025] [Accepted: 04/09/2025] [Indexed: 04/24/2025]
Abstract
This study explores the application of machine learning (ML) in identifying transcriptomic changes associated with pulmonary pathologies induced by titanium dioxide nanoparticles (TiO2-NPs). Such an approach significantly contributes to understanding the underlying mode-of-action of TiO2-NP inhalation and follows the European Chemicals Agency's recommendations on applying Novel Approach Methodologies designed for reducing animal studies. The lung gene expression profiles from mice exposed via single intratracheal instillations to TiO2-NPs with varying physicochemical properties on day 1, and day 28 post-exposure were analyzed to develop computational models for predicting the lung pathologies of rutile TiO2-NPs. More than 600 random forest models were generated and rigorously validated, leading to the identification of 17 high-quality models with an average accuracy of 0.95. These models link nanoparticle-deposited surface area, charge, and post-exposure sampling time with dysregulation in key genes, including serum amyloid Saa1 (59.7-fold increase), Saa3 (253.7-fold increase), and the cytokine Ccl2 (3.4-fold increase). These genes are strongly associated with lung inflammation and fibrosis, key pathological responses to nanomaterial exposure. The study highlights critical nanoparticle features that drive transcriptomic changes. Hierarchical clustering confirmed the mechanistic links between nanoparticle properties and transcriptomic changes. This study demonstrates ML's potential to integrate omics data for nanosafety, offering a robust framework for early detection of adverse effects. The models enable the prediction of gene expression changes based on nanoparticle features, aiding in potential Safe and Sustainable-by-design of nanomaterials.
Affiliation(s)
- Viacheslav Muratov
  - University of Gdansk, Faculty of Chemistry, Laboratory of Environmental Chemoinformatics, Wita Stwosza 63, Gdansk 80-308, Poland
- Karolina Jagiello
  - University of Gdansk, Faculty of Chemistry, Laboratory of Environmental Chemoinformatics, Wita Stwosza 63, Gdansk 80-308, Poland; QSAR Lab Ltd., Trzy lipy 3, Gdansk 80-172, Poland
- Alicja Mikolajczyk
  - University of Gdansk, Faculty of Chemistry, Laboratory of Environmental Chemoinformatics, Wita Stwosza 63, Gdansk 80-308, Poland; QSAR Lab Ltd., Trzy lipy 3, Gdansk 80-172, Poland
- Sabina Halappanavar
  - Environmental Health Science and Research Bureau, Health Canada, Ottawa, Ontario K1A 0K9, Canada; Department of Biology, University of Ottawa, Ontario, Canada
- Ulla Vogel
  - The National Research Centre for the Working Environment, Copenhagen DK-2100, Denmark
- Tomasz Puzyn
  - University of Gdansk, Faculty of Chemistry, Laboratory of Environmental Chemoinformatics, Wita Stwosza 63, Gdansk 80-308, Poland; QSAR Lab Ltd., Trzy lipy 3, Gdansk 80-172, Poland
8
Chowdhury MNH, Bin Ibne Reaz M, Ali SHM, Crespo ML, Ahmad S, Salim GM, Haque F, Ordóñez LGG, Islam MJ, Mahdee TM, Zaman KS, Hemel MSK, Bhuiyan MAS. Deep learning for early detection of chronic kidney disease stages in diabetes patients: A TabNet approach. Artif Intell Med 2025; 166:103153. [PMID: 40347843 DOI: 10.1016/j.artmed.2025.103153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2024] [Revised: 04/03/2025] [Accepted: 05/01/2025] [Indexed: 05/14/2025]
Abstract
Chronic kidney disease (CKD) poses a significant risk for diabetes patients, often leading to severe complications. Early and accurate CKD stage detection is crucial for timely intervention. However, it remains challenging due to its asymptomatic progression, the oversight of routine CKD tests during diabetes checkups, and limited access to nephrologists. This study aimed to address these challenges by developing a multiclass CKD stage prediction model for diabetes patients using longitudinal data from the Chronic Renal Insufficiency Cohort (CRIC) study. A novel iterative backward feature selection strategy was employed to determine key predictors of the CKD stage. TabNet, an attention-based deep learning architecture, was used to build classification models in complete and simplified categories. The complete model used 31 features, including complex kidney biomarkers, while the simplified model used 15 features readily available from routine checkups. The performance of TabNet was compared against traditional tree-based ensemble methods (XGBoost, random forest, AdaBoost) and a multi-layer perceptron. Model-specific and model-agnostic explainable AI (XAI) techniques were applied to interpret model decisions, enhancing the transparency and clinical applicability of the proposed approach. The TabNet models demonstrated superior performance, achieving 94.06 % and 92.71 % accuracy in cross-validation for the complete and simplified models, respectively, and 91.00 % and 88.00 % accuracy on test sets. XAI analysis identified serum creatinine, cystatin C, sex, and age as the most influential factors in CKD stage classification. The proposed TabNet models offer a robust approach for early CKD severity detection in diabetes patients, potentially improving clinical decision-making and patient outcomes.
Affiliation(s)
- Md Nakib Hayat Chowdhury
  - Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia; Department of Computer Science and Engineering, Bangladesh Army University of Science and Technology (BAUST), Saidpur 5310, Nilphamari, Bangladesh
- Mamun Bin Ibne Reaz
  - Faculty of Electronic Engineering and Technology, Universiti Malaysia Perlis (UniMAP), 02600 Arau, Perlis, Malaysia; Institute of Microengineering and Nanoelectronics, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
- Sawal Hamid Md Ali
  - Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia; Institute of Microengineering and Nanoelectronics, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
- María Liz Crespo
  - Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
- Shamim Ahmad
  - Department of Computer Science and Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh
- Ghassan Maan Salim
  - Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia; Institute of Microengineering and Nanoelectronics, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
- Fahmida Haque
  - Artificial Intelligence Resource, Molecular Imaging Branch, National Cancer Institute, Bethesda, MD, USA
- Md Johirul Islam
  - Department of Physics, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Taher Muhammad Mahdee
  - Department of Computer Science and Engineering, Bangladesh Army University of Science and Technology (BAUST), Saidpur 5310, Nilphamari, Bangladesh
- Kh Shahriya Zaman
  - Faculty of Electronic Engineering and Technology, Universiti Malaysia Perlis (UniMAP), 02600 Arau, Perlis, Malaysia
- Md Shahriar Khan Hemel
  - Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, UKM, Bangi 43600, Selangor, Malaysia
- Mohammad Arif Sobhan Bhuiyan
  - Department of Electrical and Electronics Engineering, Xiamen University Malaysia, Bandar Sunsuria, Sepang 43900, Selangor, Malaysia
9
Velez T, Ibrahim Z, Duru K, Velez D, Triantafyllou M, McKinley K, Saif P, Kratimenos P, Clark A, Koutroulis I. Predicting hospital admissions, ICU utilization, and prolonged length of stay among febrile pediatric emergency department patients using incomplete and imbalanced electronic health record (EHR) data strategies. Int J Med Inform 2025; 200:105905. [PMID: 40203463 DOI: 10.1016/j.ijmedinf.2025.105905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2024] [Revised: 03/09/2025] [Accepted: 03/30/2025] [Indexed: 04/11/2025]
Abstract
OBJECTIVE: Determine the efficacy of commonly used approaches to handling missing and/or imbalanced Electronic Health Record (EHR) data on the performance of predictive models targeting risk of admission, intensive care unit (ICU) use, or prolonged length of stay (PLOS) among presenting febrile pediatric emergency department (ED) patients. MATERIALS AND METHODS: Historical ED EHR data was used to train a series of XGBoost (XGB) and logistic regression (LR) classifiers. Data handling strategies included imputation methods (multiple imputation (MI), median imputation, complete case (CC) analysis), and imbalanced data corrections (minority oversampling, stratified sub-group analysis). Model performance was evaluated using discriminative (AUC, AUPRC) and calibration metrics (Brier score, Z-scores, p-values). RESULTS: Among the study population, 34 % were admitted, 2 % utilized the ICU, and 7 % had a PLOS. Significant data missingness was observed and determined to be not at random (MNAR). In predicting admissions using data recorded within the first two hours of presentation, LR trained using full cohort with median imputation was comparable to MI yielding well-calibrated admissions models with an AUC/AUPRC of 0.82/0.73 while CC analysis yielded an AUC/AUPRC of 0.76/0.78. XGB, trained with unimputed data, produced a well-calibrated admissions classifier with an AUC/AUPRC of 0.85/0.78. In contrast, imbalanced data correction techniques, including synthetic minority oversampling (SMOTE), risk stratification, or the use of XGB did not significantly improve the poor AUPRC and calibration performance of LR models predicting ICU and PLOS. CONCLUSION: Both XGB and LR with median imputation demonstrated robust performance in predicting admissions in the presence of missing data. However, deriving clinically useful models for rare outcomes, such as ICU use or PLOS, remains a challenge due to poor precision/recall and calibration performance. Further research is needed to improve the prediction of rare outcomes in this population.
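Two of the evaluation metrics named above, AUC and the Brier score, are simple enough to compute by hand. The sketch below is a minimal pure-Python version under the usual definitions (AUC as the probability that a random positive is ranked above a random negative, with ties counted as one half; Brier score as the mean squared error of predicted probabilities). The labels and probabilities are hypothetical and unrelated to the study's data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = admitted) and predicted admission probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(f"AUC: {auc:.2f}")
print(f"Brier score: {brier:.4f}")
```

AUC measures discrimination only, while the Brier score also penalizes miscalibrated probabilities, which is why the abstract reports both kinds of metric.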
Affiliation(s)
- Tom Velez
  - Computer Technology Associates, Cardiff, CA, United States
- Zara Ibrahim
  - Department of Pediatrics, Children's National Hospital, Washington, DC, United States
- Kanayo Duru
  - Department of Pediatrics, Children's National Hospital, Washington, DC, United States; Brown University, Providence, RI, United States
- Dante Velez
  - Department of Pediatrics, Children's National Hospital, Washington, DC, United States
- Maria Triantafyllou
  - Center for Genetic Medicine Research, Children's National Research Institute, Washington, DC, United States
- Kenneth McKinley
  - Department of Pediatrics, Children's National Hospital, Washington, DC, United States; George Washington University School of Medicine and Health Sciences, Washington, DC, United States
- Pasha Saif
  - Virginia Tech Carilion School of Medicine, Roanoke, VA, United States
- Panagiotis Kratimenos
  - Department of Pediatrics, Children's National Hospital, Washington, DC, United States; George Washington University School of Medicine and Health Sciences, Washington, DC, United States
- Andy Clark
  - Computer Technology Associates, Cardiff, CA, United States
- Ioannis Koutroulis
  - Department of Pediatrics, Children's National Hospital, Washington, DC, United States; Center for Genetic Medicine Research, Children's National Research Institute, Washington, DC, United States; George Washington University School of Medicine and Health Sciences, Washington, DC, United States
10
Huang T, Li Q, Xu C, Gao J, Li Z, Zhang S. Revisiting low-homophily for graph-based fraud detection. Neural Netw 2025; 188:107407. [PMID: 40157230 DOI: 10.1016/j.neunet.2025.107407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Revised: 02/07/2025] [Accepted: 02/19/2025] [Indexed: 04/01/2025]
Abstract
The openness of the Internet stimulates a large number of fraudulent behaviors, which have become a huge threat. Graph-based fraud detectors have attracted extensive interest since the abundant structure information of graph data has proved effective. Conventional Graph Neural Network (GNN) approaches reveal fraudsters based on the homophily assumption, but fraudsters typically generate heterophilous connections and label-imbalanced neighborhoods. Such behaviors degrade the performance of GNNs in fraud detection tasks because of the low homophily in the graphs. Though some recent works have noticed these challenges, they either treat the heterophilous connections as homophilous ones or tend to reduce heterophily, which largely ignores the benefits of heterophily. In this work, an integrated two-strategy framework, HeteGAD, is proposed to balance both homophily and heterophily information from neighbors. The key lies in explicitly shrinking intra-class distance and increasing inter-class segregation. Specifically, the Heterophily-aware Aggregation Strategy teases out the feature disparity on heterophilous neighbors and augments the disparity between representations with different labels, while the Homophily-aware Aggregation Strategy is devised to capture the homophilous information in global text and augment the representation similarity with the same label. Finally, two corresponding inter-relational attention mechanisms are incorporated to refine the procedure of modeling the interaction of multiple relations. Experiments conducted on two real-world datasets demonstrate that HeteGAD outperforms 11 state-of-the-art baselines for fraud detection.
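"Low homophily" can be quantified with the edge homophily ratio, a commonly used measure defined as the fraction of edges whose two endpoints share a label. The sketch below computes it for a tiny hypothetical graph in which one fraudster connects to several benign users; it illustrates the measure only and does not reproduce any part of HeteGAD.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label."""
    same_label = sum(1 for u, v in edges if labels[u] == labels[v])
    return same_label / len(edges)


# Hypothetical graph: 0 = benign user, 1 = fraudster.
labels = {"a": 0, "b": 0, "c": 0, "f": 1}
edges = [("f", "a"), ("f", "b"), ("f", "c"), ("a", "b")]

ratio = edge_homophily(edges, labels)
print(f"edge homophily ratio: {ratio:.2f}")
```

Here three of the four edges link the fraudster to benign users, so the ratio is low, which is the situation in which neighbor averaging in a conventional GNN blurs the fraudster's representation.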
Affiliation(s)
- Tairan Huang
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Qiutong Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Cong Xu
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Jianliang Gao
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Zhao Li
  - Zhejiang University, Hangzhou, China
- Shichao Zhang
  - School of Computer Science and Engineering, Guangxi Normal University, Guilin, China
11
Lu W, Yao L, Wang Y, Li F, Zhou B, Ming W, Jiang Y, Liu X, Liu Y, Sun X, Wang Y, Bai Y. Characterization of extrachromosomal circular DNA associated with genomic repeat sequences in breast cancer. Int J Cancer 2025; 157:384-397. [PMID: 40135469 DOI: 10.1002/ijc.35423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 02/26/2025] [Accepted: 03/05/2025] [Indexed: 03/27/2025]
Abstract
Extrachromosomal circular DNA (eccDNA) has emerged as a potential biomarker for disease due to its stable closed circular structure. However, the diagnostic utility of eccDNA remains underexplored. In this study, we demonstrate that the characteristics of eccDNA associated with genomic repetitive elements change in breast cancer patient tissues and plasma. These changes can serve as signatures for accurate cancer classification. We profiled eccDNA annotated to repeat elements across the genome in tissues and plasma, aggregating each repeat element to the superfamily and subfamily level. Our findings indicate that eccDNA associated with repetitive elements in cancer exhibits regular patterns of enrichment or depletion in specific elements, particularly at the family level. Additionally, these repeat element changes are present in different subtypes of breast cancer, correlated with varying hormone receptor expression. Although there are differences in the landscapes of eccDNA on repetitive elements between cancer tissues and paired plasma, the unique characteristics of eccDNA associated with repetitive sequences in the plasma of cancer patients facilitate better differentiation from normal individuals. These analyses reveal that changes in eccDNA associated with repeat sequences in human cancers can be used as diagnostic biomarkers for cancer patients.
Affiliation(s)
- Wenxiang Lu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Lingsong Yao
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Ying Wang
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Fuyu Li
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Bingbo Zhou
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Wenlong Ming
- Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
| | - Yali Jiang
- The Friendship Hospital of Ili Kazakh Autonomous Prefecture, Ili & Jiangsu Joint Institute of Health, Yining, Xinjiang Uygur Autonomous Region, China
| | - Xiaoan Liu
- Department of Breast Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Yun Liu
- Department of Information, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Xiao Sun
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Yan Wang
- The Friendship Hospital of Ili Kazakh Autonomous Prefecture, Ili & Jiangsu Joint Institute of Health, Yining, Xinjiang Uygur Autonomous Region, China
- Department of Endoscopy, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
| | - Yunfei Bai
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
12
Sharan RV, Xiong H. Wet and dry cough classification using cough sound characteristics and machine learning: A systematic review. Int J Med Inform 2025; 199:105912. [PMID: 40203586 DOI: 10.1016/j.ijmedinf.2025.105912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 03/10/2025] [Accepted: 04/03/2025] [Indexed: 04/11/2025]
Abstract
BACKGROUND Distinguishing between productive (wet) and non-productive (dry) cough types is important for evaluating respiratory health, assisting in differential diagnosis, and monitoring disease progression. However, assessing cough type through the perception of cough sounds in clinical settings poses challenges due to its subjectivity. Employing objective cough sound analysis holds promise for aiding diagnostic assessments and guiding the management of respiratory conditions. This systematic review aims to assess and summarize the predictive capabilities of machine learning algorithms in analyzing cough sounds to determine cough type. METHOD A systematic search of the Scopus, Medline, and Embase databases conducted on March 8, 2025, yielded three studies that met the inclusion criteria. The quality assessment of these studies was conducted using the checklist for the assessment of medical artificial intelligence (ChAMAI). RESULTS The inter-rater agreement for annotating wet and dry coughs ranged from 0.22 to 0.81 across the three studies. Furthermore, these studies employed diverse inputs for their machine learning algorithms, including different cough sound features and time-frequency representations. The algorithms used ranged from conventional classifiers like logistic regression to neural networks. While the classification accuracy for identifying wet and dry coughs ranged from 78% to 87% across these studies, none of them assessed their algorithms through external validation. CONCLUSION The high variability in inter-rater agreement highlights the subjectivity in manually interpreting cough sounds and underscores the need for objective cough sound analysis methods. The predictive ability of cough-type classification algorithms shows promise in the small number of studies analyzed in this systematic review. However, more studies are needed, particularly those validating their models on independent and external datasets.
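The abstract reports inter-rater agreement between 0.22 and 0.81 but does not name the statistic. As a minimal sketch, assuming the figures are Cohen's kappa values (an assumption, not stated in the source), chance-corrected agreement between two annotators can be computed as follows. The function name `cohens_kappa` and the toy labels are hypothetical and are not data from the reviewed studies.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled at random with their own label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical annotations of ten cough recordings (illustration only).
rater_1 = ["wet", "wet", "dry", "dry", "wet", "dry", "dry", "wet", "dry", "dry"]
rater_2 = ["wet", "dry", "dry", "dry", "wet", "dry", "wet", "wet", "dry", "dry"]
kappa = cohens_kappa(rater_1, rater_2)  # observed 0.80, expected 0.52, kappa about 0.58
```

Raw agreement in this toy example is 80%, yet kappa is only about 0.58, which illustrates why chance-corrected figures can look much lower than simple percent agreement.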
Affiliation(s)
- Roneel V Sharan
- School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, United Kingdom.
| | - Hao Xiong
- Australian Institute of Health Innovation, Macquarie University, Sydney, NSW 2109, Australia
13
Araújo ALD, Sperandio M, Calabrese G, Faria SS, Cardenas DAC, Martins MD, Saldivia-Siracusa C, Giraldo-Roldán D, Pedroso CM, Vargas PA, Lopes MA, Santos-Silva AR, Kowalski LP, Moraes MC. Artificial intelligence in healthcare applications targeting cancer diagnosis-part I: data structure, preprocessing and data organization. Oral Surg Oral Med Oral Pathol Oral Radiol 2025; 140:79-88. [PMID: 39893121 DOI: 10.1016/j.oooo.2025.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Revised: 12/12/2024] [Accepted: 01/03/2025] [Indexed: 02/04/2025]
Abstract
BACKGROUND Machine learning techniques hold significant potential to support the diagnosis and prognosis of diseases. However, the success of these approaches is heavily dependent on rigorous data acquisition, preprocessing and data organization. METHODS This article reviews the literature to evaluate key factors in dataset construction, focusing on data structure, preprocessing, and data organization, particularly in the context of imaging data. RESULTS The main issues with data construction when dealing with medical applications are noise (incorrect or irrelevant data), sparsity/limited availability, representativeness/variability, and data imbalance (uneven class distribution). While preprocessing steps make the data suitable for the models, data organization focuses on improving how the data are arranged to increase model performance. Additionally, the impact of CNN complexity in processing balanced, imbalanced, and complex datasets shows that complex CNNs are not always the optimal choice for every classification problem. CONCLUSION By integrating knowledge from Health Sciences and Biomedical Engineering, we aim to enhance healthcare professionals' understanding of machine learning for image analysis in Oral Medicine and Pathology. This encourages their involvement in patient recruitment and data acquisition, broadening their roles and significantly contributing to the creation of well-characterized datasets for future research and applications.
Affiliation(s)
- Anna Luíza Damaceno Araújo
- Head and Neck Surgery Department, University of São Paulo Medical School, São Paulo, State of São Paulo, Brazil; Hospital Israelita Albert Einstein, São Paulo, Brazil.
| | - Marcelo Sperandio
- Department of Oral Medicine and Pathology, Faculdade São Leopoldo Mandic, Research Institute, Campinas, São Paulo, Brazil
| | - Giovanna Calabrese
- Institute of Science and Technology (ICT-UNIFESP), Federal University of São Paulo, São José dos Campos, São Paulo, Brazil
| | - Sarah S Faria
- Institute of Science and Technology (ICT-UNIFESP), Federal University of São Paulo, São José dos Campos, São Paulo, Brazil
| | - Diego Armando Cardona Cardenas
- Institute of Science and Technology (ICT-UNIFESP), Federal University of São Paulo, São José dos Campos, São Paulo, Brazil; Heart Institute, University of São Paulo, São Paulo, State of São Paulo, Brazil
| | - Manoela Domingues Martins
- Department of Oral Pathology, School of Dentistry, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
| | - Cristina Saldivia-Siracusa
- Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas (FOP-UNICAMP), Piracicaba, São Paulo, Brazil
| | - Daniela Giraldo-Roldán
- Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas (FOP-UNICAMP), Piracicaba, São Paulo, Brazil
| | - Caique Mariano Pedroso
- Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas (FOP-UNICAMP), Piracicaba, São Paulo, Brazil
| | - Pablo Agustin Vargas
- Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas (FOP-UNICAMP), Piracicaba, São Paulo, Brazil
| | - Marcio Ajudarte Lopes
- Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas (FOP-UNICAMP), Piracicaba, São Paulo, Brazil
| | - Alan Roger Santos-Silva
- Faculdade de Odontologia de Piracicaba, Universidade Estadual de Campinas (FOP-UNICAMP), Piracicaba, São Paulo, Brazil
| | - Luiz Paulo Kowalski
- Head and Neck Surgery Department, University of São Paulo Medical School, São Paulo, State of São Paulo, Brazil; Department of Head and Neck Surgery and Otorhinolaryngology, A.C. Camargo Cancer Center, São Paulo, State of São Paulo, Brazil
| | - Matheus Cardoso Moraes
- Institute of Science and Technology (ICT-UNIFESP), Federal University of São Paulo, São José dos Campos, São Paulo, Brazil
14
Hayes CJ, Bin Noor N, Raciborski RA, Martin BC, Gordon AJ, Hoggatt KJ, Hudson T, Cucciare MA. Development and validation of machine-learning algorithms predicting retention, overdoses, and all-cause mortality among US military veterans treated with buprenorphine for opioid use disorder. J Addict Dis 2025; 43:207-224. [PMID: 38946144 DOI: 10.1080/10550887.2024.2363035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
BACKGROUND Buprenorphine for opioid use disorder (B-MOUD) is essential to improving patient outcomes; however, its benefit depends on retention in treatment. OBJECTIVE To develop and validate machine-learning algorithms predicting retention, overdoses, and all-cause mortality among US military veterans initiating B-MOUD. METHODS Veterans initiating B-MOUD from fiscal years 2006-2020 were identified. Veterans' B-MOUD episodes were randomly divided into training (80%; n = 45,238) and testing samples (20%; n = 11,309). Candidate algorithms [multiple logistic regression, least absolute shrinkage and selection operator regression, random forest (RF), gradient boosting machine (GBM), and deep neural network (DNN)] were used to build and validate classification models to predict six binary outcomes: 1) B-MOUD retention, 2) any overdose, 3) opioid-related overdose, 4) overdose death, 5) opioid overdose death, and 6) all-cause mortality. Model performance was assessed using standard classification statistics [e.g., area under the receiver operating characteristic curve (AUC-ROC)]. RESULTS Episodes in the training sample were 93.0% male, 78.0% White, 72.3% unemployed, and 48.3% had a concurrent drug use disorder. The GBM model slightly outperformed others in predicting B-MOUD retention (AUC-ROC = 0.72). RF models outperformed others in predicting any overdose (AUC-ROC = 0.77) and opioid overdose (AUC-ROC = 0.77). RF and GBM outperformed other models for overdose death (AUC-ROC = 0.74 for both), and RF and DNN outperformed other models for opioid overdose death (RF AUC-ROC = 0.79; DNN AUC-ROC = 0.78). RF and GBM also outperformed other models for all-cause mortality (AUC-ROC = 0.76 for both). No single predictor accounted for >3% of the model's variance. CONCLUSIONS Machine-learning algorithms can predict OUD-related outcomes with moderate predictive performance; however, prediction of these outcomes is driven by many characteristics.
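AUC-ROC is the headline metric in this abstract and in several other entries in this list. As a minimal sketch of what the number means, the function below computes it directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc_roc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical toy data (illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = auc_roc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly: 0.75
```

Read this way, an AUC-ROC of 0.72 means the model ranks a positive case above a negative one in roughly 72% of such pairs; 0.5 corresponds to chance ranking. The pairwise loop is quadratic and is meant for illustration, not for datasets of the size reported here.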
Affiliation(s)
- Corey J Hayes
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Institute for Digital Health and Innovation, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Center for Mental Healthcare and Outcomes Research, Central Arkansas Veterans Healthcare System, North Little Rock, AR, USA
| | - Nahiyan Bin Noor
- Institute for Digital Health and Innovation, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Rebecca A Raciborski
- Center for Mental Healthcare and Outcomes Research, Central Arkansas Veterans Healthcare System, North Little Rock, AR, USA
- Behavioral Health Quality Enhancement Research Initiative, Central Arkansas Veterans Healthcare System, North Little Rock, AR, USA
- Evidence, Policy, and Implementation Center, Central Arkansas Veterans Healthcare System, North Little Rock, AR, USA
| | - Bradley C Martin
- Division of Pharmaceutical Evaluation and Policy, College of Pharmacy, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Adam J Gordon
- Program for Addiction Research, Clinical Care, Knowledge, and Advocacy (PARCKA), Division of Epidemiology, Department of Medicine, School of Medicine, University of Utah, Salt Lake City, UT, USA
- Informatics, Decision-Enhancement and Analytic Sciences (IDEAS) Center, VA Salt Lake City Healthcare System, Salt Lake City, UT, USA
| | - Katherine J Hoggatt
- San Francisco VA Medical Center, San Francisco, CA, USA
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
| | - Teresa Hudson
- Center for Mental Healthcare and Outcomes Research, Central Arkansas Veterans Healthcare System, North Little Rock, AR, USA
- Center for Health Services Research, Department of Psychiatry, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Department of Emergency Medicine, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Michael A Cucciare
- Center for Mental Healthcare and Outcomes Research, Central Arkansas Veterans Healthcare System, North Little Rock, AR, USA
- Center for Health Services Research, Department of Psychiatry, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Veterans Affairs South Central Mental Illness Research, Education and Clinical Center, Central Arkansas Veterans Healthcare System, North Little Rock, AR, USA
15
Frechman E, Jaeger BC, Kowalkowski M, Williamson JD, Lenoir KM, Palakshappa JA, Wells BJ, Callahan KE, Pajewski NM, Gabbard JL. External validation of a proprietary risk model for 1-year mortality in community-dwelling adults aged 65 years or older. J Am Med Inform Assoc 2025; 32:1110-1119. [PMID: 40298901 DOI: 10.1093/jamia/ocaf062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2024] [Revised: 03/16/2025] [Accepted: 04/08/2025] [Indexed: 04/30/2025] Open
Abstract
OBJECTIVE To examine the discrimination, calibration, and algorithmic fairness of the Epic End of Life Care Index (EOL-CI). MATERIALS AND METHODS We assessed the EOL-CI's performance by estimating area under the receiver operating characteristic curve (AUC), sensitivity, and positive and negative predictive values in community-dwelling adults ≥65 years of age in a single health system in the Southeastern United States. Algorithmic fairness was examined by comparing the model's performance across sex, race, and ethnicity subgroups. Using a machine learning approach, we also explored local re-calibration of the EOL-CI considering additional information on past hospitalizations and frailty. RESULTS Among 215 731 patients (median age = 74 years, 57% female, 12% of Black race), 10% were classified as medium risk (15-44) and 3% as high risk (≥45) by the EOL-CI. The observed 1-year mortality rate was 3%. The EOL-CI had an AUC of 0.82 for 1-year mortality, with a positive predictive value of 22%. Predictive performance was generally similar across sex and race subgroups, though the EOL-CI displayed better performance with increasing age and in older adults with 2 or more outpatient encounters in the past 24 months. Local re-calibration of the EOL-CI was required to provide absolute estimates of mortality risk, and calibration was further improved when the EOL-CI was augmented with data on inpatient hospitalizations and frailty. DISCUSSION The EOL-CI demonstrates reasonable discrimination, albeit with better performance in older adults and in those with greater health system contact. CONCLUSION Local refinement and calibration of the EOL-CI score is required to provide direct estimates of prognosis, with the goal of making the EOL-CI a more valuable tool at the point of care for identifying patients who would benefit from targeted palliative care interventions and proactive care planning.
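The abstract reports sensitivity and positive and negative predictive values alongside the AUC. As a minimal sketch of how these quantities relate, the function below derives them from a 2x2 confusion matrix. The counts are invented for illustration and are NOT the study's confusion matrix: they are chosen only so that prevalence is 3% and the positive predictive value is 22%, matching the two figures the abstract does give. The function name `screening_metrics` is hypothetical.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening metrics from true/false positive/negative counts."""
    total = tp + fp + fn + tn
    return {
        "prevalence": (tp + fn) / total,   # share of patients with the outcome
        "sensitivity": tp / (tp + fn),     # flagged among those with the outcome
        "specificity": tn / (tn + fp),     # not flagged among those without it
        "ppv": tp / (tp + fp),             # outcome occurs among those flagged
        "npv": tn / (tn + fn),             # outcome absent among those not flagged
    }

# Invented counts per 1000 patients (assumption, for illustration only):
# 30 deaths (3% prevalence); 100 patients flagged, 22 of whom die (PPV 22%).
metrics = screening_metrics(tp=22, fp=78, fn=8, tn=892)
```

The sketch shows why a model with good discrimination can still have a modest positive predictive value when the outcome is rare: with 3% prevalence, most flagged patients are false positives even if relatively few true cases are missed.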
Affiliation(s)
- Erica Frechman
- Section on Gerontology and Geriatric Medicine, Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
| | - Byron C Jaeger
- Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
| | - Marc Kowalkowski
- Section on Hospital Medicine, Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Center for Health System Sciences (CHASSIS), Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Center for Health System Sciences (CHASSIS), Wake Forest University School of Medicine, Atrium Health, Charlotte, NC 28203, United States
| | - Jeff D Williamson
- Section on Gerontology and Geriatric Medicine, Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
| | - Kristin M Lenoir
- Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Center for Health System Sciences (CHASSIS), Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Center for Health System Sciences (CHASSIS), Wake Forest University School of Medicine, Atrium Health, Charlotte, NC 28203, United States
| | - Jessica A Palakshappa
- Section on Pulmonary, Critical Care, Allergy, and Immunologic Diseases, Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, United States
| | - Brian J Wells
- Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
| | - Kathryn E Callahan
- Section on Gerontology and Geriatric Medicine, Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
| | - Nicholas M Pajewski
- Department of Biostatistics and Data Science, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Center for Health System Sciences (CHASSIS), Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Center for Health System Sciences (CHASSIS), Wake Forest University School of Medicine, Atrium Health, Charlotte, NC 28203, United States
| | - Jennifer L Gabbard
- Section on Gerontology and Geriatric Medicine, Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Center for Health System Sciences (CHASSIS), Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Center for Health System Sciences (CHASSIS), Wake Forest University School of Medicine, Atrium Health, Charlotte, NC 28203, United States
16
Fathy W, Emeriaud G, Cheriet F. A comprehensive review of ICU readmission prediction models: From statistical methods to deep learning approaches. Artif Intell Med 2025; 165:103126. [PMID: 40300338 DOI: 10.1016/j.artmed.2025.103126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 10/04/2024] [Accepted: 03/29/2025] [Indexed: 05/01/2025]
Abstract
The prediction of Intensive Care Unit (ICU) readmission has become a crucial area of research due to the increasing demand for ICU resources and the need to provide timely interventions to critically ill patients. In recent years, several studies have explored the use of statistical, machine learning (ML), and deep learning (DL) models to predict ICU readmission. This review paper presents an extensive overview of these studies and discusses the challenges associated with ICU readmission prediction. We categorize the studies based on the type of model used and evaluate their strengths and limitations. We also discuss the performance metrics used to evaluate the models and their potential clinical applications. In addition, this review explores current methodologies, data usage, and recent advances in interpretability and explainable AI for medical applications, offering insights to guide future research and development in this field. Finally, we identify gaps in the current literature and provide recommendations for future research. Recent advances like ML and DL have moderately improved the prediction of the risk of ICU readmission. However, more progress is needed to reach the precision required to build computerized decision support tools.
Affiliation(s)
- Waleed Fathy
- Department of Computer and Software Engineering, Polytechnique Montréal, Montreal, Quebec, Canada; Department of Electronic and Communication Engineering, Zagazig University, Zagazig, Sharkia, Egypt.
| | - Guillaume Emeriaud
- Department of Pediatrics, CHU Sainte-Justine, Université de Montréal, Montreal, Quebec, Canada.
| | - Farida Cheriet
- Department of Computer and Software Engineering, Polytechnique Montréal, Montreal, Quebec, Canada.
17
Kim SY, Jin JJ, Ha A, Song CH, Park SH, Kang KH, Lee J, Huh MG, Jeoung JW, Park KH, Kim YK. SMOTE-Enhanced Explainable Artificial Intelligence Model for Predicting Visual Field Progression in Myopic Normal Tension Glaucoma. J Glaucoma 2025; 34:520-527. [PMID: 40249240 DOI: 10.1097/ijg.0000000000002579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2024] [Accepted: 04/05/2025] [Indexed: 04/19/2025]
Abstract
PRÉCIS The AI model, enhanced by SMOTE to balance data classes, accurately predicted visual field deterioration in patients with myopic normal tension glaucoma. Using SHAP analysis, the key variables driving disease progression were identified. PURPOSE To develop and validate a Synthetic Minority Over-sampling Technique (SMOTE)-enhanced artificial intelligence (AI) model for predicting visual field progression in myopic normal tension glaucoma (NTG) patients. METHODS This retrospective cohort study included 100 eyes from myopic NTG patients with a mean follow-up of 10.3±3.2 years. Baseline parameters included intraocular pressure (IOP), central corneal thickness, axial length, and visual field metrics. A SMOTE-enhanced AI model was created to address class imbalance in progression events. Model performance was evaluated using receiver operating characteristic (ROC) analysis, cross-validation, and calibration plots. Predictive factor importance was evaluated through SHapley Additive exPlanations (SHAP) analysis. RESULTS Visual field progression was observed in 28% of patients, with a median progression time of 3.2 years. The AI model achieved an area under the ROC curve (AUC) of 0.83 (95% CI, 0.75-0.91), with promising sensitivity (0.81) and specificity (0.77). SHAP analysis identified baseline mean deviation (MD), age, axial length, baseline IOP, and visual field index (VFI) as key predictors. When patients were stratified based on model-predicted risk scores, those with scores above 0.8 had significantly higher observed progression rates (82.6%) compared with those with lower risk scores. Subgroup analysis revealed strong correlations between progression risks and older age, greater axial length, and worse baseline MD. CONCLUSIONS The SMOTE-enhanced AI model shows reasonable predictive performance and potential clinical utility for identifying visual field progression in myopic NTG patients, though further validation in larger cohorts is needed. By addressing class imbalance and myopia-specific challenges, this approach enables personalized risk stratification and early intervention.
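The abstract relies on SMOTE to balance the progression classes but does not describe the implementation used. As a simplified sketch of the core idea only, the function below creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote_sketch`, the parameter defaults, and the toy points are hypothetical; this is not the study's code and it omits details of production implementations.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])   # pick one of the k nearest neighbours
        gap = rng.random()                   # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical two-feature minority class (illustration only).
minority_points = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=4)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind should be applied to training folds only; applying it before cross-validation would leak information from held-out samples into training.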
Affiliation(s)
- So Yeon Kim
- Department of Ophthalmology, Seoul National University Hospital
- Department of Ophthalmology, Seoul National University College of Medicine
| | | | - Ahnul Ha
- Department of Ophthalmology, Jeju National University Hospital
- Department of Ophthalmology, Jeju National University School of Medicine, Jeju
| | - Chae Hyun Song
- Department of Ophthalmology, Seoul National University Hospital
- Department of Ophthalmology, Seoul National University College of Medicine
| | - Se Hie Park
- Department of Ophthalmology, Seoul National University Hospital
- Department of Ophthalmology, Seoul National University College of Medicine
| | - Kyoung Hae Kang
- Department of Ophthalmology, Seoul National University Hospital
- Department of Ophthalmology, Seoul National University College of Medicine
| | - Jaekyoung Lee
- Department of Ophthalmology, Seoul National University Hospital
- Department of Ophthalmology, Seoul National University College of Medicine
| | - Min Gu Huh
- Department of Ophthalmology, Yeungnam University Medical Center, Daegu, Korea
| | - Jin Wook Jeoung
- Department of Ophthalmology, Seoul National University Hospital
- Department of Ophthalmology, Seoul National University College of Medicine
| | - Ki Ho Park
- Department of Ophthalmology, Seoul National University Hospital
- Department of Ophthalmology, Seoul National University College of Medicine
| | - Young Kook Kim
- Department of Ophthalmology, Seoul National University Hospital
- Department of Ophthalmology, Seoul National University College of Medicine
- Ranelagh Centre for Biosocial Informatics, Seoul National University College of Medicine, Seoul
18
McClure Z, Greenwood CJ, Fuller-Tyszkiewicz M, Messer M, Linardon J. Predicting responsiveness to a dialectical behaviour therapy skills training app for recurrent binge eating: A machine learning approach. Behav Res Ther 2025; 190:104755. [PMID: 40286685 DOI: 10.1016/j.brat.2025.104755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 03/13/2025] [Accepted: 04/21/2025] [Indexed: 04/29/2025]
Abstract
OBJECTIVE Smartphone applications (apps) show promise as an effective and scalable intervention modality for disordered eating, yet responsiveness varies considerably. The ability to predict user responses to app-based interventions is currently limited. Machine learning (ML) techniques have shown potential to improve prediction of complex clinical outcomes. We applied ML techniques to predict responsiveness to a dialectical behaviour therapy-based smartphone app for recurrent binge eating. METHOD Data were collected as part of a randomised controlled trial (RCT). The present sample was based on data from 576 participants with recurrent binge eating. Ten common classification and regression approaches were used to predict outcomes that represent key stages of the user experience, including initial intervention uptake, app adherence, study drop-out, and symptom change. Models were developed using 69 self-reported baseline variables (i.e., demographic, clinical, psychological) and several app usage variables (e.g., number of modules completed) as predictors. RESULTS All models, using only baseline predictors, performed sub-optimally at predicting engagement (AUCs = 0.48-0.61; R2 = 0.00-0.04) and symptom level change (R2 = 0.00-0.07). Incorporating usage data improved prediction of study dropout (AUC = 0.69-0.76). CONCLUSION ML models were unable to accurately predict responsiveness using self-reported baseline predictors alone. Predicting outcomes with greater precision may require consideration of how predictors change over time and interact with a user's context. Modelling usage pattern data appears to improve prediction of dropout, highlighting the potential value of tracking intervention usage to identify individuals at risk of disengagement.
Affiliation(s)
- Zoe McClure
- School of Psychology, Deakin University, 1 Gheringhap Street, Geelong, VIC, 3220, Australia.
| | - Christopher J Greenwood
- School of Psychology, Deakin University, 1 Gheringhap Street, Geelong, VIC, 3220, Australia; SEED Lifespan Strategic Research Centre, Deakin University, Burwood, Victoria, Australia; Centre for Adolescent Health, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, Australia; University of Melbourne, Department of Paediatrics, Royal Children's Hospital, Melbourne, Australia
| | - Matthew Fuller-Tyszkiewicz
- School of Psychology, Deakin University, 1 Gheringhap Street, Geelong, VIC, 3220, Australia; SEED Lifespan Strategic Research Centre, Deakin University, Burwood, Victoria, Australia
| | - Mariel Messer
- School of Psychology, Deakin University, 1 Gheringhap Street, Geelong, VIC, 3220, Australia
| | - Jake Linardon
- School of Psychology, Deakin University, 1 Gheringhap Street, Geelong, VIC, 3220, Australia; SEED Lifespan Strategic Research Centre, Deakin University, Burwood, Victoria, Australia
19
Whalen E, Gilbert S, Buchanan J. Applying Aggregate Statistical Analyses to Safety Monitoring of Ongoing Clinical Studies, Issues, and Opportunities in a Test Case. Ther Innov Regul Sci 2025; 59:643-649. [PMID: 40348902 DOI: 10.1007/s43441-025-00776-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2024] [Accepted: 04/07/2025] [Indexed: 05/14/2025]
Affiliation(s)
- Ed Whalen
- Pfizer Inc, 66 Hudson Boulevard, New York, NY, 10001, USA.
| | - Steven Gilbert
- Pfizer Inc, 66 Hudson Boulevard, New York, NY, 10001, USA
20
Akbulut C, Bird G. Who Tweets for the autistic community? A natural language processing-driven investigation. Autism 2025; 29:1740-1753. [PMID: 40130705 PMCID: PMC12159347 DOI: 10.1177/13623613251325934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2025]
Abstract
The formation of autism advocacy organisations led by family members of autistic individuals led to intense criticism from some parts of the autistic community. In response to what was perceived as a misrepresentation of their interests, autistic individuals formed autistic self-advocacy groups, adopting the philosophy that autism advocacy should be led 'by' autistic people 'for' autistic people. However, recent claims that self-advocacy organisations represent only a narrow subset of the autistic community have prompted renewed debate surrounding the role of organisations in autism advocacy. While many individuals and groups have outlined their views, the debate has yet to be studied through computational means. In this study, we apply machine learning and natural language processing techniques to a large-scale collection of Tweets from organisations and individuals in autism advocacy. We conduct a specification curve analysis on the similarity of language across organisations and individuals, and find evidence to support claims of partial representation relevant to both self-advocacy groups and organisations led by non-autistic people. In introducing a novel approach to studying the long-standing conflict between different groups in the autism advocacy community, we hope to provide both organisations and individuals with new tools to help ground discussions of representation in empirical insight.
Lay Abstract
Some autism advocacy organisations are run by family members of autistic people, and claim to speak on behalf of autistic people. These organisations have been criticised by autistic people, who feel like autism charities do not adequately represent their true interests. In response to these organisations, autistic people have come together to form autistic self-advocacy organisations, or groups in which activists can spread awareness of autism from an autistic point-of-view. However, some people say that autistic self-advocacy organisations do not sufficiently represent the needs of all autistic people. These tensions between organisations and individuals have made it difficult to determine which organisations can make the claim that they represent all autism advocates equally, instead of showing preference to a sub-group within the autism community. In this study, we try to approach this issue using computational tools to see if, in their Twitter posts, both kinds of organisations show a preference for the interests of autistic people or parents of autistic children. We do so by comparing a large body of Tweets by organisations to Tweets by autistic people and parents of autistic children. We find that both kinds of organisations match the interests of one group of autism advocates better than the other. The insight we provide has the potential to inspire new conversations and solutions to a long-standing conflict in autism advocacy.
Affiliation(s)
- Geoffrey Bird
- University of Oxford, UK
- University College London, UK
|
21
|
Islam MM, Liu J, Chakraborty R, Das S. Evaluating crash risk factors of farm equipment vehicles on county and non-county roads using interpretable tabular deep learning (TabNet). ACCIDENT; ANALYSIS AND PREVENTION 2025; 217:108048. [PMID: 40252392 DOI: 10.1016/j.aap.2025.108048] [Received: 01/22/2025] [Revised: 03/27/2025] [Accepted: 04/12/2025] [Indexed: 04/21/2025]
Abstract
Crashes involving farm equipment vehicles are a significant safety concern on public roads, particularly in rural and agricultural regions. These vehicles pose unique challenges because of their slow operating speeds and their interactions with faster vehicles, which often lead to severe crashes. This study analyzed crashes involving farm equipment vehicles to examine the factors influencing crash severity, with a particular focus on comparing incidents on county roads to those on non-county roads. The dataset included key variables such as road geometry, lighting conditions, and traffic interactions, and preprocessing techniques such as the Synthetic Minority Over-sampling Technique (SMOTE) were applied to address class imbalance. The TabNet model, a tabular deep learning model, was employed to analyze crash dynamics, offering both predictive accuracy and interpretability through feature importance and SHapley Additive exPlanations (SHAP) plots. Findings revealed that crash severity on county roads is primarily influenced by crash speed limit, first harmful event, traffic control, and person age, reflecting the role of road geometry and demographic risk in rural settings. In contrast, non-county roads were more affected by lighting conditions, intersection-related features, and population group, emphasizing the impact of visibility and traffic complexity in urban areas. Speed limit consistently emerged as a critical factor across all road types and severity levels. The study emphasized the need for targeted safety interventions, including visibility enhancements, speed management, and enhanced education campaigns for county and non-county areas.
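The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority-class neighbours. This is not the authors' code; the function name `smote_sketch`, the toy two-feature data, and the choice of `k=2` are assumptions made purely for illustration.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + lam * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (for example, the rarest severity level).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by that class. In practice oversampling of this kind is normally applied to the training split only, so that evaluation data remain untouched.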
Affiliation(s)
- Md Monzurul Islam
- Texas State University, 601 University Drive, San Marcos, TX 78666, USA.
- Jinli Liu
- Texas State University, 601 University Drive, San Marcos, TX 78666, USA.
- Rohit Chakraborty
- Texas State University, 601 University Drive, San Marcos, TX 78666, USA.
- Subasish Das
- Texas State University, 601 University Drive, San Marcos, TX 78666, USA.
|
22
|
Li H, Mao Y, Xu Y, Tu K, Zhang H, Gu R, Sun Q. Rapid detection of the viability of naturally aged maize seeds using multimodal data fusion and explainable deep learning techniques. Food Chem 2025; 478:143692. [PMID: 40068265 DOI: 10.1016/j.foodchem.2025.143692] [Received: 07/23/2024] [Revised: 02/18/2025] [Accepted: 02/28/2025] [Indexed: 04/06/2025]
Abstract
Seed viability, a key indicator for quality assessment, directly impacts the emergence of field seedlings. The existing nondestructive testing model for maize seed vitality, which is based on naturally aged seeds and relies predominantly on single-modal data such as MV and RS, achieves an accuracy of less than 70%. To elucidate the influence of different data on model accuracy, this study proposes the MSCNSVN model for detecting seed viability by collecting multisensor information from maize seeds using sensors such as MV, RS, TS, FS, and SS. Our findings indicated that (1) the single-modal FS dataset achieved optimal prediction accuracy, with FS570/600 contributing the most; (2) multimodal data fusion outperformed single-modal data, with an accuracy improvement of 10%, while the MV + RS + FS dataset achieved the highest accuracy; (3) the MSCNSVN model demonstrated superior performance compared to baseline models; (4) modeling with dual-variety datasets and endosperm surface datasets improved accuracy by 2%-3%.
Affiliation(s)
- He Li
- College of Agronomy and Biotechnology, China Agricultural University/ The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research of Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, Beijing 100193, China
- Yilin Mao
- College of Agronomy and Biotechnology, China Agricultural University/ The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research of Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, Beijing 100193, China
- Yanan Xu
- College of Agronomy and Biotechnology, China Agricultural University/ The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research of Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, Beijing 100193, China
- Keling Tu
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of the Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding (Agricultural College of Yangzhou University), Research Institute of Smart Agriculture (Agricultural College of Yangzhou University), Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, Yangzhou University, Yangzhou 225009, China
- Han Zhang
- College of Agronomy and Biotechnology, China Agricultural University/ The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research of Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, Beijing 100193, China
- Riliang Gu
- College of Agronomy and Biotechnology, China Agricultural University/ The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research of Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, Beijing 100193, China
- Qun Sun
- College of Agronomy and Biotechnology, China Agricultural University/ The Innovation Center (Beijing) of Crop Seeds Whole-Process Technology Research of Ministry of Agriculture and Rural Affairs/Beijing Key Laboratory of Crop Genetic Improvement, Beijing 100193, China.
|
23
|
Hrtonova V, Jaber K, Nejedly P, Blackwood ER, Klimes P, Frauscher B. The class imbalance problem in automatic localization of the epileptogenic zone for epilepsy surgery: a systematic review. J Neural Eng 2025; 22:031002. [PMID: 40489993 DOI: 10.1088/1741-2552/ade28c] [Received: 11/11/2024] [Accepted: 06/08/2025] [Indexed: 06/11/2025]
Abstract
Objective. Accurate localization of the epileptogenic zone (EZ) is crucial for epilepsy surgery, but the class imbalance of epileptogenic vs. non-epileptogenic electrode contacts in intracranial electroencephalography (iEEG) data poses significant challenges for automatic localization methods. This review evaluates methodologies for handling the class imbalance in EZ localization studies that use machine learning (ML). Approach. We systematically reviewed studies employing ML to localize the EZ from iEEG data, focusing on strategies for addressing class imbalance in data handling, algorithm design, and evaluation. Results. Out of 2,128 screened studies, 35 fulfilled the inclusion criteria. Across the studies, the iEEG contacts annotated as epileptogenic prior to automatic localization constituted a median of 18.34% of all contacts. However, many of these studies did not adequately address the class imbalance problem. Techniques such as data resampling and cost-sensitive learning were used to mitigate the class imbalance problem, but the chosen evaluation metrics often failed to account for it. Significance. Class imbalance significantly impacts the reliability of EZ localization models. More comprehensive management and innovative approaches are needed to enhance the robustness and clinical utility of these models. Addressing class imbalance in ML models for EZ localization will improve both the predictive performance and reliability of these models.
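To make concrete why an evaluation metric can fail to account for class imbalance, the following sketch compares plain accuracy with balanced accuracy for a trivial classifier that labels every contact as non-epileptogenic. The data are hypothetical: 1000 contacts with 183 positives, chosen only to approximate the 18.34% median reported above. The helper names are illustrative and do not come from any reviewed study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity, so each class weighs equally."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical cohort: 183 epileptogenic contacts out of 1000.
y_true = [1] * 183 + [0] * 817
# Trivial model that never predicts the epileptogenic class.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)           # 0.817, looks respectable
bal = balanced_accuracy(y_true, y_pred)  # 0.5, no better than chance
```

The gap between the two numbers is the point: a model that localizes nothing still scores about 82% accuracy on data with this class ratio, whereas a class-balanced metric exposes it.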
Affiliation(s)
- Valentina Hrtonova
- First Department of Neurology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Institute of Scientific Instruments of the CAS, Brno, Czech Republic
- Analytical Neurophysiology Lab, Department of Neurology, Duke University Medical Center, Durham, NC, United States of America
- Kassem Jaber
- Analytical Neurophysiology Lab, Department of Neurology, Duke University Medical Center, Durham, NC, United States of America
- Department of Biomedical Engineering, Duke Pratt School of Engineering, Durham, NC, United States of America
- Petr Nejedly
- First Department of Neurology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Institute of Scientific Instruments of the CAS, Brno, Czech Republic
- Elizabeth R Blackwood
- Duke University Medical Center Library & Archives, Durham, NC, United States of America
- Petr Klimes
- Institute of Scientific Instruments of the CAS, Brno, Czech Republic
- Birgit Frauscher
- Analytical Neurophysiology Lab, Department of Neurology, Duke University Medical Center, Durham, NC, United States of America
- Department of Biomedical Engineering, Duke Pratt School of Engineering, Durham, NC, United States of America
|
24
|
Aljurbua R, Alshehri J, Gupta S, Alharbi A, Obradovic Z. Leveraging multi-modal data for early prediction of severity in forced transmission outages with hierarchical spatiotemporal multiplex networks. PLoS One 2025; 20:e0326752. [PMID: 40560937 DOI: 10.1371/journal.pone.0326752] [Received: 03/24/2025] [Accepted: 06/04/2025] [Indexed: 06/28/2025]
Abstract
Extended power transmission outages caused by weather events can significantly impact the economy, infrastructure, and residents' quality of life in affected regions. One of the challenges is providing early, accurate warnings for these disruptions. To address this challenge, we introduce HMN-RTS, a hierarchical multiplex network designed to predict the duration of a forced transmission outage by leveraging a multi-modal approach. We investigate outage duration prediction over two years at the county level, focusing on the states of the Pacific Northwest region, including Idaho, California, Montana, Washington, and Oregon. The multiplex network layers collect diverse data sources, including information about power outages, weather data, weather forecasts, lightning, land cover, transmission lines, and social media. Our findings demonstrate that this approach enhances the accuracy of predicting power outage duration. The HMN-RTS model improves 3-hour-ahead outage predictions, achieving a macro F1 score of 0.79 compared with 0.73 for the best alternative in a five-class classification task. The HMN-RTS model provides valuable predictions of outage duration across multiple time horizons and seasons, enabling grid operators to implement timely outage mitigation strategies. Overall, the results underscore the HMN-RTS model's capability to deliver early and practical risk assessments.
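For readers unfamiliar with the macro F1 score quoted above, the sketch below shows how it is typically computed for a multi-class problem: an F1 score is calculated per class and the per-class scores are averaged with equal weight, so rare duration classes count as much as common ones. The labels are hypothetical and the function name `macro_f1` is illustrative; nothing here reproduces the HMN-RTS evaluation.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never
        # appears in either the true or the predicted labels.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical five duration classes (0-4), two examples per class.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 0, 2, 2, 3, 4, 4, 4]

score = macro_f1(y_true, y_pred, labels=[0, 1, 2, 3, 4])
```

In this toy case eight of ten predictions are correct, yet the two errors lower the F1 of four different classes, which is why macro F1 is stricter than accuracy when classes are confused with one another.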
Affiliation(s)
- Rafaa Aljurbua
- Center for Data Analytics and Biomedical Informatics, Computer and Information Science Department, Temple University, Philadelphia, Pennsylvania, United States of America
- Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Jumanah Alshehri
- Center for Data Analytics and Biomedical Informatics, Computer and Information Science Department, Temple University, Philadelphia, Pennsylvania, United States of America
- College of Business Administration, Imam Abdulrahman bin Faisal University, Dammam, Saudi Arabia
- Shelly Gupta
- Center for Data Analytics and Biomedical Informatics, Computer and Information Science Department, Temple University, Philadelphia, Pennsylvania, United States of America
- Abdulrahman Alharbi
- Center for Data Analytics and Biomedical Informatics, Computer and Information Science Department, Temple University, Philadelphia, Pennsylvania, United States of America
- Department of Computer Science, College of Engineering and Computer Science, Jazan University, Jazan, Saudi Arabia
- Zoran Obradovic
- Center for Data Analytics and Biomedical Informatics, Computer and Information Science Department, Temple University, Philadelphia, Pennsylvania, United States of America
|
25
|
Jin Z, Ferrada GA, Zhang D, Scovronick N, Fu JS, Chen K, Liu Y. Fire Smoke Elevated the Carbonaceous PM2.5 Concentration and Mortality Burden in the Contiguous U.S. and Southern Canada. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2025; 59:12196-12210. [PMID: 40504638 DOI: 10.1021/acs.est.5c01641] [Indexed: 06/18/2025]
Abstract
Despite emerging evidence on the health impacts of fine particulate matter (PM2.5) from wildland fire smoke, the specific effects of PM2.5 composition on health outcomes remain uncertain. We developed a three-level, chemical transport model-based framework to estimate daily full-coverage concentrations of smoke-derived carbonaceous PM2.5, specifically organic carbon (OC) and elemental carbon (EC), at a 1 × 1 km² spatial resolution from 2002 to 2019 across the contiguous U.S. (CONUS) and Southern Canada (SC). A 10-fold random cross-validation confirmed robust performance, with daily R² = 0.77 (OC) and 0.80 (EC) in the smoke-off scenario and 0.67 (OC) and 0.71 (EC) in the smoke-on scenario; R² exceeded 0.90 at the monthly scale after residual adjustment. Modeling results indicated that increases in wildland fire smoke have offset approximately one-third of the improvements in background air quality. In recent years, wildland fire smoke has become more frequent and carbonaceous PM2.5 concentrations have intensified, especially in the Western CONUS and Southwestern Canada. The wildfire season is also starting earlier and lengthening, exposing a larger population. We estimated that long-term exposure to fire smoke carbonaceous PM2.5 is responsible for approximately 7455 and 259 non-accidental deaths annually in the CONUS and SC, respectively, with associated annual monetized damage of 68.3 billion USD for the CONUS and 1.9 billion CAD for SC. The Southeastern CONUS, where prescribed fires are prevalent, contributed most to these health impacts and monetized damages. Our findings offer critical insights to inform policy development and assess future health burdens associated with fire smoke exposure.
Affiliation(s)
- Zhihao Jin
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
- Gonzalo A Ferrada
- Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, Tennessee 37996, United States
- Global Systems Laboratory, NOAA Earth System Research Laboratories, Boulder, Colorado 80305, United States
- Danlu Zhang
- Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
- Noah Scovronick
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
- Joshua S Fu
- Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, Tennessee 37996, United States
- Kai Chen
- Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut 06510, United States
- Yale Center on Climate Change and Health, Yale School of Public Health, New Haven, Connecticut 06510, United States
- Yang Liu
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, Georgia 30322, United States
|
26
|
Muhsin ZJ, Qahwaji R, Ghafir I, AlShawabkeh M, Al Bdour M, AlRyalat SA, Al-Taee M. Highly efficient stacking ensemble learning model for automated keratoconus screening. EYE AND VISION (LONDON, ENGLAND) 2025; 12:25. [PMID: 40556022 PMCID: PMC12186405 DOI: 10.1186/s40662-025-00440-6] [Received: 10/12/2024] [Accepted: 05/21/2025] [Indexed: 06/28/2025]
Abstract
BACKGROUND Despite extensive research on keratoconus (KC) detection with traditional machine learning models, stacking ensemble learning approaches remain underexplored. This paper presents a stacking ensemble learning method to enhance automated KC screening. METHODS This study utilizes a clinical dataset containing detailed corneal data from 2491 cases classified as non-KC (NKC), subclinical KC (SCKC) and clinical KC (CKC). Each cornea is represented by 79 features extracted from Pentacam imaging. Following extensive pre-processing, key corneal features that are strongly correlated with the target diagnosis are identified. These features are the keratometry of the steepest anterior point, surface variance index, vertical asymmetry index, height decentration index, and height asymmetry index. A novel stacking ensemble model is developed using the selected features to improve corneal classification into NKC, SCKC, and CKC by integrating top tree-based classifiers (random forest, gradient boosting, decision trees) with a support vector machine meta-classifier. RESULTS The pre-processing and feature selection techniques reduced the model's parameters to just 6.33% of the original dataset, improving classification performance, and cutting over 85% of the training time. The performance of the developed model was validated and tested on unseen data. Experimental results showed that the model outperforms existing studies, achieving 99.72% accuracy, precision, sensitivity, F1, and F2 scores, with a Matthews correlation coefficient of 0.995. It accurately classified all NKC and CKC cases, with just one misclassification involving an SCKC case. The model also demonstrated consistent performance on 100 additional unseen test cases, underscoring its generalizability and robustness in KC screening. 
CONCLUSIONS By combining the strengths of diverse base models and key Pentacam indices, the stacking ensemble approach ensures reliable, accurate KC screening, providing clinicians with an automated tool for early detection and better patient management.
Affiliation(s)
- Zahra J Muhsin
- Faculty of Engineering and Digital Technologies, University of Bradford, Bradford, BD7 1DP, UK
- Rami Qahwaji
- Faculty of Engineering and Digital Technologies, University of Bradford, Bradford, BD7 1DP, UK.
- Ibrahim Ghafir
- Faculty of Engineering and Digital Technologies, University of Bradford, Bradford, BD7 1DP, UK
- Majid Al-Taee
- Independent Consultant of Computing and Systems Engineering, Liverpool, UK
|
27
|
Aydemir M, Çakir M, Oral O, Yilmaz M. Diagnosis of Cushing's syndrome with generalized linear model and development of mobile application. Medicine (Baltimore) 2025; 104:e42910. [PMID: 40550094 PMCID: PMC12187266 DOI: 10.1097/md.0000000000042910] [Received: 07/20/2024] [Accepted: 06/01/2025] [Indexed: 06/28/2025]
Abstract
BACKGROUND Cushing syndrome (CS) is a rare endocrine disorder characterized by excessive secretion of glucocorticoids, leading to a variety of clinical manifestations, comorbidities, and increased mortality despite treatment. Despite advances in imaging modalities and biochemical testing, the diagnosis and management of CS remain challenging. Several tests are used to confirm the diagnosis of CS, including urinary free cortisol measurements, dexamethasone suppression tests (1 mg, 2 mg, and 8 mg), and nocturnal salivary cortisol measurements. However, each of these tests has some limitations, making the diagnosis of CS difficult. METHODS In this paper, we explore the potential of state-of-the-art machine learning algorithms as a clinical decision support system for analyzing and classifying CS. Our aim is to use advanced machine learning methods to analyze the accuracy rates of diagnostic tests and identify the most sensitive tests for diagnosing CS. RESULTS In this study, we performed binary classification based on data from 278 patients with CS (CS+) and 220 healthy patients (CS-). We developed a linear mathematical model with high predictive ability, achieving a classification accuracy of 97.03% and a Kappa value of 94.05%. The correlation graph shows that CS has strong positive relationships with 2 mg (78.8%), 1 mg (76.9%), and mc (72.1%), and moderate positive correlations with 8 mg (45%) and saliva (45.4%). In contrast, gender has almost no correlation with CS, so it was removed from the dataset. As a result, the model achieves an overall classification accuracy of 97.03%. Finally, we converted the linear model into a mobile application for use by specialist doctors in the field of endocrinology. CONCLUSION Traditional diagnostic methods can be time-consuming and require specialized medical expertise. Recently, advances in machine learning and mobile technology have opened new avenues for improving diagnostic accuracy and accessibility. This study explores the integration of machine learning algorithms into a mobile application designed to assist healthcare professionals and patients in the diagnosis of CS.
Affiliation(s)
- Mustafa Aydemir
- Department of Internal Medicine, Division of Endocrinology and Metabolism, Akdeniz University School of Medicine Antalya, Antalya, Turkey
- Mustafa Çakir
- Iskenderun Technical University, Iskenderun Vocational School of Higher Education, Iskenderun, Hatay, Turkey
- Okan Oral
- Akdeniz University, Faculty of Engineering, Mechatronics Engineering Antalya, Turkey
- Mesut Yilmaz
- Department of Aquaculture, Faculty of Aquaculture, Akdeniz University, Antalya, Turkey
|
28
|
Adnan T, Abdelkader A, Liu Z, Hossain E, Park S, Islam MS, Hoque E. A novel fusion architecture for detecting Parkinson's Disease using semi-supervised speech embeddings. NPJ Parkinsons Dis 2025; 11:176. [PMID: 40541966 DOI: 10.1038/s41531-025-00956-7] [Received: 07/15/2024] [Accepted: 04/08/2025] [Indexed: 06/22/2025]
Affiliation(s)
- Tariq Adnan
- Department of Computer Science, University of Rochester, Rochester, NY, USA
- Zipei Liu
- Department of Computer Science, University of Rochester, Rochester, NY, USA
- Ekram Hossain
- Department of Computer Science, University of Rochester, Rochester, NY, USA
- Sooyong Park
- Department of Computer Science, University of Rochester, Rochester, NY, USA
- Md Saiful Islam
- Department of Computer Science, University of Rochester, Rochester, NY, USA
- Ehsan Hoque
- Department of Computer Science, University of Rochester, Rochester, NY, USA.
- Ministry of Defense Health Services, Riyadh, Saudi Arabia.
|
29
|
Rees CA, Kisenge R, Godfrey E, Ideh RC, Kamara J, Coleman-Nekar YJG, Samma A, Manji HK, Sudfeld CR, Westbrook AL, Niescierenko M, Morris CR, Florin TA, Whitney CG, Manji KP, Duggan CP, Kamaleswaran R. Machine learning approaches to identify neonates and young children at risk for postdischarge mortality in Dar es Salaam, Tanzania and Monrovia, Liberia. BMJ Paediatr Open 2025; 9:e003547. [PMID: 40541283 PMCID: PMC12182169 DOI: 10.1136/bmjpo-2025-003547] [Received: 04/03/2025] [Accepted: 06/05/2025] [Indexed: 06/22/2025]
Abstract
BACKGROUND The time after hospital discharge carries high rates of mortality in neonates and young children in sub-Saharan Africa. Previous work using logistic regression to develop risk assessment tools to identify those at risk for postdischarge mortality has yielded fair discriminatory value. Our objective was to determine if machine learning models would have greater discriminatory value to identify neonates and young children at risk for postdischarge mortality. METHODS We conducted a planned secondary analysis of a prospective observational cohort at Muhimbili National Hospital in Dar es Salaam, Tanzania and John F. Kennedy Medical Center in Monrovia, Liberia. We enrolled neonates and young children near the time of discharge. The outcome was 60-day postdischarge mortality. We collected socioeconomic, demographic, clinical, and anthropometric data during hospital admission and used machine learning (i.e., eXtreme Gradient Boosting (XGBoost), Hist-Gradient Boost, Support Vector Machine, Neural Network, and Random Forest) to develop risk assessment tools to identify: (1) neonates and (2) young children at risk for postdischarge mortality. RESULTS A total of 2310 neonates and 1933 young children were enrolled. Of these, 71 (3.1%) neonates and 67 (3.5%) young children died after hospital discharge. XGBoost, Hist Gradient Boost, and Neural Network models yielded the greatest discriminatory value (area under the receiver operating characteristic curves range: 0.94-0.99) and fewest features, which included six features for neonates and five for young children. Discharge against medical advice, low birth weight, and supplemental oxygen requirement during hospitalisation were predictive of postdischarge mortality in neonates. For young children, discharge against medical advice, pallor, and chronic medical problems were predictive of postdischarge mortality.
CONCLUSIONS Our parsimonious machine learning-based models had excellent discriminatory value to predict postdischarge mortality among neonates and young children. External validation of these tools is warranted to assist in the design of interventions to reduce postdischarge mortality in these vulnerable populations.
Affiliation(s)
- Chris A Rees
- Division of Pediatric Emergency Medicine, Emory University School of Medicine, Atlanta, Georgia, USA
- Children's Healthcare of Atlanta, Atlanta, Georgia, USA
- Rodrick Kisenge
- Department of Pediatrics and Child Health, Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Evance Godfrey
- Muhimbili National Hospital, Dar es Salaam, United Republic of Tanzania
- Readon C Ideh
- Department of Pediatrics, John F. Kennedy Medical Center, Monrovia, Liberia
- Julia Kamara
- Department of Pediatrics, John F. Kennedy Medical Center, Monrovia, Liberia
- Abraham Samma
- Pediatrics and Child health, Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Hussein K Manji
- Department of Emergency Medicine, Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Accident and Emergency Department, Aga Khan Health Services, Dar es Salaam, Tanzania
- Christopher R Sudfeld
- Departments of Nutrition and Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Adrianna L Westbrook
- Pediatrics Biostatistics Core, Department of Pediatrics, Emory University, Atlanta, Georgia, USA
- Michelle Niescierenko
- Division of Emergency Medicine, Boston Children's Hospital, Boston, Massachusetts, USA
- Department of Pediatrics and Emergency Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Claudia R Morris
- Division of Pediatric Emergency Medicine, Emory University School of Medicine, Atlanta, Georgia, USA
- Children's Healthcare of Atlanta, Atlanta, Georgia, USA
- Todd A Florin
- Division of Emergency Medicine, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, Illinois, USA
- Karim P Manji
- Department of Pediatrics and Child Health, Muhimbili University of Health and Allied Sciences, Dar es Salaam, United Republic of Tanzania
- Christopher P Duggan
- Departments of Nutrition and Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Center for Nutrition, Division of Gastroenterology, Hepatology, and Nutrition, Boston Children's Hospital, Boston, Massachusetts, USA
- Rishikesan Kamaleswaran
- Division of Translational Biomedical Informatics, Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
|
30
|
Zheng M, Zhang Y, Laws RA, Vuillermin P, Dodd J, Wen LM, Baur LA, Taylor R, Byrne R, Ponsonby AL, Hesketh KD. Development of Machine Learning-Based Risk Prediction Models to Predict Rapid Weight Gain in Infants: Analysis of Seven Cohorts. JMIR Public Health Surveill 2025; 11:e69220. [PMID: 40532141 PMCID: PMC12192193 DOI: 10.2196/69220] [Received: 11/25/2024] [Revised: 03/18/2025] [Accepted: 03/25/2025] [Indexed: 06/29/2025]
Abstract
Background: Rapid weight gain (RWG) during infancy, defined as an upward crossing of one centile line on a weight growth chart, is highly predictive of subsequent obesity risk. Identification of infant RWG could facilitate obesity risk assessment from infancy. Objective: Leveraging machine learning (ML) algorithms, this study aimed to develop and validate risk prediction models to identify infant RWG by the age of 1 year. Methods: Data from 7 Australian and New Zealand cohorts were pooled for risk model development and validation (n=5233). A total of 8 ML algorithms predicted infant RWG using routinely available prenatal and early postnatal factors, including maternal prepregnancy weight status, maternal smoking during pregnancy, gestational age, parity, infant sex, birth weight, any breastfeeding and timing of solids introduction at the age of 6 months. Pooled data were randomly split into a training dataset (70%) and a test dataset (30%) for model training and validation, respectively. Model consistency was evaluated using 5-fold cross-validation. Model predictive performance was evaluated by area under the receiver operating characteristic (ROC) curve (AUC), accuracy, precision, sensitivity, specificity, and Cohen κ. Results: The average prevalence of infant RWG was 27%. In the training dataset, all ML algorithms showed acceptable to excellent discrimination with AUCs ranging from 0.75 to 0.86. Accuracy, which indicates the overall correctness of the model, ranged from 0.69 to 0.78. Precision, which measures the model's ability to avoid false positives, ranged from 0.68 to 0.77. The spread of sensitivity, specificity, and Cohen κ of all models was 0.68-0.80, 0.65-0.78, and 0.38-0.56, respectively. Of the 8 algorithms, the Gradient Boosting model showed the most favorable predictive accuracy. Validation of the Gradient Boosting model in the testing dataset exhibited excellent discrimination (AUC 0.3-0.6) and good ability to make accurate predictions, particularly true positive cases (with accuracy and sensitivity >0.75), but modest performance for precision (0.57-0.60) and Cohen κ (0.47-0.52). Conclusions: This study developed the first set of ML-based risk prediction models to identify infants' risk of experiencing RWG by the age of 1 year with acceptable accuracy. The models could be feasibly integrated into routine child growth monitoring and may facilitate population-wide early obesity risk assessment in primary health care.
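The evaluation metrics named in this abstract (accuracy, precision, sensitivity, specificity, and Cohen κ) can all be derived from a 2x2 confusion matrix. The Python sketch below shows one way to compute them. The function name `binary_metrics` and the counts are hypothetical, chosen only to mimic a 27% outcome prevalence, and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)        # share of predicted positives that are true
    sensitivity = tp / (tp + fn)      # share of actual positives that are detected
    specificity = tn / (tn + fp)      # share of actual negatives that are detected
    # Agreement expected by chance, from the marginal totals
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e)  # Cohen's kappa
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts (not study data): 1000 infants, 270 with the outcome
metrics = binary_metrics(tp=216, fp=146, fn=54, tn=584)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity and specificity are both 0.80 while precision is about 0.60 and Cohen κ about 0.54. This shows how a model can detect most true cases and still have modest precision when the outcome is relatively uncommon.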
Affiliation(s)
- Miaobing Zheng
- Institute for Physical Activity and Nutrition, School of Exercise and Nutrition Sciences, Deakin University, Geelong, Australia
- School of Health Sciences, Faculty of Health & Medicine, UNSW Sydney, Wallace Wurth Building, Kensington, 2330, Australia, 61 0290659337
| | - Yuxin Zhang
- Institute for Physical Activity and Nutrition, School of Exercise and Nutrition Sciences, Deakin University, Geelong, Australia
| | - Rachel A Laws
- Institute for Physical Activity and Nutrition, School of Exercise and Nutrition Sciences, Deakin University, Geelong, Australia
| | | | - Jodie Dodd
- Discipline of Obstetrics and Gynaecology, The Robinson Research Institute, The University of Adelaide, Adelaide, Australia
| | - Li Ming Wen
- School of Public Health and Sydney Medical School, The University of Sydney, Sydney, Australia
| | - Louise A Baur
- School of Public Health and Sydney Medical School, The University of Sydney, Sydney, Australia
| | - Rachael Taylor
- Department of Medicine, University of Otago, Dunedin, New Zealand
| | - Rebecca Byrne
- School of Exercise and Nutrition Sciences, Faculty of Health, Queensland University of Technology, Kelvin Grove, Australia
| | - Anne-Louise Ponsonby
- The Florey Institute of Neuroscience and Mental Health, Murdoch Children's Research Institute, Royal Children's Hospital, The University of Melbourne, Parkville, Australia
| | - Kylie D Hesketh
- Institute for Physical Activity and Nutrition, School of Exercise and Nutrition Sciences, Deakin University, Geelong, Australia
| |
|
31
|
Matboli M, Khaled A, Ahmed MF, Ahmed MY, Khaled R, Elmakromy GM, Ghani AMA, El-Shafei MM, Abdelhalim MRM, Gwad AMAE. Machine learning-based stratification of prediabetes and type 2 diabetes progression. Diabetol Metab Syndr 2025; 17:227. [PMID: 40533788 PMCID: PMC12175357 DOI: 10.1186/s13098-025-01786-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Accepted: 06/01/2025] [Indexed: 06/22/2025] Open
Abstract
BACKGROUND Diabetes mellitus, a global health concern with severe complications, demands early detection and precise staging for effective management. Machine learning approaches, combined with bioinformatics, offer promising avenues for enhancing diagnostic accuracy and identifying key biomarkers. METHODS This study employed a multi-class classification framework to classify patients across four health states: healthy, prediabetes, type 2 diabetes mellitus (T2DM) without complications, and T2DM with complications. Three models were developed using molecular markers, biochemical markers, and a combination of both. Five machine learning classifiers were applied: Random Forest (RF), Extra Trees Classifier, Quadratic Discriminant Analysis, Naïve Bayes, and Light Gradient Boosting Machine. To improve the robustness and precision of the classification, Recursive Feature Elimination with Cross-Validation (RFECV) and fivefold cross-validation were used. The multi-class classification approach enabled effective discrimination between the four diabetes stages. RESULTS The top contributing features identified for the combined model through RFECV included three molecular markers (miR342, NFKB1, and miR636) and two biochemical markers (the albumin-to-creatinine ratio and HDLc), indicating their strong association with diabetes progression. The Extra Trees Classifier achieved the highest performance across all models, with an AUC value of 0.9985 (95% CI: 0.994-1.000). This classifier outperformed other models, demonstrating its robustness and applicability for precise diabetes staging. CONCLUSION These findings underscore the value of integrating machine learning with molecular and biochemical markers for the accurate classification of diabetes stages, supporting a potential shift toward more personalized diabetes management.
Affiliation(s)
- Marwa Matboli
- Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, 11566, Egypt.
| | - Abdelrahman Khaled
- Bioinformatics Group, Center of Informatics Sciences (CIS), School of Information Technology and Computer Sciences, Nile University, Giza, Egypt
| | - Manar Fouad Ahmed
- Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, 11566, Egypt
| | - Manar Yehia Ahmed
- Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, 11566, Egypt
| | - Radwa Khaled
- Biotechnology Department, Faculty of Science, Cairo University, Cairo, 11566, Egypt
| | - Gena M Elmakromy
- Endocrinology & Diabetes Mellitus Unit, Department of Internal Medicine, Badr University in Cairo, Badr, Egypt
| | | | - Marwa M El-Shafei
- Pathology Department, Faculty of Oral and Dental Medicine, Misr International University, Cairo, Egypt
| | | | - Asmaa Mohamed Abd El Gwad
- Department of Medical Biochemistry and Molecular Biology, Faculty of Medicine, Ain Shams University, Cairo, 11566, Egypt
| |
|
32
|
Montoya AL, Hogendorf AS, Tingey S, Kuberan A, Yuen LH, Schüler H, Franzini RM. Widespread false negatives in DNA-encoded library data: how linker effects impair machine learning-based lead prediction. Chem Sci 2025; 16:10918-10927. [PMID: 40395382 PMCID: PMC12086585 DOI: 10.1039/d5sc00844a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2025] [Accepted: 05/07/2025] [Indexed: 06/22/2025] Open
Abstract
DNA-encoded chemical libraries (DECLs) have become integral to early-stage drug discovery, yielding active compounds and extensive labeled datasets for machine learning (ML)-based prediction of bioactive molecules. However, the information content of DECL selection data remains scarcely explored. This study systematically investigates for the first time the prevalence of false negatives and the influence of the linker in DECL data. Using a focused DECL targeting the poly-(ADP-ribose) polymerases PARP1/2 and TNKS1/2 as a model system, we found that our DECL selections frequently miss active compounds, with numerous false negatives for each identified hit. The presence of the DNA-conjugation linker emerged as a factor contributing to the underdetection of active molecules. This bias toward false negatives compromises the predictive power of DECL data for prioritizing hits, anticipating target selectivity, and training ML models, as determined by analyzing the effects of undersampling and oversampling techniques in learning the PARP2 data. Conversely, the linker's presence in DECLs offers advantages, such as enabling the identification of target-selective protein engagers, even when the underlying molecules themselves may not be selective. These findings highlight the challenges and opportunities of DECL data, emphasizing the need for best practices in data handling and ML model development in drug discovery.
Affiliation(s)
- Alba L Montoya
- Department of Medicinal Chemistry, College of Pharmacy, University of Utah 30 S 2000 E Salt Lake City UT 84112 USA
| | - Adam S Hogendorf
- Department of Medicinal Chemistry, College of Pharmacy, University of Utah 30 S 2000 E Salt Lake City UT 84112 USA
| | | | | | - Lik Hang Yuen
- Department of Medicinal Chemistry, College of Pharmacy, University of Utah 30 S 2000 E Salt Lake City UT 84112 USA
| | - Herwig Schüler
- Center for Molecular Protein Science, Department of Chemistry, Lund University Lund 22100 Sweden
| | - Raphael M Franzini
- Department of Medicinal Chemistry, College of Pharmacy, University of Utah 30 S 2000 E Salt Lake City UT 84112 USA
- Huntsman Cancer Institute, University of Utah 2000 Circle of Hope Salt Lake City UT 84054 USA
| |
|
33
|
Rabipour M, Hassenrück F, Pallaske E, Röhrig F, Hallek M, Alvarez-Idaboy JR, Kramer O, Rebollido-Rios R. Allosteric Coupling in Full-Length Lyn Kinase Revealed by Molecular Dynamics and Network Analysis. Int J Mol Sci 2025; 26:5835. [PMID: 40565298 DOI: 10.3390/ijms26125835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2025] [Revised: 06/16/2025] [Accepted: 06/16/2025] [Indexed: 06/28/2025] Open
Abstract
Lyn is a multifunctional Src-family kinase (SFK) that regulates immune signaling and has been implicated in diverse types of cancer. Unlike other SFKs, its full-length structure and regulatory dynamics remain poorly characterized. In this study, we present the first long-timescale molecular dynamics analysis of full-length Lyn, including the SH3, SH2, and SH1 domains, across wildtype, ligand-bound, and cancer-associated mutant states. Using principal component analysis, dynamic cross-correlation matrices, and network-based methods, we show that ATP binding stabilizes the kinase core and promotes interdomain coordination, while the ATP-competitive inhibitor dasatinib and specific mutations (e.g., E290K, I364N) induce conformational decoupling and weaken long-range communication. We identify integration modules and develop an interface-weighted scoring scheme to rank dynamically central residues. This analysis reveals 44 allosteric hubs spanning SH3, SH2, SH1, and interdomain regions. Finally, a random forest classifier trained on 16 MD-derived features highlights key interdomain descriptors, distinguishing functional states with an AUC of 0.98. Our results offer a dynamic and network-level framework for understanding Lyn regulation and identify potential regulatory hotspots for structure-based drug design. More broadly, our approach demonstrates the value of integrating full-length MD simulations with network and machine learning techniques to probe allosteric control in multidomain kinases.
Affiliation(s)
- Mina Rabipour
- Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Düsseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
- Center for Molecular Medicine Cologne, 50931 Cologne, Germany
- CECAD Center of Excellence on Cellular Stress Responses in Aging-Associated Diseases, 50931 Cologne, Germany
| | - Floyd Hassenrück
- Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Düsseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
- Center for Molecular Medicine Cologne, 50931 Cologne, Germany
- CECAD Center of Excellence on Cellular Stress Responses in Aging-Associated Diseases, 50931 Cologne, Germany
| | - Elena Pallaske
- Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Düsseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
- Center for Molecular Medicine Cologne, 50931 Cologne, Germany
- CECAD Center of Excellence on Cellular Stress Responses in Aging-Associated Diseases, 50931 Cologne, Germany
| | - Fernanda Röhrig
- Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Düsseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
- Center for Molecular Medicine Cologne, 50931 Cologne, Germany
- CECAD Center of Excellence on Cellular Stress Responses in Aging-Associated Diseases, 50931 Cologne, Germany
| | - Michael Hallek
- Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Düsseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
- Center for Molecular Medicine Cologne, 50931 Cologne, Germany
- CECAD Center of Excellence on Cellular Stress Responses in Aging-Associated Diseases, 50931 Cologne, Germany
| | - Juan Raul Alvarez-Idaboy
- Facultad de Química, Departamento de Física y Química Teórica, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
| | - Oliver Kramer
- Computational Intelligence Lab, Department of Computer Science, University of Oldenburg, 26129 Oldenburg, Germany
| | - Rocio Rebollido-Rios
- Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Düsseldorf, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
- Center for Molecular Medicine Cologne, 50931 Cologne, Germany
- CECAD Center of Excellence on Cellular Stress Responses in Aging-Associated Diseases, 50931 Cologne, Germany
| |
|
34
|
Lin J, Ma Q, Chen L, Guo W, Feng K, Huang T, Cai YD. Transcriptomic and miRNA Signatures of ChAdOx1 nCoV-19 Vaccine Response Using Machine Learning. Life (Basel) 2025; 15:981. [PMID: 40566633 DOI: 10.3390/life15060981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2025] [Revised: 06/15/2025] [Accepted: 06/16/2025] [Indexed: 06/28/2025] Open
Abstract
Vaccination with ChAdOx1 nCoV-19 is an important countermeasure to fight the COVID-19 pandemic. This vaccine enhances human immunoprotection against SARS-CoV-2 by inducing an immune response against the SARS-CoV-2 S protein. However, the immune-related genes induced by vaccination remain to be identified. This study employs feature ranking algorithms, an incremental feature selection method, and classification algorithms to analyze transcriptomic data from an experimental group vaccinated with the ChAdOx1 nCoV-19 vaccine and a control group vaccinated with the MenACWY meningococcal vaccine. According to different time points, vaccination status, and SARS-CoV-2 infection status, the transcriptomic data was divided into five groups, including a pre-vaccination group, ChAdOx1-onset group, MenACWY-onset group, ChAdOx1-7D group, and MenACWY-7D group. Each group contained samples with 13,383 RNA features and 1662 small RNA features. The results identified key genes that could indicate the efficacy of the ChAdOx1 nCoV-19 vaccine, and a classifier was developed to classify samples into the above groups. Additionally, effective classification rules were established to distinguish between different vaccination statuses. It was found that subjects vaccinated with ChAdOx1 nCoV-19 vaccine and infected with SARS-CoV-2 were characterized by up-regulation of HIST1H3G expression and down-regulation of CASP10 expression. In addition, IGHG1, FOXM1, and CASP10 genes were strongly associated with ChAdOx1 nCoV-19 vaccine efficacy. Compared with previous omics-driven studies, the machine learning algorithms used in this study were able to analyze transcriptome data faster and more comprehensively to identify potential markers associated with vaccine effect and investigate ChAdOx1 nCoV-19 vaccine-induced gene expression changes. 
These observations contribute to an understanding of the immune protection and inflammatory responses induced by the ChAdOx1 nCoV-19 vaccine during symptomatic episodes and provide a rationale for improving vaccine efficacy.
Affiliation(s)
- Jinting Lin
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Qinglan Ma
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
| | - Wei Guo
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Kaiyan Feng
- Department of Computer Science, Guangdong AIB Polytechnic College, Guangzhou 510507, China
| | - Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Department of Artificial Intelligence and Digital Health, CAS Engineering Laboratory for Nutrition, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| |
|
35
|
Deatsch A, McKenna M, Palumbo J, Tian Q, Simonsick E, Ferrucci L, Jeraj R, Spencer RG. Prediction of future aging-related slow gait and its determinants with deep learning and logistic regression. PLoS One 2025; 20:e0325172. [PMID: 40526703 DOI: 10.1371/journal.pone.0325172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2024] [Accepted: 05/08/2025] [Indexed: 06/19/2025] Open
Abstract
BACKGROUND Identification of accelerated aging and its biomarkers can lead to more timely therapeutic interventions and decision-making. Therefore, we sought to predict aging-related slow gait, a known predictor of accelerated aging, and its determinants. METHODS We applied a deep learning neural network (NN) and compared it to conventional logistic regression (LR) analysis. We incorporated 1,363 participants from the Baltimore Longitudinal Study of Aging to predict current and future slow gait at 6-year and 10-year follow-up using two clinically relevant cut-points. RESULTS Our NN achieved a maximum sensitivity of 81.2% and specificity of 87.9% for a 10-year prediction with a 0.8 m/s cut-point. We demonstrated the necessity of class balancing and found the NN to perform comparably to, or in some cases better than, LR, which achieved a maximum sensitivity and specificity of 84.5% and 86.3%, respectively. Sobol index analysis identified the strongest determinants to be age, BMI, sleep, and grip strength. CONCLUSIONS The novel use of an NN for this purpose, and successful benchmarking against conventional techniques, justifies further exploration and expansion of this model.
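The abstract reports that class balancing was necessary but does not say which technique was used. As an illustration only, the Python sketch below shows random oversampling, one common balancing approach. The function name `random_oversample` and the toy data are hypothetical and are not the authors' method.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)           # seeded for reproducibility
    counts = Counter(labels)
    target = max(counts.values())       # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):     # no-op for the majority class
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumption): 8 participants without slow gait, 2 with slow gait
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels, seed=1)
print(Counter(balanced_labels))
```

Balancing of this kind is normally applied to the training data only, so that the held-out data keep the original class proportions.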
Affiliation(s)
- Alison Deatsch
- Department of Medical Physics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Michael McKenna
- Magnetic Resonance Imaging and Spectroscopy Section, Laboratory of Clinical Investigation, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
| | - Jonathan Palumbo
- Magnetic Resonance Imaging and Spectroscopy Section, Laboratory of Clinical Investigation, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
| | - Qu Tian
- Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
| | - Eleanor Simonsick
- Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
| | - Luigi Ferrucci
- Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
| | - Robert Jeraj
- Department of Medical Physics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia
| | - Richard G Spencer
- Magnetic Resonance Imaging and Spectroscopy Section, Laboratory of Clinical Investigation, National Institute on Aging, National Institutes of Health, Baltimore, Maryland, United States of America
| |
|
36
|
Patel JS, Karanth D. Building and Evaluating an Orthodontic Natural Language Processing Model for Automated Clinical Note Information Extraction. Orthod Craniofac Res 2025. [PMID: 40515549 DOI: 10.1111/ocr.12944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2025] [Revised: 04/13/2025] [Accepted: 05/14/2025] [Indexed: 06/16/2025]
Abstract
INTRODUCTION Malocclusion presents functional and aesthetic challenges, necessitating accurate diagnosis and treatment. However, variability in orthodontic treatment planning persists due to subjective assessments, limiting consistency and objectivity. Electronic dental records (EDRs) contain vast patient data that could address these challenges, but much of the rich clinical information is documented as free text, complicating analysis. This study aims to develop an Orthodontic Natural Language Processing (ONLP) model to extract structured orthodontics-related information from unstructured EDRs and identify critical features influencing malocclusion using machine learning (ML). METHODS Data from 7693 orthodontic patients were analysed to train, test and validate the ONLP and ML models. A gold-standard dataset was created through manual review. The ONLP model utilised supervised (Named Entity Recognition-NER) and unsupervised (K-means clustering) approaches to structure information from free text. Machine learning models, including Logistic Regression, Gaussian Naive Bayes, Random Forest and XGBoost, were subsequently applied to identify feature importance for malocclusion classification. RESULTS The ONLP model achieved 89% sensitivity, 92% specificity and 91% accuracy in extracting orthodontics-related information. The supervised model demonstrated 84% accuracy, 82% F1-score and 84% recall, excelling in identifying Classes I and III malocclusions but showing reduced sensitivity for Class II. Machine learning analysis highlighted key features for malocclusion classification: maxillary crowding, overjet and arch perimeter discrepancy for Class I; maxillary spacing and anterior crossbite for Class II; and dental midline deviation and occlusal wear for Class III. CONCLUSION This study demonstrates a novel approach to automating orthodontic data extraction using the ONLP model, enabling advanced big data analytics and enhancing data-driven orthodontic research and care.
Affiliation(s)
- Jay S Patel
- Center for Dental Informatics and Artificial Intelligence, Department of Oral Health Sciences, Temple University Kornberg School of Dentistry, Philadelphia, Pennsylvania, USA
| | - Divakar Karanth
- Department of Orthodontics, University of Florida College of Dentistry, Gainesville, Florida, USA
| |
|
37
|
Chen Y, Helis C, Cramer C, Munley M, Choi AR, Tan J, Xing F, Lyu Q, Whitlow C, Willey J, Chan M, Jiang Y. MRI-Based Radiomics Ensemble Model for Predicting Radiation Necrosis in Brain Metastasis Patients Treated with Stereotactic Radiosurgery and Immunotherapy. Cancers (Basel) 2025; 17:1974. [PMID: 40563624 DOI: 10.3390/cancers17121974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2025] [Revised: 06/02/2025] [Accepted: 06/09/2025] [Indexed: 06/28/2025] Open
Abstract
Background: Radiation therapy is a cornerstone treatment modality for brain metastasis. However, it can result in complications such as necrosis, which may lead to significant neurological deficits. This study aims to develop and validate an ensemble model with radiomics to predict radiation necrosis. Method: This study retrospectively collected and analyzed MRI images and clinical information from 209 stereotactic radiosurgery sessions involving 130 patients with brain metastasis. An ensemble model integrating gradient boosting, random forest, decision tree, and support vector machine was developed and validated using selected radiomic features and clinical factors to predict the likelihood of necrosis. The model performance was evaluated and compared with other machine learning algorithms using metrics including the area under the curve (AUC), sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). SHapley Additive exPlanations (SHAP) analysis and local interpretable model-agnostic explanations (LIME) analysis were applied to explain the model's predictions. Results: The ensemble model achieved strong performance in the validation cohort, with the highest AUC. It consistently outperformed the individual models and the stacking ensemble model. The model demonstrated superior accuracy, generalizability, and reliability in predicting radiation necrosis. SHAP and LIME were used to interpret the predictive model, and both analyses highlighted similar significant factors, enhancing our understanding of prediction dynamics. Conclusions: The ensemble model using radiomic features exhibited high accuracy and robustness in predicting the occurrence of radiation necrosis. It could serve as a novel and valuable tool to facilitate radiotherapy for patients with brain metastasis.
Affiliation(s)
- Yijun Chen
- Department of Radiation Oncology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Corbin Helis
- Department of Radiation Oncology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Christina Cramer
- Department of Radiation Oncology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Michael Munley
- Department of Radiation Oncology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Ariel Raimundo Choi
- Department of Radiation Oncology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Josh Tan
- Department of Radiology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Fei Xing
- Department of Cancer Biology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Qing Lyu
- Department of Radiology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Christopher Whitlow
- Department of Radiology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Jeffrey Willey
- Department of Radiation Oncology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Michael Chan
- Department of Radiation Oncology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| | - Yuming Jiang
- Department of Radiation Oncology, Wake Forest University School of Medicine, Winston-Salem, NC 27101, USA
| |
|
38
|
Yilmaz Başer H, Evran T, Cifci MA. Machine Learning-Augmented Triage for Sepsis: Real-Time ICU Mortality Prediction Using SHAP-Explained Meta-Ensemble Models. Biomedicines 2025; 13:1449. [PMID: 40564166 DOI: 10.3390/biomedicines13061449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2025] [Revised: 05/13/2025] [Accepted: 05/26/2025] [Indexed: 06/28/2025] Open
Abstract
Background/Objectives: Optimization algorithms are critical in many fields and dynamical systems because they help identify the best available solutions to complex problems while improving efficiency, reducing costs, and boosting performance. Metaheuristic optimization algorithms, which are inspired by natural phenomena, offer significant benefits for solving complex optimization problems. Because such problems arise across many disciplines, bio-inspired metaheuristics have been applied successfully to classification and feature selection tasks, including the diagnosis of certain health problems. Sepsis continues to pose a significant threat to patient survival, particularly among individuals admitted to intensive care units from emergency departments. Traditional scoring systems, including qSOFA, SIRS, and NEWS, often fall short of delivering the precision necessary for timely and effective clinical decision-making. Methods: In this study, we introduce a novel, interpretable machine learning framework designed to predict in-hospital mortality in sepsis patients upon intensive care unit admission. Utilizing a retrospective dataset from a tertiary university hospital encompassing patient records from January 2019 to June 2024, we extracted comprehensive clinical and laboratory features. To address class imbalance and missing data, we employed the Synthetic Minority Oversampling Technique and systematic imputation methods, respectively. Our hybrid modeling approach integrates ensemble-based ML algorithms with deep learning architectures, optimized through the Red Piranha Optimization algorithm for feature selection and hyperparameter tuning. The proposed model was validated through internal cross-validation and external testing on the MIMIC-III dataset.
Results: The proposed model demonstrates superior predictive performance over conventional scoring systems, achieving an area under the receiver operating characteristic curve of 0.96, a Brier score of 0.118, and a recall of 81%. Conclusions: These results underscore the potential of AI-driven tools to enhance clinical decision-making processes in sepsis management, enabling early interventions and potentially reducing mortality rates.
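The Brier score reported above is the mean squared difference between predicted probabilities and observed binary outcomes, and recall is the share of actual positive cases that are flagged. The Python sketch below computes both on a toy example. The function names, the 0.5 decision threshold, and the data are hypothetical and are not drawn from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def recall_at_threshold(y_true, y_prob, threshold=0.5):
    """Fraction of actual positives whose predicted probability meets the threshold."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    if not positives:
        raise ValueError("recall is undefined without positive cases")
    return sum(1 for p in positives if p >= threshold) / len(positives)


# Toy data (assumption): 1 = died in hospital, 0 = survived
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.3, 0.1]

score = brier_score(y_true, y_prob)
rec = recall_at_threshold(y_true, y_prob)
print(f"Brier score: {score:.3f}, recall: {rec:.3f}")
```

Lower Brier scores indicate probabilities that sit closer to the observed outcomes: 0 is a perfect score, and a model that always predicts 0.5 scores 0.25.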
Affiliation(s)
- Hülya Yilmaz Başer
- Department of Emergency Medicine, Faculty of Medicine, Bandirma Onyedi Eylul University, 10250 Balıkesir, Türkiye
| | - Turan Evran
- Department of Anesthesia and Reanimation, Faculty of Medicine, Pamukkale University, 20070 Denizli, Türkiye
| | - Mehmet Akif Cifci
- The Institute of Computer Technology, Tu Wien University, 1040 Vienna, Austria
- Engineering and Informatics Department, Klaipėdos Valstybinė Kolegija/Higher Education Institution, 92294 Klaipeda, Lithuania
| |
|
39
|
Ryan L, Agaian S. Breast Cancer Detection Using Infrared Thermography: A Survey of Texture Analysis and Machine Learning Approaches. Bioengineering (Basel) 2025; 12:639. [PMID: 40564455 DOI: 10.3390/bioengineering12060639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2025] [Revised: 05/25/2025] [Accepted: 06/06/2025] [Indexed: 06/28/2025] Open
Abstract
Breast cancer remains a leading cause of cancer-related deaths among women worldwide, highlighting the urgent need for early detection. While mammography is the gold standard, it faces cost and accessibility barriers in resource-limited areas. Infrared thermography is a promising cost-effective, non-invasive, painless, and radiation-free alternative that detects tumors by measuring their thermal signatures through thermal infrared radiation. However, challenges persist, including limited clinical validation, lack of Food and Drug Administration (FDA) approval as a primary screening tool, physiological variations among individuals, differing interpretation standards, and a shortage of specialized radiologists. This survey uniquely focuses on integrating texture analysis and machine learning within infrared thermography for breast cancer detection, addressing the existing literature gaps, and noting that this approach achieves high-ranking results. It comprehensively reviews the entire processing pipeline, from image preprocessing and feature extraction to classification and performance assessment. The survey critically analyzes the current limitations, including over-reliance on limited datasets like DMR-IR. By exploring recent advancements, this work aims to reduce radiologists' workload, enhance diagnostic accuracy, and identify key future research directions in this evolving field.
Affiliation(s)
- Larry Ryan
- Department of Computer Science, Graduate Center, CUNY, City University of New York, New York, NY 10016, USA
| | - Sos Agaian
- Department of Computer Science, Graduate Center, CUNY, City University of New York, New York, NY 10016, USA
| |
|
40
|
Hao Y, Duan Z, Liu L, Xue Q, Pan W, Liu X, Zhang A, Fu J. Development of an Interpretable Machine Learning Model for Neurotoxicity Prediction of Environmentally Related Compounds. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2025; 59:11108-11120. [PMID: 40307185 DOI: 10.1021/acs.est.5c03311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2025]
Abstract
The rising prevalence of nervous system disorders has become a significant global health challenge, with environmental pollutants identified as key contributors. However, the large number of environmentally related compounds, combined with the low efficiency of traditional methods, has resulted in substantial gaps in neurotoxicity data. In this study, we developed a robust and interpretable neurotoxicity prediction model using a high-quality data set. To identify the best predictive model, three molecular representation methods (molecular fingerprints, molecular descriptors, and molecular graphs) combined with six traditional machine learning (ML) algorithms and two deep learning (DL) approaches were evaluated. The optimal model, combining molecular fingerprints and descriptors with eXtreme Gradient Boosting (XGBoost), achieved a training accuracy of 0.93 and an area under the curve (AUC) of 0.99, outperforming other ML and DL models while maintaining interpretability. The model was used to screen 1170 compounds detected in human blood, predicting 1145 successfully. Among 89 compounds with known neurotoxicity data, the model achieved an accuracy of 0.74. It identified 821 potentially neurotoxic compounds, including 36 with high detection concentrations, warranting further study. An online platform (http://www.envwind.site/tools.html) was developed to expand accessibility. This model offers an efficient tool for predicting neurotoxicity and managing environmental health risks.
Affiliation(s)
- Yuxing Hao
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100190, P. R. China
- School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310012, P. R. China
| | - Zhihui Duan
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lizheng Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310012, P. R. China
| | - Qiao Xue
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
| | - Wenxiao Pan
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
| | - Xian Liu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
| | - Aiqian Zhang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100190, P. R. China
- School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310012, P. R. China
- College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jianjie Fu
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- School of Environment, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310012, P. R. China
- College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
41
|
Park J, Baik JH, Adjei-Nimoh S, Lee WH. Advancements in artificial intelligence-based technologies for PFAS detection, monitoring, and management. THE SCIENCE OF THE TOTAL ENVIRONMENT 2025; 980:179536. [PMID: 40311342 DOI: 10.1016/j.scitotenv.2025.179536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Revised: 03/09/2025] [Accepted: 04/23/2025] [Indexed: 05/03/2025]
Abstract
Per- and polyfluoroalkyl substances (PFAS) are persistent environmental contaminants with strong carbon-fluorine (C-F) bonds that contribute to bioaccumulation and long-term environmental and health risks. Traditional PFAS detection and treatment methods are often time-consuming, costly, and limited in scope. Recently, artificial intelligence (AI)-based technologies, particularly machine learning (ML), have emerged as powerful tools for enhancing PFAS monitoring, source identification, and remediation. ML models such as random forest (RF), gradient boosting decision trees (GBDT), support vector machines (SVM), and artificial neural networks (ANN) have been successfully applied to classify PFAS contamination sources with over 96% accuracy, predict PFAS concentrations in groundwater with an AUC of 0.90, and optimize removal processes such as nanofiltration and adsorption with R² values exceeding 0.93. Despite these advancements, challenges remain in ensuring high-quality datasets, addressing data imbalance, and improving model interpretability. Future research should focus on expanding public datasets, leveraging Automated ML (AutoML) for optimization, and integrating AI-driven sensors for real-time detection. AI-based approaches present a transformative opportunity to enhance efficiency, accuracy, and cost-effectiveness in PFAS management, aiding regulatory decision-making and environmental protection.
Affiliation(s)
- Jungsu Park
- Department of Civil and Environmental Engineering, Hanbat National University, 125, Dongseo-daero, Yuseong-gu, Daejeon 34158, Republic of Korea
| | - Jong-Hyun Baik
- Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, USA
| | - Samuel Adjei-Nimoh
- Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, USA
| | - Woo Hyoung Lee
- Department of Civil, Environmental and Construction Engineering, University of Central Florida, 12800 Pegasus Dr., Orlando, FL 32816, USA.
| |
|
42
|
Lencastre P, Mathema R, Lind PG. From eyes' microtremors to critical flicker fusion. PLoS One 2025; 20:e0325391. [PMID: 40489451 DOI: 10.1371/journal.pone.0325391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2025] [Accepted: 05/12/2025] [Indexed: 06/11/2025] Open
Abstract
The critical flicker fusion threshold (CFFT) is the frequency at which a flickering light source becomes indistinguishable from continuous light. The CFFT is an important biomarker of health conditions, such as Alzheimer's disease and epilepsy, and is affected by factors as diverse as fatigue, drug consumption, and oxygen pressure, which make the CFFT individual- and context-specific. Other causal factors beyond such biophysical processes are still to be uncovered. We investigate the connection between CFFT and specific eye movements, called microtremors, which are small oscillatory gaze movements during fixation periods. We present evidence that individual differences in CFFT can be accounted for by microtremors, and design an experiment, using a high-frequency monitor and recording the participant's eye movements with an eye-tracker device, which makes it possible to measure the range of frequencies of a specific individual's CFFT. Additionally, we introduce a classifier that can predict whether the CFFT of a specific participant lies in the range of high or low frequencies, based on the corresponding range of frequencies of the eyes' microtremors. Our results show an accuracy of [Formula: see text] for a frequency threshold of 60 Hz and [Formula: see text] for a threshold of 120 Hz.
Affiliation(s)
- Pedro Lencastre
- Department of Computer Science, OsloMet - Oslo Metropolitan University, Oslo, Norway
- OsloMet Artificial Intelligence Lab, OsloMet, Oslo, Norway
| | - Rujeena Mathema
- Department of Computer Science, OsloMet - Oslo Metropolitan University, Oslo, Norway
- OsloMet Artificial Intelligence Lab, OsloMet, Oslo, Norway
| | - Pedro G Lind
- Department of Computer Science, OsloMet - Oslo Metropolitan University, Oslo, Norway
- OsloMet Artificial Intelligence Lab, OsloMet, Oslo, Norway
- Kristiania University of Applied Sciences, Oslo, Norway
- Simula Research Laboratory, Numerical Analysis and Scientific Computing, Oslo, Norway
| |
|
43
|
Jiang B, Cen J, Zhu E, Wang J. Software technical debt prediction based on complex software networks. PLoS One 2025; 20:e0323672. [PMID: 40489424 DOI: 10.1371/journal.pone.0323672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2024] [Accepted: 04/09/2025] [Indexed: 06/11/2025] Open
Abstract
Technical debt prediction (TDP) is crucial for the long-term maintainability of software. In the literature, many machine-learning-based TDP models have been proposed; they use TD-related metrics as input features for machine-learning classifiers. However, their performance is unsatisfactory. Developing and using more effective metrics is considered a promising way to enhance the performance of TDP models. Social Network Analysis (SNA) uses a set of metrics (i.e., SNA metrics) to characterize software elements (classes, binaries, etc.) from the perspective of the software as a whole. SNA metrics are regarded as a complement to the TD-related metrics used in existing TDP work, and thus are expected to improve the performance of existing TDP models. However, the effectiveness of SNA metrics in the field of TDP has not been explored so far. To fill this gap, in this paper, we propose an improved software technical debt prediction approach. First, we represent software as a Class Dependency Network, based on which we compute the values of a set of SNA metrics. Second, we combine SNA metrics with the TD-related metrics to create a combined metric suite (CMS). Third, we employ CMS as the input features and use seven commonly used machine learning classifiers to build TDP models. Empirical results on a publicly available data set show that (i) the combined metric suite (i.e., CMS) can indeed improve the performance of existing TDP models; (ii) XGBoost performs best among the seven classifiers, with an [Formula: see text] value of 0.77, an MI ratio of approximately 0.10, and a recall close to 0.87. Furthermore, we also reveal the relative effectiveness of different metric combinations.
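The first two steps described in this abstract (building a Class Dependency Network, then merging SNA metrics with TD-related metrics into a combined suite) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the class names, the `loc` and `complexity` metrics, and the choice of degree-based measures as stand-ins for the SNA metrics, which the abstract does not enumerate.

```python
from collections import defaultdict

# Hypothetical class-level dependencies: (source class, class it depends on).
dependencies = [
    ("OrderService", "OrderRepository"),
    ("OrderService", "PaymentClient"),
    ("InvoiceService", "OrderRepository"),
    ("InvoiceService", "PaymentClient"),
    ("PaymentClient", "HttpUtil"),
]

# Hypothetical TD-related metrics per class (metric names are illustrative only).
td_metrics = {
    "OrderService": {"loc": 420, "complexity": 35},
    "InvoiceService": {"loc": 310, "complexity": 22},
    "OrderRepository": {"loc": 180, "complexity": 9},
    "PaymentClient": {"loc": 150, "complexity": 12},
    "HttpUtil": {"loc": 90, "complexity": 5},
}


def sna_metrics(edges):
    """Compute simple degree-based metrics on a directed dependency graph."""
    out_neighbors = defaultdict(set)
    in_neighbors = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        out_neighbors[src].add(dst)
        in_neighbors[dst].add(src)
        nodes.update((src, dst))
    n = len(nodes)
    metrics = {}
    for node in nodes:
        in_deg = len(in_neighbors[node])
        out_deg = len(out_neighbors[node])
        metrics[node] = {
            "in_degree": in_deg,
            "out_degree": out_deg,
            # Total degree normalised by the number of other nodes.
            "degree_centrality": (in_deg + out_deg) / (n - 1) if n > 1 else 0.0,
        }
    return metrics


def combined_metric_suite(td, sna):
    """Merge TD-related and SNA metrics into one feature dict per class."""
    classes = sorted(set(td) | set(sna))
    return {cls: {**td.get(cls, {}), **sna.get(cls, {})} for cls in classes}


sna = sna_metrics(dependencies)
cms = combined_metric_suite(td_metrics, sna)
print(cms["PaymentClient"])
```

Each row of `cms` could then be passed as a feature vector to a classifier; the third step (training the seven classifiers) is left out of this sketch.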
Affiliation(s)
- Bo Jiang
- School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, Zhejiang, China
| | - Jiaye Cen
- School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, Zhejiang, China
| | - Erluan Zhu
- School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, Zhejiang, China
| | - Jiale Wang
- School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou, Zhejiang, China
| |
|
44
|
Omodunbi BA, Olawade DB, Awe OF, Soladoye AA, Aderinto N, Ovsepian SV, Boussios S. Stacked Ensemble Learning for Classification of Parkinson's Disease Using Telemonitoring Vocal Features. Diagnostics (Basel) 2025; 15:1467. [PMID: 40564788 DOI: 10.3390/diagnostics15121467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2025] [Revised: 05/28/2025] [Accepted: 06/05/2025] [Indexed: 06/28/2025] Open
Abstract
Background: Parkinson's disease (PD) is a progressive neurodegenerative condition that impairs motor and non-motor functions. Early and accurate diagnosis is critical for effective management and care. Leveraging machine learning (ML) techniques, this study aimed to develop a robust prediction system for PD using a stacked ensemble learning approach, addressing challenges such as imbalanced datasets and feature optimization. Methods: An open-access PD dataset comprising 22 vocal attributes and 195 instances from 31 subjects was utilized. To prevent data leakage, subjects were divided into training (22 subjects) and testing (9 subjects) groups, ensuring no subject appeared in both sets. Preprocessing included data cleaning and normalization via min-max scaling. The synthetic minority oversampling technique (SMOTE) was applied exclusively to the training set to address class imbalance. Feature selection techniques (forward search, gain ratio, and the Kruskal-Wallis test) were employed using subject-wise cross-validation to identify significant attributes. The developed system combined support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and decision tree (DT) as base classifiers, with logistic regression (LR) as the meta-classifier in a stacked ensemble learning framework. Performance was evaluated using both recording-wise and subject-wise metrics to ensure clinical relevance. Results: The stacked ensemble learning model achieved realistic performance with a recording-wise accuracy of 84.7% and subject-wise accuracy of 77.8% on completely unseen subjects, outperforming individual classifiers including KNN (81.4%), RF (79.7%), and SVM (76.3%). Cross-validation within the training set showed 89.2% accuracy, with the performance difference highlighting the importance of proper validation methodology.
Feature selection results showed that using the top 10 features ranked by gain ratio provided optimal balance between performance and clinical interpretability. The system's methodological robustness was validated through rigorous subject-wise evaluation, demonstrating the critical impact of validation methodology on reported performance. Conclusions: By implementing subject-wise validation and preventing data leakage, this study demonstrates that proper validation yields substantially different (and more realistic) results compared to flawed recording-wise approaches. The findings underscore the critical importance of validation methodology in healthcare ML applications and provide a template for methodologically sound PD classification research. Future research should focus on validating the model with larger, multi-center datasets and implementing standardized validation protocols to enhance clinical applicability.
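The subject-wise split described in the Methods (no subject in both training and test sets) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the layout of 6 or 7 recordings per subject is hypothetical (chosen only so the totals match 195 recordings from 31 subjects), and aggregating recording-level predictions by majority vote is an assumption, since the abstract does not say how subject-wise accuracy was computed.

```python
import random
from collections import defaultdict


def subject_wise_split(subject_ids, n_test_subjects, seed=0):
    """Split recording indices so that no subject appears in both sets."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    test_subjects = set(subjects[:n_test_subjects])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


def subject_wise_accuracy(subject_ids, y_true, y_pred):
    """Majority-vote binary predictions per subject (assumed aggregation rule).

    A tie counts as a negative (0) prediction for that subject.
    """
    votes = defaultdict(list)
    truth = {}
    for subject, label, pred in zip(subject_ids, y_true, y_pred):
        votes[subject].append(pred)
        truth[subject] = label
    correct = sum(
        1
        for subject, preds in votes.items()
        if (sum(preds) * 2 > len(preds)) == bool(truth[subject])
    )
    return correct / len(votes)


# Hypothetical layout: 9 subjects with 7 recordings, 22 subjects with 6 (195 total).
subject_ids = [s for s in range(31) for _ in range(7 if s < 9 else 6)]

train_idx, test_idx = subject_wise_split(subject_ids, n_test_subjects=9)
train_subjects = {subject_ids[i] for i in train_idx}
test_subjects = {subject_ids[i] for i in test_idx}

print(len(subject_ids), len(train_subjects), len(test_subjects))
print(train_subjects.isdisjoint(test_subjects))
```

With 9 held-out subjects, a subject-wise accuracy of 77.8% would correspond to 7 of 9 subjects classified correctly (7/9 ≈ 0.778), which shows how coarse subject-level estimates are at this sample size.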
Affiliation(s)
- Bolaji A Omodunbi
- Department of Computer Engineering, Federal University Oye-Ekiti, Oye-Ekiti 371104, Nigeria
| | - David B Olawade
- Department of Allied and Public Health, School of Health, Sport and Bioscience, University of East London, London E16 2RD, UK
- Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, UK
- Department of Public Health, York St John University, York YO31 7EX, UK
- School of Health and Care Management, Arden University, Arden House, Middlemarch Park, Coventry CV3 4FJ, UK
| | - Omosigho F Awe
- Department of Computer Engineering, Federal University of Technology Akure, Gaga 340110, Nigeria
| | - Afeez A Soladoye
- Department of Computer Engineering, Federal University Oye-Ekiti, Oye-Ekiti 371104, Nigeria
| | - Nicholas Aderinto
- Department of Medicine and Surgery, Ladoke Akintola University of Technology, Ogbomoso 210214, Nigeria
| | - Saak V Ovsepian
- Faculty of Engineering and Science, University of Greenwich London, Chatham ME4 4TB, UK
- Faculty of Medicine, Tbilisi State University, Tbilisi 0177, Georgia
| | - Stergios Boussios
- Department of Research and Innovation, Medway NHS Foundation Trust, Gillingham ME7 5NY, UK
- Faculty of Medicine, Health, and Social Care, Canterbury Christ Church University, Canterbury CT2 7PB, UK
- Faculty of Life Sciences & Medicine, School of Cancer & Pharmaceutical Sciences, King's College London, Strand, London WC2R 2LS, UK
- Kent Medway Medical School, University of Kent, Canterbury CT2 7LX, UK
- AELIA Organization, 9th Km Thessaloniki-Thermi, 57001 Thessaloniki, Greece
- Department of Medical Oncology, Medway NHS Foundation Trust, Gillingham ME7 5NY, UK
- Faculty of Medicine, School of Health Sciences, University of Ioannina, 45110 Ioannina, Greece
- Department of Medical Oncology, Ioannina University Hospital, 45500 Ioannina, Greece
| |
|
45
|
Peng D, Yu Z, Zhao S, Luo J, Shen L, Fang WH. Machine Learning Prediction on Birefringence of Nonlinear Optical Crystals and Polymorphs with Different Birefringence Activities. J Phys Chem Lett 2025:6087-6097. [PMID: 40490379 DOI: 10.1021/acs.jpclett.5c00980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2025]
Abstract
Nonlinear optical (NLO) crystal materials have been widely used in the scientific and industrial fields. Birefringence is an important property of NLO crystals. Tuning birefringence through element substitution or polymorphic transformation may improve phase-matching performance to meet the varying demands of laser wavelength. A growing number of studies based on machine learning (ML), such as the multilevel descriptors developed in our group (Zhang et al. J. Phys. Chem. C 2021, 125, 25175-25188), can successfully predict the birefringence of NLO materials. However, how to identify polymorphs with different birefringence activities is still a nascent research topic. In this work, we proposed hp-wACSFs, a new descriptor based on the widely used atom-centered symmetric function, to predict the birefringence of inorganic crystals. A series of ML classifiers were built using hp-wACSFs. Two learning tasks, which target birefringence-active NLO crystals and polymorphs with different birefringence activities, respectively, were implemented. The performance on the former task was as good as in our previously reported work, while the best accuracy on the latter task, which cannot be handled in the absence of three-dimensional descriptors, reached 0.8 in this work. We finally implemented virtual screening using the constructed ML models to search for polymorphs with different birefringence activities.
Affiliation(s)
- Ding Peng
- Key Laboratory of Theoretical and Computational Photochemistry of Ministry of Education, College of Chemistry, Beijing Normal University, Beijing 100875, China
| | - Zhaoxi Yu
- Key Laboratory of Theoretical and Computational Photochemistry of Ministry of Education, College of Chemistry, Beijing Normal University, Beijing 100875, China
| | - Sangen Zhao
- State Key Laboratory of Structural Chemistry, Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou, Fujian 350002, China
| | - Junhua Luo
- State Key Laboratory of Structural Chemistry, Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou, Fujian 350002, China
| | - Lin Shen
- Key Laboratory of Theoretical and Computational Photochemistry of Ministry of Education, College of Chemistry, Beijing Normal University, Beijing 100875, China
- Yantai-Jingshi Institute of Material Genome Engineering, Yantai, Shandong 265505, China
| | - Wei-Hai Fang
- Key Laboratory of Theoretical and Computational Photochemistry of Ministry of Education, College of Chemistry, Beijing Normal University, Beijing 100875, China
| |
|
46
|
Fiandrino S, Donà D, Giaquinto C, Poletti P, Tira MD, Di Chiara C, Paolotti D. Clinical characteristics of COVID-19 in children and adolescents: insights from an Italian paediatric cohort using a machine-learning approach. BMJ PUBLIC HEALTH 2025; 3:e001888. [PMID: 40521332 PMCID: PMC12164349 DOI: 10.1136/bmjph-2024-001888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Accepted: 04/30/2025] [Indexed: 06/18/2025]
Abstract
Introduction: The epidemiology and clinical characteristics of COVID-19 evolved due to new SARS-CoV-2 variants of concern (VOCs). The Omicron VOC's higher transmissibility increased paediatric COVID-19 cases and hospital admissions. Most research during the Omicron period has focused on hospitalised cases, leaving a gap in understanding the disease's evolution in community settings. This study targets children with mild to moderate COVID-19 during pre-Omicron and Omicron periods. It aims to identify patterns in COVID-19 morbidity by clustering individuals based on symptom similarities and duration of symptoms and develop a machine-learning tool to classify new cases into risk groups. Methods: We propose a data-driven approach to explore changes in COVID-19 characteristics by analysing data from 581 children and adolescents collected within a paediatric cohort at the University Hospital of Padua. First, we apply an unsupervised machine-learning algorithm to cluster individuals into groups. Second, we classify new patient risk groups using a random forest classifier model based on sociodemographic information, pre-existing medical conditions, vaccination status and the VOC as predictive variables. Third, we explore the key features influencing the classification through the SHapley Additive exPlanations. Results: The unsupervised clustering identified three severity risk profile groups. Cluster 0 (mildest) had an average of 1.2 symptoms (95% CI 0.0 to 5.0) and mean symptom duration of 1.26 days (95% CI 0.0 to 9.0), cluster 1 had 2.27 symptoms (95% CI 1.0 to 6.0) lasting 3.47 days (95% CI 1.0 to 12.0), while cluster 2 (strongest symptom expression) exhibited 3.41 symptoms (95% CI 2.0 to 7.0) over 5.52 days (95% CI 0.0 to 16.0). Feature importance analysis showed that age was the most important predictor, followed by the variant of infection, influenza vaccination and the presence of comorbidities.
The analysis revealed that younger children, unvaccinated individuals, those infected with Omicron and those with comorbidities were at higher risk of experiencing a greater number and longer duration of symptoms. Conclusions: Our classification model has the potential to provide clinicians with insights into the children's risk profile of COVID-19 using readily available data. This approach can support public health by clarifying disease burden and improving patient care strategies. Furthermore, it underscores the importance of integrating risk classification models to monitor and manage infectious diseases.
Affiliation(s)
- Stefania Fiandrino
- University of Rome La Sapienza, Rome, Italy
- ISI Foundation, Torino, Italy
| | - Daniele Donà
- Department of Women’s and Children’s Health, Università degli Studi di Padova, Padua, Italy
- Penta Foundation, Padua, Italy
| | - Carlo Giaquinto
- Department of Women’s and Children’s Health, Università degli Studi di Padova, Padua, Italy
- Penta Foundation, Padua, Italy
| | | | | | - Costanza Di Chiara
- Department of Women’s and Children’s Health, Università degli Studi di Padova, Padua, Italy
- Penta Foundation, Padua, Italy
| | | |
|
47
|
Zhang J, Du T, Jin Y, Bao Y, Ma Q, Cai YD, Zhang J. Machine Learning Identifies Key Gene Markers Related to Fetal Retina Development at Single-Cell Transcription Level. Invest Ophthalmol Vis Sci 2025; 66:60. [PMID: 40531615 DOI: 10.1167/iovs.66.6.60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2025] Open
Abstract
Purpose: The retina is part of the central nervous system, and its function is vision. During the embryonic development of the fetus, retinal progenitor cells diversify into the seven major retinal cell types. The purpose of this study is to investigate essential features that are necessary for the development of the fetal retina. Methods: We generated a comprehensive single-cell transcriptional atlas of the human retina by leveraging datasets from the Chan Zuckerberg Initiative Single-Cell Biology collection of human fetal retinas. Eight critical types of retinal cells were investigated: amacrine, bipolar, cone, horizontal, Müller glia, retinal ganglion, rod, and retinal progenitor cells. We evaluated a total of 36,503 gene features across three developmental stages (early, middle, and late) for each cell type. Using seven feature ranking algorithms and an incremental feature selection method, we identified key gene features and constructed efficient classifiers and classification rules. Results: For amacrine cells, RELN and DAB1 are critical; for bipolar cells, ANK3 and RIMS2; for cone cells, PDE6H and FRMPD2; for horizontal cells, NFIA; for Müller glial cells, WIF1 and TF; for retinal ganglion cells, IL1RAPL2 and PCP4; for rod cells, RIMS2 and NRG1; and for retinal progenitor cells, BTG1. These gene features bring into focus the pattern of gene regulation and developmental pathways of the retinal cells for deeper insights into retinal development. Conclusions: This study explored molecular features related to the development of the fetal retina and their potential roles in certain pathways, which may provide novel insights into retinal development and contribute to a better understanding of other retinal diseases.
Affiliation(s)
- Jiyu Zhang
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- National Clinical Research Center for Eye Diseases, Shanghai, China
- Shanghai Clinical Research Center for Eye Diseases, Shanghai, China
- Shanghai Key Clinical Specialty, Shanghai, China
- Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China
- Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
- Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
| | - Tong Du
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- National Clinical Research Center for Eye Diseases, Shanghai, China
- Shanghai Clinical Research Center for Eye Diseases, Shanghai, China
- Shanghai Key Clinical Specialty, Shanghai, China
- Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China
- Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
- Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
| | - Yiqing Jin
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- National Clinical Research Center for Eye Diseases, Shanghai, China
- Shanghai Clinical Research Center for Eye Diseases, Shanghai, China
- Shanghai Key Clinical Specialty, Shanghai, China
- Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China
- Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
- Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
| | - Yusheng Bao
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Qinglan Ma
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Jian Zhang
- Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- National Clinical Research Center for Eye Diseases, Shanghai, China
- Shanghai Clinical Research Center for Eye Diseases, Shanghai, China
- Shanghai Key Clinical Specialty, Shanghai, China
- Shanghai Key Laboratory of Ocular Fundus Diseases, Shanghai, China
- Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
- Shanghai Engineering Center for Precise Diagnosis and Treatment of Eye Diseases, Shanghai, China
| |
|
48
|
Li L, Guo D, Shi C, Zheng Y. The predictive role of sedentary behavior and physical activity on adolescent depressive symptoms: A machine learning approach. J Affect Disord 2025; 378:81-89. [PMID: 40015649 DOI: 10.1016/j.jad.2025.02.085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2024] [Revised: 02/07/2025] [Accepted: 02/24/2025] [Indexed: 03/01/2025]
Abstract
OBJECTIVE: This study aims to investigate the predictive value of sedentary behavior and physical activity in adolescent depressive symptoms. METHODS: A total of 2419 adolescent students (grades 7-12) from six administrative regions in China were surveyed. Measures included the Physical Activity Rating Scale for Children (PARS-3), a self-designed questionnaire assessing sedentary behavior among Chinese children and adolescents, and the Children's Depression Inventory (CDI). Machine learning models were trained and tested to predict depressive symptoms based on different types of sedentary behavior, physical activity, and other key variables. RESULTS: The trained random forest model demonstrated high predictive accuracy (ACC = 90.52%), with a precision of 92.01%, recall of 87.95%, and an F1 score of 0.90. Key predictors of depressive symptoms included sedentary behaviors such as multimedia learning, watching TV, classroom learning, and playing video games. Physical activity also emerged as a significant factor in predicting adolescent depressive symptoms. CONCLUSIONS: The machine learning-based predictive model exhibited strong performance, suggesting that sedentary behavior and physical activity data can effectively predict depression symptoms in Chinese adolescents.
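As a quick consistency check on the reported metrics, the F1 score can be recomputed from the stated precision and recall using the standard definition, F1 = 2PR / (P + R). The sketch below uses only the two values given in the abstract; the function name `f1_score` is our own and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
precision = 0.9201
recall = 0.8795

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.90"
```

The recomputed value (about 0.899) rounds to the reported F1 of 0.90, so the three figures are mutually consistent.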
Affiliation(s)
- Lin Li
- Key Laboratory of Adolescent Health Assessment and Exercise Intervention of Ministry of Education, East China Normal University, Shanghai 200241, China; College of Physical Education and Health, East China Normal University, Shanghai 200241, China.
- Dongxi Guo
- Key Laboratory of Adolescent Health Assessment and Exercise Intervention of Ministry of Education, East China Normal University, Shanghai 200241, China; College of Physical Education and Health, East China Normal University, Shanghai 200241, China
- Chengchao Shi
- Key Laboratory of Adolescent Health Assessment and Exercise Intervention of Ministry of Education, East China Normal University, Shanghai 200241, China; College of Physical Education and Health, East China Normal University, Shanghai 200241, China
- Yifan Zheng
- Key Laboratory of Adolescent Health Assessment and Exercise Intervention of Ministry of Education, East China Normal University, Shanghai 200241, China; College of Physical Education and Health, East China Normal University, Shanghai 200241, China
|
49
|
Zhou Y, Pei C, Yin H, Zhu R, Yan N, Wang L, Zhang X, Lan T, Li J, Zeng L, Huo L. Predictors of smartphone addiction in adolescents with depression: combing the machine learning and moderated mediation model approach. Behav Res Ther 2025; 189:104749. [PMID: 40262465 DOI: 10.1016/j.brat.2025.104749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2025] [Revised: 04/10/2025] [Accepted: 04/15/2025] [Indexed: 04/24/2025]
Abstract
Smartphone addiction (SA) significantly impacts the physical and mental health of adolescents and can further exacerbate existing mental health issues in those with depression. However, few studies have focused on the predictors of SA in adolescents with depression. This study employs machine learning methods to identify key risk factors for SA, using the SHapley Additive exPlanations (SHAP) method to enhance interpretability. Additionally, a moderated mediation model is constructed to analyze the interactions between significant risk factors. The study included 2203 adolescents with depression. Machine learning results from four models (Random Forest, Support Vector Machine, Logistic Regression, XGBoost) consistently identified emotion-focused coping, rumination, and school bullying as the strongest predictors of SA. Further moderated mediation analyses based on the Interaction of Person-Affect-Cognition-Execution (I-PACE) model revealed that rumination significantly mediated the relationship between school bullying and SA, and that emotion-focused coping significantly moderated the relationships between school bullying and both rumination and SA. This is the first study to use machine learning to explore the predictors of SA in adolescents with depression and to further analyze the interactions among these predictors. Future interventions for SA in adolescents with depression may benefit from psychotherapy that addresses emotion-focused coping and rumination.
Affiliation(s)
- Yongjie Zhou
- Shenzhen Mental Health Center, Shenzhen Kangning Hospital, Shenzhen, China
- Chenran Pei
- Key Laboratory of Brain, Cognition and Education Science, Ministry of Education, Institute for Brain Research and Rehabilitation, Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
- Hailong Yin
- Key Laboratory of Brain, Cognition and Education Science, Ministry of Education, Institute for Brain Research and Rehabilitation, Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
- Rongting Zhu
- The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China
- Nan Yan
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
- Lan Wang
- CAS Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
- Xuankun Zhang
- Shenzhen Mental Health Center, Shenzhen Kangning Hospital, Shenzhen, China; School of Medicine, Southern University of Science and Technology, Shenzhen, China
- Tian Lan
- Shenzhen Mental Health Center, Shenzhen Kangning Hospital, Shenzhen, China; Medicine School, Shenzhen University, Shenzhen, China
- Junchang Li
- Shenzhen Mental Health Center, Shenzhen Kangning Hospital, Shenzhen, China
- Lingyun Zeng
- Shenzhen Mental Health Center, Shenzhen Kangning Hospital, Shenzhen, China
- Lijuan Huo
- Key Laboratory of Brain, Cognition and Education Science, Ministry of Education, Institute for Brain Research and Rehabilitation, Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China; The Affiliated Brain Hospital of Guangzhou Medical University, Guangzhou, China.
|
50
|
Asadollah SBHS, Safaeinia A, Jarahizadeh S, Alcalá FJ, Sharafati A, Jodar-Abellan A. Dissolved organic carbon estimation in lakes: Improving machine learning with data augmentation on fusion of multi-sensor remote sensing observations. Water Res 2025; 277:123350. [PMID: 39999600 DOI: 10.1016/j.watres.2025.123350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2024] [Revised: 02/18/2025] [Accepted: 02/21/2025] [Indexed: 02/27/2025]
Abstract
This paper presents a novel approach for estimating Dissolved Organic Carbon (DOC) concentrations in lakes considering both carbon sources and sink operators. Despite the critical role of DOC, the combined application of machine learning, as a robust predictor, and remote sensing technology, which reduces costly and time-intensive in-situ sampling, has been underexplored in DOC research. Focusing on lakes across the states of New York, Vermont and Maine (United States, U.S.), this study integrates in-situ DOC measurements with surface reflectance bands obtained from Landsat satellites between 2000 and 2020. Using these bands as inputs to a Random Forest (RF) predictive model, the introduced methodology aims to explore the ability of remote sensing data to support large-scale DOC simulation. Initial results indicate low accuracy metrics and significant under-estimation due to the imbalanced distribution of DOC samples. Statistical analysis showed that the mean DOC concentration was 5.37±3.37 mg/L (mean±one standard deviation), with peaks up to 25 mg/L. A distribution of chemical components that is highly skewed towards the lower ranges can lead the model to misrepresent extreme and hazardous events, as they are clouded by unimportant events due to their significantly lower occurrence rates. To address this issue, the Synthetic Minority Over-sampling Technique (SMOTE) was applied as a key innovation, generating synthetic samples that enhance RF accuracy and reduce the associated errors. Fusion of in-situ and remote sensing data, combined with machine learning and data augmentation, significantly enhances DOC estimation accuracy, especially in high concentration ranges, which are critical for environmental health. With prediction metrics of RMSE = 1.75, MAE = 1.09, and R² = 0.74, RF-SMOTE significantly improves on the metrics obtained from the stand-alone RF, particularly in estimating high DOC concentrations. Considering the product spatial resolution of 30 m, the model's output provides a potential avenue for global application in lake monitoring, even in remote regions where direct sampling is limited. This novel fusion of remote sensing, machine learning and data augmentation offers valuable insights for water quality management and understanding carbon cycling in aquatic ecosystems.
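The core idea behind SMOTE-style augmentation, as described above, is to create synthetic rare samples by interpolating between an existing rare sample and one of its nearest rare neighbours. The following is a minimal standard-library sketch of that interpolation step only. It is not the authors' implementation: the function name, the choice to treat the DOC value as one more coordinate, and the example rows are all hypothetical and chosen purely for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_synthetic, k=3, seed=0):
    """Create synthetic points by interpolating between a rare sample
    and one of its k nearest rare neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other rare samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical high-DOC rows (assumed layout: band_1, band_2, doc_mg_per_l).
# These values are made up for illustration and are not from the study.
high_doc_rows = [
    (0.031, 0.052, 14.2),
    (0.035, 0.049, 18.9),
    (0.029, 0.057, 25.0),
    (0.033, 0.050, 16.1),
]
new_rows = smote_like_oversample(high_doc_rows, n_synthetic=6)
```

Because each synthetic row is a convex combination of two existing rows, every coordinate stays within the range already observed for the rare samples. In practice the augmented rows would be appended to the training set before fitting the regressor.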
Affiliation(s)
- Seyed Babak Haji Seyed Asadollah
- Department of Environmental Resources Engineering, State University of New York, College of Environmental Science and Forestry, 1 Forestry Drive, Syracuse, NY 13210, USA; Department of Civil Engineering, University of Alicante, 03690 Alicante, Spain.
- Ahmadreza Safaeinia
- Department of Environmental Resources Engineering, State University of New York, College of Environmental Science and Forestry, 1 Forestry Drive, Syracuse, NY 13210, USA.
- Sina Jarahizadeh
- Department of Environmental Resources Engineering, State University of New York, College of Environmental Science and Forestry, 1 Forestry Drive, Syracuse, NY 13210, USA.
- Francisco Javier Alcalá
- Departamento de Desertificación y Geo-Ecología, Estación Experimental de Zonas Áridas (EEZA-CSIC), 04120 Almería, Spain; Instituto de Ciencias Químicas Aplicadas, Facultad de Ingeniería, Universidad Autónoma de Chile, Santiago 7500138, Chile.
- Ahmad Sharafati
- Department of Civil Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran; New Era and Development in Civil Engineering Research Group, Scientific Research Center, Al-Ayen University, Thi-Qar, Nasiriyah, 64001, Iraq
- Antonio Jodar-Abellan
- Soil and Water Conservation Research Group, Centre for Applied Soil Science and Biology of the Segura, Spanish National Research Council (CEBAS-CSIC), Campus de Espinardo 30100, P.O. Box 164, Murcia, Spain.
|