An ensemble learning approach for diabetes prediction using the stacking method

Elliot Kojo Attipoe, Alimatu Saadia Yussiff, Maame Gyamfua Asante-Mensah, Emmanuel Dortey Tetteh, Regina Esi Turkson

Abstract


Diabetes is a serious illness characterized by high blood glucose levels. Machine learning algorithms can help detect and predict diabetes in its early stages, which makes them a promising avenue for research. This study sought to improve the accuracy of diabetes mellitus prediction by employing the stacking method, which combines the predictions of several base models through a meta-model and so draws on their varied strengths to improve accuracy and generalization. The study used the Pima Indians diabetes dataset, a widely used benchmark. Six machine learning models were evaluated: logistic regression (LR), naïve Bayes (NB), extreme gradient boosting (XGBoost), K-nearest neighbor (KNN), decision tree (DT), and support vector machine (SVM). LR, KNN, and SVM performed best in terms of accuracy, F1-score, precision, and area under the curve (AUC), and were therefore used as the base models for the stacking method, with LR serving as the meta-model. The proposed stacking ensemble achieved an accuracy of 82.4%, higher than that of the individual models and of other ensemble techniques such as bagging and boosting. This study advances diabetes prediction by developing a more accurate model for early-stage detection, which could support better clinical management of the disease.
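The stacking arrangement described in the abstract (LR, KNN, and SVM as base models, LR as the meta-model) can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration, not the authors' implementation: the synthetic data stands in for the Pima Indians dataset (same 768 x 8 shape), and the feature scaling, hyperparameters, train/test split, and 5-fold setting are all assumptions, so the printed accuracy is not comparable to the 82.4% reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data with the same shape as the
# Pima Indians dataset (768 rows, 8 features, binary target).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base models named in the abstract. Scaling is an assumption added
# here because KNN and SVM are sensitive to feature magnitudes.
base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
]

# The LR meta-model is trained on out-of-fold predictions of the base models.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"stacked accuracy on synthetic data: {accuracy:.3f}")
```

Training the meta-model on cross-validated predictions rather than on in-sample predictions keeps it from simply learning which base model overfits the training set most.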

Keywords


Diabetes prediction; Ensemble methods; K-nearest neighbor; Logistic regression; Support vector machine


DOI: https://doi.org/10.11591/csit.v6i2.p102-111



Computer Science and Information Technologies
p-ISSN: 2722-323X, e-ISSN: 2722-3221
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Universitas Ahmad Dahlan (UAD).


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.