A comparative study of classical, bagging, and hybrid methods for optimizing loan default prediction

Ismail Idowu Akuji, Ahmed Babajide Olanrewaju, Taofik Abiodun Ahmed, Ayodeji Jubril Alabi, Idris Babatunde Adeyemi

Abstract


This study optimized loan default prediction by comparing k-nearest neighbor (KNN), random forest (RF), and hybrid methods. The dataset was preprocessed with a simple imputer, label encoding, the synthetic minority oversampling technique (SMOTE), and correlation-based selection of the top 7 features. Grid search cross-validation (GSCV) and random search cross-validation (RSCV) were then used to tune the models. Before tuning, RF achieved near-perfect performance (100% accuracy, 99.8% precision, 100% recall, 99.9% F1, and 1.000 area under the curve (AUC)). This outperformed untuned KNN (99.2% accuracy, 96.2% precision, 99.8% recall, 98.0% F1, 0.997 AUC) and the hybrid model (99.8% accuracy, 99.1% precision, 99.9% recall, 99.5% F1). After tuning, RF retained the same results. Its stability was confirmed by 10× nested cross-validation (CV) (F1 = 0.9997 ± 0.0002), and McNemar tests found no significant difference between RF_GSCV and RF_RSCV (p = 1.0000). Tuning raised KNN's precision (96.2% → 99.8%) but lowered its recall, while the hybrid model dropped slightly across all metrics. Partial dependence plots indicate that RF's advantage stems mainly from three key features (lump_sum_payment, property_value, and co-applicant_credit_type). A business impact analysis further shows that RF makes fewer misclassification errors than KNN and the hybrid model. The results indicate that RF_GSCV's near-perfect performance reflects genuine generalization rather than overfitting, establishing it as the production-ready gold standard. Future work can address the limitation of a static dataset by incorporating dynamic time-series data with online learning, concept drift detection, and real-time macroeconomic features to improve real-world generalizability.

Keywords


Correlation-based feature selection; Grid search cross-validation; K-nearest neighbor; Loan default; Random forest; Random search cross-validation



DOI: https://doi.org/10.11591/csit.v7i2.p179-195



Copyright (c) 2026 Ismail Idowu Akuji, Ahmed Babajide Olanrewaju, Taofik Abiodun Ahmed, Ayodeji Jubril Alabi, Idris Babatunde Adeyemi

Computer Science and Information Technologies
p-ISSN: 2722-323X, e-ISSN: 2722-3221
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Universitas Ahmad Dahlan (UAD).


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.