Authors:
Abstract: Loan default risk modeling is a critical area for financial institutions seeking to optimize credit decisions while ensuring compliance and transparency. This study presents a detailed comparative analysis of a newly developed interpretable machine learning framework against existing loan prediction models, in particular the benchmark study by Haque and Hassan (2022). Whereas the benchmark model emphasized loan approval prediction with high accuracy, the present study focuses on predicting actual defaults, a more financially and regulatorily significant outcome. The proposed model integrates class imbalance mitigation (via SMOTE), ensemble learning (Random Forest and XGBoost), and explainable AI techniques (SHAP) to address limitations in prior work. A rigorous evaluation on a real-world dataset of 255,347 loan records showed that the Random Forest model achieved superior performance, with 96.26% accuracy, an F1-score of 0.8014, and an AUC-ROC of 0.9215, while providing both global and local interpretability. Compared with the benchmark study's black-box AdaBoost model (99.99% accuracy on approval tasks), this research contributes a more balanced, interpretable, and risk-sensitive predictive framework. The paper concludes that transparent and fair AI-based credit scoring systems can offer practical utility for real-world financial operations.
DOI: https://doi.org/10.5281/zenodo.16530155
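
The abstract reports both accuracy and F1-score because, on imbalanced default data, accuracy alone can be misleading. The following minimal sketch illustrates that point on toy labels. All data and the helper names (`confusion_counts`, `accuracy`, `f1_score`) are hypothetical and chosen for illustration; none of the numbers come from the study's dataset or results.

```python
# Illustrative only: toy labels, not the study's data or results.
# Label 1 = default (positive class), label 0 = no default.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 when no true positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical portfolio: 100 loans, of which 10 default.
y_true = [1] * 10 + [0] * 90

# A trivial model that never predicts default.
always_negative = [0] * 100

# A hypothetical model that catches 8 of 10 defaults with 4 false alarms.
y_model = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 86

for name, preds in [("always_negative", always_negative), ("y_model", y_model)]:
    print(name, round(accuracy(y_true, preds), 4), round(f1_score(y_true, preds), 4))
```

In this toy setting the trivial model still scores high accuracy while its F1-score is zero, which is why a default-prediction model is better judged on F1 and AUC-ROC alongside accuracy.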