A Comparative Evaluation of Machine Learning Algorithms for Early Detection of Type 2 Diabetes

21 Apr

Authors: Seema Ahirwar, Rupali Chaure

Abstract: This paper presents a comparative evaluation of seven supervised machine learning (ML) classifiers, namely Logistic Regression, Naive Bayes, k-Nearest Neighbors, Decision Tree, Random Forest, XGBoost, and Support Vector Machine (SVM), for early prediction of Type 2 diabetes mellitus (T2DM). Using the Pima Indians Diabetes Database (PIDD, n=768) and the Frankfurt Hospital Diabetes Dataset (FHDD, n=2000), we apply standardized preprocessing with SMOTE-based correction of class imbalance and stratified 10-fold cross-validation[1]. Models are evaluated on Accuracy, Precision, Recall, F1-Score, ROC-AUC, and the Matthews Correlation Coefficient (MCC). XGBoost consistently outperforms all other classifiers (AUC: 0.901 on PIDD, 0.951 on FHDD), while Logistic Regression retains an interpretability advantage for clinical deployment. Feature importance analysis identifies fasting plasma glucose, BMI, and HbA1c as the top predictors, in line with clinical guidelines.
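The abstract names the evaluation protocol but does not show it. The sketch below is a minimal, dependency-free illustration of three of its pieces, assuming binary 0/1 labels (1 = diabetic) and features stored as lists of floats: stratified k-fold splitting, a simplified SMOTE, and the six reported metrics. The function names `stratified_kfold`, `smote`, and `binary_metrics` are hypothetical, not the authors' code. The abstract does not state at which stage SMOTE was applied; here it is assumed to run on each training fold only, so that synthetic samples never leak into the held-out fold. In practice one would more likely use scikit-learn's `StratifiedKFold` and metric functions together with imbalanced-learn's `SMOTE`.

```python
import math
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is dealt round-robin over k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within a class, then deal so class ratios match
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for fold in folds:
        test = sorted(fold)
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


def smote(X, y, minority=1, k_neighbors=5, seed=0):
    """Simplified binary SMOTE: add synthetic minority points until classes balance.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority neighbours (Euclidean distance).
    Assumed usage: call on a training fold only, never on the held-out fold.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    if len(minority_pts) < 2:
        raise ValueError("need at least two minority samples")
    n_needed = sum(1 for label in y if label != minority) - len(minority_pts)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_pts)
        neighbours = sorted(
            (p for p in minority_pts if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k_neighbors]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


def binary_metrics(y_true, y_pred, y_score):
    """Accuracy, Precision, Recall, F1, MCC from hard predictions; ROC-AUC from scores."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # ROC-AUC = probability that a random positive outscores a random negative (ties = 1/2)
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "roc_auc": auc,
    }
```

Inside a cross-validation loop, one would pass the training indices of each fold through `smote`, fit a classifier (for example XGBoost) on the balanced result, and hand the held-out fold's true labels, thresholded predictions (a 0.5 cut-off is an assumption here), and raw scores to `binary_metrics`. Averaging each metric over the ten folds gives the per-model estimates. MCC is included alongside F1 because it uses all four cells of the confusion matrix and stays informative when the classes are imbalanced.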

DOI: https://doi.org/10.5281/zenodo.19678064