Optimized Gradient Boosting for Diabetes Classification with Explainability Analysis Using SHAP and LIME


Angga Kurniawan(1*); Nisrina Hanifa Setiono(2); Afin Muhammad Nurtsani(3); Andi Hisyam Helmi Faalih Fakhruddin(4); Muhammad Naufal Farabbia(5); Nadia Syahda Fitriani(6);

(1) Data Science, Universitas Telkom Kampus Purwokerto, Banyumas, Indonesia
(2) Data Science, Universitas Telkom Kampus Purwokerto, Banyumas, Indonesia
(3) Biomedical Engineering, Universitas Telkom Kampus Purwokerto, Banyumas, Indonesia
(4) Data Science, Universitas Telkom Kampus Purwokerto, Banyumas, Indonesia
(5) Data Science, Universitas Telkom Kampus Purwokerto, Banyumas, Indonesia
(6) Biomedical Engineering, Universitas Telkom Kampus Purwokerto, Banyumas, Indonesia
(*) Corresponding Author

  

Abstract


This study proposes a diabetes classification framework that integrates three gradient boosting algorithms, XGBoost, LightGBM, and CatBoost, each tuned automatically with Optuna using Bayesian optimization over 50 trials and 5-fold cross-validation. The dataset is the Pima Indians Diabetes Dataset, with 768 samples and 8 clinical features. Preprocessing consists of treating zero values as missing, imputing them with medians computed from the training data, and adding an Insulin_missing indicator feature. Class imbalance is handled through the scale_pos_weight parameter, which is optimized together with the model hyperparameters in a single search space. Models are evaluated with AUC-ROC, accuracy, F1-score, recall, and precision at a classification threshold of 0.6. Explainability analysis uses SHAP at the global and local levels and LIME at the instance level to improve model transparency. In the optimization results, CatBoost achieved the highest cross-validation AUC-ROC at 0.8471, followed by LightGBM at 0.8409 and XGBoost at 0.8406. On the test data, XGBoost achieved an AUC-ROC of 0.8246, accuracy of 0.753, F1-score of 0.683, and diabetes-class recall of 0.759, whereas CatBoost achieved the highest recall, 0.888, with an F1-score of 0.690. SHAP analysis consistently identified Glucose, BMI, Age, and DiabetesPedigreeFunction as the four most influential predictors across all three models, in line with clinical knowledge of the main risk factors for type 2 diabetes.
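The preprocessing steps and the 0.6 decision threshold named in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' code. The list of columns in which a zero counts as missing, the helper names, the toy rows, and the choice of ">=" at the threshold are all assumptions made for illustration; the abstract only states that zeros are treated as missing, that medians come from the training data, and that an Insulin_missing indicator is added.

```python
from statistics import median

# Assumption: columns where a zero is treated as a missing measurement.
# The abstract does not list them; this is a commonly used choice for Pima data.
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def zeros_to_missing(rows):
    """Return copies of the rows with zeros in the selected columns set to None."""
    return [
        {k: (None if k in ZERO_AS_MISSING and v == 0 else v) for k, v in row.items()}
        for row in rows
    ]


def fit_medians(train_rows):
    """Compute per-column medians from training rows only, skipping missing values."""
    return {
        col: median(r[col] for r in train_rows if r[col] is not None)
        for col in ZERO_AS_MISSING
    }


def transform(rows, medians):
    """Add the Insulin_missing flag, then fill missing values with training medians."""
    out = []
    for row in rows:
        new = dict(row)
        new["Insulin_missing"] = int(row["Insulin"] is None)
        for col, med in medians.items():
            if new[col] is None:
                new[col] = med
        out.append(new)
    return out


def classify(probabilities, threshold=0.6):
    """Label a case as diabetic (1) when its predicted probability reaches the threshold.

    Assumption: the comparison is inclusive (>=); the abstract does not say.
    """
    return [int(p >= threshold) for p in probabilities]


# Hypothetical toy rows, not taken from the paper.
train_raw = [
    {"Glucose": 148, "BloodPressure": 72, "SkinThickness": 35, "Insulin": 0, "BMI": 33.6},
    {"Glucose": 85, "BloodPressure": 66, "SkinThickness": 29, "Insulin": 94, "BMI": 26.6},
    {"Glucose": 183, "BloodPressure": 64, "SkinThickness": 0, "Insulin": 168, "BMI": 23.3},
]
test_raw = [
    {"Glucose": 0, "BloodPressure": 40, "SkinThickness": 35, "Insulin": 0, "BMI": 43.1},
]

train_missing = zeros_to_missing(train_raw)
medians = fit_medians(train_missing)          # fitted on the training split only
train_rows = transform(train_missing, medians)
test_rows = transform(zeros_to_missing(test_raw), medians)  # reuses training medians
labels = classify([0.59, 0.6, 0.9])
```

Fitting the medians on the training split and reusing them on the test split keeps test-set information out of the imputation, which is what "median imputation from the training data" implies.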

Keywords


Gradient Boosting; Optuna; SHAP; LIME; Diabetes Prediction

  
  

Digital Object Identifier

doi  https://doi.org/10.33096/busiti.v7i2.3413
  

Copyright (c) 2026 Buletin Sistem Informasi dan Teknologi Islam (BUSITI)

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.