Decision Tree C4.5 Performance Improvement using Synthetic Minority Oversampling Technique (SMOTE) and K-Nearest Neighbor for Debtor Eligibility Evaluation


Edi Priyanto(1); Enny Itje Sela(2); Luther Alexander Latumakulita(3*); Noourul Islam(4)

(1) Universitas Teknologi Yogyakarta
(2) Universitas Teknologi Yogyakarta
(3) Universitas Sam Ratulangi
(4) Kanpur Institute of Technology
(*) Corresponding Author

  

Abstract


Information technology, and machine learning in particular, is now widely used to evaluate the eligibility of debtors. One challenge for such classification models is class imbalance, which is present in the German Credit Dataset. Another challenge is developing an optimal model for evaluating debtor eligibility. To address these challenges, this study develops a model for evaluating debtor eligibility on the German Credit Dataset using a decision tree, k-Nearest Neighbor (k-NN), and the Synthetic Minority Oversampling Technique (SMOTE). SMOTE and k-NN are used to address the class imbalance, while the decision tree is used to build the debtor classification model. The main steps are dataset preparation, data pre-processing, dataset splitting, oversampling with SMOTE, classification with a decision tree, and testing. Model performance is evaluated using the accuracy obtained from the confusion matrix and the area under the curve (AUC) of the Receiver Operating Characteristic (ROC) curve. In testing, the best accuracy was 73.00% with an AUC of 0.708, obtained with k = 3 and Max-Depth = 25. The analysis indicates that the proposed model performs better than a model built on the dataset without SMOTE.
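
The oversampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote`, the toy points, and the choice of k = 2 are assumptions made for illustration, and the data are not taken from the German Credit Dataset. The sketch shows only the core idea of SMOTE, in which each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (base itself excluded)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical toy data: six majority points and three minority points.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (0.3, 0.3), (0.2, 0.4), (0.4, 0.2)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]

# Generate enough synthetic minority points to match the majority count.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In a pipeline like the one described, oversampling of this kind would normally be applied to the training split only, so that the test split still reflects the original class distribution.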

Keywords


Debtor Eligibility; Decision Tree; Imbalanced Dataset; KNN; Machine Learning; SMOTE

  
  

Digital Object Identifier

https://doi.org/10.33096/ilkom.v15i2.1676.373-381
  

Copyright (c) 2023 Edi Priyanto, Enny Itje Sela, Luther Alexander Latumakulita, Noourul Islam

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.