Comparative Analysis of the Random Forest, SVM, and Logistic Regression Algorithms to Determine the Best Model for Diabetes Prediction
DOI: https://doi.org/10.55606/jutiti.v5i3.6213
Keywords: data mining, machine learning, Random Forest, support vector machine, Logistic Regression
Abstract
Diabetes is a chronic metabolic disorder characterized by elevated blood glucose levels, caused by the body’s inability to produce insulin or to respond to it effectively. The increasing prevalence of diabetes in Indonesia calls for accurate, data-driven early detection systems to support the diagnostic process. This study compares the performance of three machine learning algorithms, Support Vector Machine (SVM), Random Forest, and Logistic Regression, in predicting diabetes from patient clinical data. The dataset, titled 100,000 Diabetes Clinical Dataset, was obtained from the Kaggle repository. The analysis was carried out in the Orange Data Mining software in several stages: data preprocessing, One-Hot Encoding transformation, model training, and evaluation using 10-Fold Cross Validation. Random Forest achieved the best performance with an accuracy of 97.1%, followed by Logistic Regression at 96.0% and SVM at 92.3%. These findings indicate that, on this dataset, an ensemble-based method such as Random Forest produces more stable and accurate predictions for diabetes diagnosis than the other two algorithms.
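The study ran its workflow in the Orange Data Mining graphical interface, so no code accompanies the abstract. As a rough illustration of two of the stages it names, One-Hot Encoding and 10-Fold Cross Validation, the following is a minimal standard-library sketch. Everything in it is an assumption made for illustration: the column names `gender` and `glucose`, the toy rows and labels, and the `MajorityClassifier` placeholder, which only stands in for a real model such as Random Forest, SVM, or Logistic Regression. It does not use the Kaggle dataset and does not reproduce the reported accuracies.

```python
import random
from collections import Counter


def one_hot(rows, column):
    """Replace one categorical column with a 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class MajorityClassifier:
    """Placeholder model (hypothetical): predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def cross_val_accuracy(make_model, X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracies."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = make_model().fit([X[j] for j in train_idx],
                                 [y[j] for j in train_idx])
        preds = model.predict([X[j] for j in test_idx])
        correct = sum(p == y[j] for p, j in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Toy data invented for this sketch; not taken from the study's dataset.
rows = [{"gender": g, "glucose": 90 + 5 * i}
        for i, g in enumerate(["F", "M"] * 10)]
labels = [1 if row["glucose"] >= 160 else 0 for row in rows]

encoded = one_hot(rows, "gender")
mean_accuracy = cross_val_accuracy(MajorityClassifier, encoded, labels, k=10)
print(f"Mean 10-fold accuracy of the placeholder model: {mean_accuracy:.3f}")
```

To compare real models in the same way, the placeholder would be swapped for an actual classifier while the folds stay fixed, so that every algorithm is scored on the same splits. Libraries such as scikit-learn provide these pieces ready-made (`OneHotEncoder`, `cross_val_score`, `RandomForestClassifier`, `SVC`, `LogisticRegression`).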
License
Copyright (c) 2025 Jurnal Teknik Informatika dan Teknologi Informasi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.