Addressing Class Imbalance in Machine Learning for Predicting On-Time Student Graduation at The Islamic University of Riau

Authors

  • Akmar Efendi, Department of Informatics Engineering, Faculty of Engineering, Universitas Islam Riau
  • Sarjon Defit, Department of Information Technology, Faculty of Computer Science, Universitas Putra Indonesia YPTK

DOI:

https://doi.org/10.15408/jti.v18i2.45913

Keywords:

Student Graduation, SMOTE, Classification, Machine Learning

Abstract

Timely graduation is an important indicator of academic performance in higher education. However, many students still fail to graduate on time, which motivates predictive models that support academic decision-making. This study analyzes how class imbalance affects the performance of machine learning algorithms in predicting student graduation at the Islamic University of Riau. Data were collected through questionnaires and labeled as "graduated on time" or "not on time"; the resulting class distribution was imbalanced. The Synthetic Minority Over-sampling Technique (SMOTE) was applied during preprocessing to balance the dataset. Four machine learning algorithms were compared: Decision Tree, Gaussian Naive Bayes, K-Nearest Neighbors, and Support Vector Machine (SVM). Each model was evaluated with and without SMOTE using accuracy, precision, recall, F1-score, and the confusion matrix. Performance improved substantially after SMOTE was applied, with all models reaching approximately 99% accuracy. SVM produced the most stable results under both conditions. These findings indicate that SMOTE can improve the fairness and reliability of classification on imbalanced datasets. The resulting models may help universities identify students at risk of late graduation early enough to intervene.
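The abstract relies on two mechanics: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours, and accuracy alone can hide a model that never predicts the minority class. The pure-Python sketch below illustrates both under stated assumptions. The function names `smote` and `confusion_metrics`, the toy labels, and the two features are hypothetical and are not taken from the authors' code or questionnaire. The choice k = 5 mirrors the default `k_neighbors` of imbalanced-learn's `SMOTE`. In practice one would use that library together with scikit-learn's metrics, and a common precaution is to oversample only the training split so that synthetic points never reach the evaluation data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class points, SMOTE-style (illustrative sketch).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest *other* minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def confusion_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1, treating `positive` as the class of interest."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows = actual, columns = predicted
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical toy labels: 90 "on time" (0) vs 10 "not on time" (1).
    y_true = [0] * 90 + [1] * 10
    # A degenerate model that always predicts the majority class...
    always_on_time = [0] * 100
    print(confusion_metrics(y_true, always_on_time, positive=1))
    # ...scores 0.9 accuracy but 0.0 recall for the late-graduation class.

    # Hypothetical two-feature minority samples (features are illustrative only).
    rng = random.Random(1)
    late = [[rng.uniform(2.0, 3.0), rng.uniform(12, 18)] for _ in range(10)]
    new_points = smote(late, n_synthetic=80, k=5)
    print(len(late) + len(new_points))  # 90: minority now matches the majority count
```

The degenerate classifier shows why the study reports precision, recall, and F1 alongside accuracy: on a 90/10 split, ignoring the minority class still yields 90% accuracy. Because every synthetic point is a convex combination of two real minority samples, SMOTE stays inside the region the minority class already occupies rather than duplicating rows verbatim.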

Published

2025-10-30

How to Cite

Addressing Class Imbalance in Machine Learning for Predicting On-Time Student Graduation at The Islamic University of Riau. (2025). JURNAL TEKNIK INFORMATIKA, 18(2), 226-235. https://doi.org/10.15408/jti.v18i2.45913