Handling Class Imbalance in Fan Sentiment Analysis: Naïve Bayes with TF-IDF on Instagram and Twitter

Khomsatun Nimah; Rakha Arian Archaniga

doi:10.15408/jti.v19i1.46733

Authors

Khomsatun Nimah, Informatics Engineering Study Program, Department of Information Technology, Jember State Polytechnic
Rakha Arian Archaniga, Informatics Engineering Study Program, Department of Information Technology, Jember State Polytechnic

Keywords:

Sentiment Analysis, Naïve Bayes, Accuracy

Abstract

Social media platforms such as Instagram and Twitter are major channels through which football fans share opinions and react to developments at their clubs, Manchester United among them. Beyond fan interaction, these platforms play an important role in business, marketing, and information exchange, which makes efficient text classification essential. This study applies the Naïve Bayes algorithm to analyze sentiment toward Manchester United's performance, using 2,500 Instagram comments and 2,500 Twitter comments. The research process comprised data cleaning, sentiment labeling, and preprocessing. The imbalance among positive, negative, and neutral comments was addressed with data balancing techniques to improve model reliability. The results show that balancing significantly improved performance, with accuracy reaching 83.87% on the Instagram data and 82.48% on the Twitter data. Improvements in precision, recall, and F1-score further indicate that Naïve Bayes can handle complex, noisy, and diverse social media language. The study also highlights how dataset size, effective preprocessing, and accurate labeling contributed to the performance gains. Overall, Naïve Bayes proved effective for sentiment classification and offered insight into public perception of Manchester United. These findings point to its potential for large-scale social media analysis, supporting both academic research and practical applications in digital marketing and fan engagement strategies.
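The pipeline summarized in the abstract (balance the classes, weight terms with TF-IDF, classify with Naïve Bayes) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the balancing technique, so random oversampling is assumed here; the whitespace tokenizer, the smoothed IDF formula, the add-one smoothing, the names `oversample` and `TfidfNaiveBayes`, and the five toy comments are likewise hypothetical choices made only for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def oversample(texts, labels, seed=0):
    """Assumed balancing step: duplicate random minority-class samples
    until every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with add-one smoothing."""

    def fit(self, texts, labels):
        docs = [t.lower().split() for t in texts]  # naive whitespace tokens
        n_docs = len(docs)
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Smoothed IDF (an assumption): ln((1 + N) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1
                    for t, c in doc_freq.items()}
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / n_docs)
                          for c in self.classes}
        # Sum the TF-IDF weight of each term within each class.
        weights = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for term, tf in Counter(doc).items():
                weights[label][term] += tf * self.idf[term]
        vocab_size = len(self.idf)
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(weights[c].values()) + vocab_size
            self.log_likelihood[c] = {
                t: math.log((weights[c][t] + 1) / total) for t in self.idf
            }
        return self

    def predict(self, text):
        # Terms never seen during training are ignored.
        counts = Counter(t for t in text.lower().split() if t in self.idf)
        scores = {
            c: self.log_prior[c] + sum(
                tf * self.idf[t] * self.log_likelihood[c][t]
                for t, tf in counts.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Invented toy comments with a 3:1:1 class imbalance.
texts = ["great win today", "brilliant goal great team",
         "superb performance great", "terrible defending", "match at home"]
labels = ["positive", "positive", "positive", "negative", "neutral"]

bal_texts, bal_labels = oversample(texts, labels)
model = TfidfNaiveBayes().fit(bal_texts, bal_labels)
print(Counter(bal_labels))
print(model.predict("terrible defending again"))
```

In a real experiment the oversampling would normally be applied to the training split only, so that duplicated comments cannot leak into the test data and inflate the reported accuracy.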
