Detecting Hoax News in Indonesian Language Using Web-Based Multinomial Naïve Bayes

Fitri Mintarsih; Ivan Ananda Putra; Arini; Victor Amrizal; Hendra Bayu Suseno

doi:10.15408/jti.v19i1.50385

Authors

Fitri Mintarsih, Department of Informatics, Faculty of Science and Technology, Syarif Hidayatullah State Islamic University Jakarta
Ivan Ananda Putra, Department of Informatics, Faculty of Science and Technology, Syarif Hidayatullah State Islamic University Jakarta
Arini, Department of Informatics, Faculty of Science and Technology, Syarif Hidayatullah State Islamic University Jakarta
Victor Amrizal, Department of Informatics, Faculty of Science and Technology, Syarif Hidayatullah State Islamic University Jakarta
Hendra Bayu Suseno, Department of Informatics, Faculty of Science and Technology, Syarif Hidayatullah State Islamic University Jakarta

DOI:

https://doi.org/10.15408/jti.v19i1.50385

Keywords:

Text Classification, Hoax News Detection, Multinomial Naïve Bayes, Flask Framework

Abstract

This study addresses the growing problem of hoax news in Indonesia, which has contributed to social conflict. It aims to develop an effective detection method based on the Multinomial Naïve Bayes algorithm. The study integrates Indonesian-specific text preprocessing and feature engineering within the CRISP-DM framework to improve classification performance. A dataset of 5,226 news articles (2,612 non-hoax and 2,614 hoax) was collected from kompas.com and turnbackhoax.id. Preprocessing consisted of case folding, tokenization, stopword removal, and stemming, each tailored to the Indonesian language. Features were extracted with the TF-IDF weighting scheme, which converts text into numerical representations. The Multinomial Naïve Bayes model achieved an average accuracy, precision, recall, and F1 score of 86% each, indicating stable and balanced performance across both classes. The trained model was serialized (pickle/joblib format) and deployed as a web application using the Flask framework, demonstrating its practical applicability in real-world systems. The results indicate that combining Indonesian-specific preprocessing with TF-IDF feature representation supports the effectiveness of the Multinomial Naïve Bayes algorithm in detecting hoax news. This study provides a scalable and implementable approach to combating the spread of false information in Indonesian digital media.
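The pipeline summarized in the abstract (preprocessing, TF-IDF weighting, Multinomial Naïve Bayes, and model serialization) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the tiny stopword list, the smoothed IDF formula, and the four toy headlines are all assumptions made for illustration, and stemming is omitted because it would require an external Indonesian stemmer.

```python
import math
import pickle
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only; a real system would
# use a full Indonesian stopword list and add a stemming step.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "ke", "dari"}


def preprocess(text):
    """Case folding, tokenization, and stopword removal (no stemming)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(tokens, idf):
    """TF-IDF weights for one document; out-of-vocabulary terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def train(docs, labels, alpha=1.0):
    """Fit a Multinomial Naive Bayes model on TF-IDF weights.

    Returns a plain dict so the model can be pickled without custom classes.
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

    classes = sorted(set(labels))
    weights = {c: Counter() for c in classes}
    for toks, y in zip(tokenized, labels):
        for term, w in tfidf(toks, idf).items():
            weights[y][term] += w

    class_counts = Counter(labels)
    log_prior = {c: math.log(class_counts[c] / n_docs) for c in classes}
    log_likelihood = {}
    for c in classes:
        # Laplace (additive) smoothing over the whole vocabulary.
        total = sum(weights[c].values()) + alpha * len(idf)
        log_likelihood[c] = {
            t: math.log((weights[c][t] + alpha) / total) for t in idf
        }
    return {"idf": idf, "classes": classes,
            "log_prior": log_prior, "log_likelihood": log_likelihood}


def predict(model, text):
    """Return the class with the highest posterior log-score."""
    feats = tfidf(preprocess(text), model["idf"])
    scores = {
        c: model["log_prior"][c]
        + sum(w * model["log_likelihood"][c][t] for t, w in feats.items())
        for c in model["classes"]
    }
    return max(scores, key=scores.get)


# Invented toy headlines, not drawn from the study's dataset.
train_docs = [
    "Pemerintah resmi mengumumkan jadwal vaksinasi nasional",
    "Bank Indonesia menetapkan suku bunga acuan",
    "Sebarkan! Minum air garam menyembuhkan semua penyakit",
    "Viral! Bagikan pesan ini agar akun tidak diblokir",
]
train_labels = ["non-hoax", "non-hoax", "hoax", "hoax"]

model = train(train_docs, train_labels)

# Serialize and restore the model, as a web application would do at startup.
restored = pickle.loads(pickle.dumps(model))

print(predict(restored, "Bagikan pesan viral ini"))
print(predict(restored, "Pemerintah mengumumkan jadwal resmi"))
```

In a deployment like the one the abstract describes, a Flask route would load the serialized model once and call a prediction function on each submitted article; that web layer is left out here so the sketch has no third-party dependencies.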
