Evaluating BiLSTM Performance with BERT, RoBERTa, and DistilBERT in Online Bullying News Detection

DOI:

https://doi.org/10.15408/jti.v18i2.42459

Keywords:

Bullying, News Classification, Word Embedding, BiLSTM, NLP

Abstract

This study examines the performance of BiLSTM combined with three transformer-based word embeddings—BERT, RoBERTa, and DistilBERT—in classifying bullying news in online media. BiLSTM was chosen because it processes text sequences in both directions, capturing contextual cues that unidirectional RNN and LSTM models miss. The study used a dataset of 2,800 articles from three major Indonesian news portals, with 2,000 articles for training and 800 for testing, labeled using a lexicon-based method. The test results showed that the combination of BiLSTM and RoBERTa achieved the best performance, with an accuracy of 94% and a near-perfect precision of 99%. Statistical significance tests confirmed that BiLSTM with RoBERTa performs significantly better than with BERT or DistilBERT. These findings suggest that the BiLSTM and RoBERTa combination is the most effective for classifying bullying news, especially for new or unseen data. This research contributes to the development of automatic bullying-content detection systems to enhance content moderation on news platforms.
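The paper does not publish its code, but the architecture the abstract describes—contextual token embeddings fed into a bidirectional LSTM whose two final hidden states are concatenated and passed to a classifier—can be sketched in plain NumPy. Everything below is illustrative: the weights are random, and `emb` is a random stand-in for the token embeddings a RoBERTa encoder would produce (768 dimensions per token for RoBERTa-base); a real system would train these parameters and obtain `emb` from a pretrained model.

```python
import numpy as np

def lstm_pass(x, Wx, Wh, b):
    """Run one LSTM direction over x of shape (T, d); return the final hidden state."""
    hdim = Wh.shape[0]
    h = np.zeros(hdim)
    c = np.zeros(hdim)
    for t in range(x.shape[0]):
        z = x[t] @ Wx + h @ Wh + b          # all four gate pre-activations, shape (4*hdim,)
        i, f, g, o = np.split(z, 4)
        i = 1.0 / (1.0 + np.exp(-i))        # input gate
        f = 1.0 / (1.0 + np.exp(-f))        # forget gate
        o = 1.0 / (1.0 + np.exp(-o))        # output gate
        g = np.tanh(g)                      # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(0)
T, d, hdim = 12, 768, 64                    # 12 tokens; 768 matches RoBERTa-base hidden size
emb = rng.standard_normal((T, d))           # stand-in for RoBERTa token embeddings

def init_params():
    """Random (untrained) LSTM parameters for one direction."""
    return (rng.standard_normal((d, 4 * hdim)) * 0.01,
            rng.standard_normal((hdim, 4 * hdim)) * 0.01,
            np.zeros(4 * hdim))

h_fwd = lstm_pass(emb, *init_params())      # forward direction
h_bwd = lstm_pass(emb[::-1], *init_params())  # backward direction (reversed sequence)
feat = np.concatenate([h_fwd, h_bwd])       # BiLSTM sentence feature, shape (2*hdim,)

W_out = rng.standard_normal(2 * hdim) * 0.01
p_bullying = 1.0 / (1.0 + np.exp(-(feat @ W_out)))  # sigmoid over one logit
print(feat.shape, float(p_bullying))        # a (128,) feature and a probability in (0, 1)
```

The concatenation step is the key design choice: the forward pass summarizes the article left-to-right and the backward pass right-to-left, so the classifier sees context from both directions, which is the advantage over a unidirectional LSTM that the study relies on.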


Published

2025-10-30

How to Cite

Evaluating BiLSTM Performance with BERT, RoBERTa, and DistilBERT in Online Bullying News Detection. (2025). JURNAL TEKNIK INFORMATIKA, 18(2), 196-208. https://doi.org/10.15408/jti.v18i2.42459