Hoax News Detection Using Passive Aggressive Classifier And TfidfVectorizer

Maulana Fajar Lazuardi, Renaldy Hiunarto, Kareena Futri Ramadhani, Noviandi Noviandi, Riya Widayanti, Muhamad Hadi Arfian

Abstract


Indonesia is one of the countries with the highest number of social media users: 167 million as of January 2023. These users are spread across various platforms, including Twitter with 24 million users. With so many users on Twitter, validation of information is increasingly neglected. Moreover, the news that social media users read tends to be selected to match their individual tastes. This phenomenon is evidenced by the large amount of fake news (hoaxes) that circulates in society through social media. An accurate machine learning model is therefore needed to classify news as "real" or "hoax". This study applies TfidfVectorizer and the Passive Aggressive Classifier to a dataset shared through Kaggle. The dataset was sourced from Twitter over a span of 5 years, 2015-2020. From the preprocessing stage through construction of the confusion matrix, the machine learning model worked as expected, achieving Accuracy, Precision, and Recall scores of 82.44%, 80.66%, and 82.44%, respectively. In addition, the confusion matrix shows that the dataset contains more "real" news than "hoax" news: the model predicted 1059 items as real and 211 as hoax, whereas the actual labels were 1106 real and 164 hoax.
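The two techniques named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code, and the study's dataset and preprocessing are not reproduced here. The toy headlines, the function names (`fit_idf`, `tfidf`, `train_pa`, `predict`), the PA-I variant with `C=1.0`, and the number of epochs are all assumptions made for illustration. The smoothed IDF formula follows the default documented for scikit-learn's TfidfVectorizer.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.lower().split()))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; terms unseen in training are dropped."""
    tf = Counter(t for t in doc.lower().split() if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def score(w, x):
    """Dot product between the sparse weight dict and a sparse document vector."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_pa(vectors, labels, C=1.0, epochs=5):
    """PA-I update: stay passive when the margin is at least 1,
    otherwise move the weights just enough to fix it (step capped by C)."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):  # y is +1 (real) or -1 (hoax)
            loss = max(0.0, 1.0 - y * score(w, x))
            sq_norm = sum(v * v for v in x.values())
            if loss > 0.0 and sq_norm > 0.0:
                tau = min(C, loss / sq_norm)
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + tau * y * v
    return w


def predict(w, x):
    """Return +1 (real) for a non-negative score, otherwise -1 (hoax)."""
    return 1 if score(w, x) >= 0 else -1


# Hypothetical toy data, invented for this sketch (not taken from the Kaggle dataset).
train_docs = [
    "government confirms new vaccine schedule",
    "miracle cure hidden by doctors shocking",
    "ministry publishes official election results",
    "shocking secret miracle trick banned",
]
train_labels = [1, -1, 1, -1]

idf = fit_idf(train_docs)
train_vectors = [tfidf(d, idf) for d in train_docs]
weights = train_pa(train_vectors, train_labels)

print(predict(weights, tfidf("official vaccine schedule", idf)))
print(predict(weights, tfidf("shocking miracle cure", idf)))
```

On this toy data the two classes share no vocabulary, so the first call should label its headline as real (1) and the second as hoax (-1). Real tweets overlap heavily in vocabulary, which is one reason the scores reported in the abstract are well below 100%.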

Keywords


Machine Learning; Hoax News; Social Media; Classification; TfidfVectorizer; Passive Aggressive Classifier

DOI: https://doi.org/10.15408/jti.v16i2.34084

Copyright (c) 2023 Maulana Fajar Lazuardi, Renaldy Hiunarto, Kareena Futri Ramadhani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Jurnal Teknik Informatika by Prodi Teknik Informatika Universitas Islam Negeri Syarif Hidayatullah Jakarta is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://journal.uinjkt.ac.id/index.php/ti.
