Comparative Analysis of KNN, Naïve Bayes and SVM Algorithms for Movie Genres Classification Based on Synopsis.

Nurhayati Buslim, Lee Kyung Oh, Muhammad Hugo Athallah Hardy, Yusuf Wijaya

Abstract


Text classification is the process of assigning a text to its correct label. It is a challenging task in natural language processing that demands high accuracy, and manual classification tends to be inefficient because it requires a great deal of time as well as expert annotators. Machine learning offers a way to automate the task. KNN, Naive Bayes, and SVM are among the most commonly used algorithms for classification problems, especially text classification. In this study, we compare the KNN, Naive Bayes, and SVM algorithms on the task of classifying movie genres from synopses, using datasets obtained from Kaggle.com and the IMDB Dataset. Across 12 experiments, Support Vector Machine (SVM) was the best-performing algorithm, with accuracies of 90%, 93%, 65%, and 63%. We hope this research helps in determining the best algorithm for text classification.
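
As an illustration of the kind of comparison described in the abstract, the following is a minimal sketch, not the authors' actual pipeline. It assumes scikit-learn, TF-IDF features, a cosine-distance KNN with k = 3, and a linear SVM; none of these choices are stated in the abstract. The synopses and genre labels are invented toy data, and the function name compare_classifiers is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy synopses (assumption: stand-ins for the Kaggle/IMDB data).
train_texts = [
    "A detective hunts a killer through the dark city streets",
    "The police investigate a murder and chase the killer",
    "A killer stalks a detective after a brutal murder",
    "Two friends fall in love during a summer wedding",
    "A shy baker finds love with a charming wedding planner",
    "Old friends discover love again at a family wedding",
    "Astronauts explore a distant planet and meet an alien race",
    "A spaceship crew fights an alien invasion near a planet",
    "Scientists on a spaceship study a strange alien planet",
]
train_labels = ["crime"] * 3 + ["romance"] * 3 + ["sci-fi"] * 3

test_texts = [
    "A detective investigates a murder in the city",
    "Friends plan a wedding and find love",
    "An alien spaceship lands on a distant planet",
]
test_labels = ["crime", "romance", "sci-fi"]


def compare_classifiers(train_texts, train_labels, test_texts, test_labels):
    """Fit KNN, Naive Bayes, and SVM on TF-IDF features; return accuracies."""
    models = {
        "KNN": KNeighborsClassifier(n_neighbors=3, metric="cosine"),
        "Naive Bayes": MultinomialNB(),
        "SVM": LinearSVC(),
    }
    results = {}
    for name, classifier in models.items():
        # Each model gets its own vectorizer so the pipelines stay independent.
        pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), classifier)
        pipeline.fit(train_texts, train_labels)
        predictions = pipeline.predict(test_texts)
        results[name] = accuracy_score(test_labels, predictions)
    return results


results = compare_classifiers(train_texts, train_labels, test_texts, test_labels)
for name, accuracy in results.items():
    print(f"{name}: accuracy = {accuracy:.2f}")
```

Accuracies on a toy set this small say nothing about the relative merit of the algorithms; a comparison like the one reported here would need the full datasets and a proper held-out or cross-validated evaluation.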


Keywords


Movie Genres, Text Classification, Natural Language Processing, KNN, Naïve Bayes, SVM

DOI: https://doi.org/10.15408/jti.v15i2.29302


Copyright (c) 2022 Nurhayati Buslim

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

3rd Floor, Dept. of Informatics, Faculty of Science and Technology, UIN Syarif Hidayatullah Jakarta
Jl. Ir. H. Juanda No.95, Cempaka Putih, Ciputat Timur.
Kota Tangerang Selatan, Banten 15412
Tlp/Fax: +62 21 74019 25/ +62 749 3315
Handphone: +62 8128947537
E-mail: jurnal-ti@apps.uinjkt.ac.id


Creative Commons Licence
Jurnal Teknik Informatika by Prodi Teknik Informatika Universitas Islam Negeri Syarif Hidayatullah Jakarta is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://journal.uinjkt.ac.id/index.php/ti.
