Comparative Analysis of KNN, Naïve Bayes and SVM Algorithms for Movie Genres Classification Based on Synopsis.

Nurhayati Buslim; Lee Kyung Oh; Muhammad Hugo Athallah Hardy; Yusuf Wijaya

doi:10.15408/jti.v15i2.29302

Authors

Nurhayati Buslim (SCOPUS ID: 55516191400, h-index: 4) Universitas Islam Negeri Syarif Hidayatullah https://orcid.org/0000-0002-6564-6641
Lee Kyung Oh Computer Engineering Department, Sun Moon University, Korea
Muhammad Hugo Athallah Hardy Syarif Hidayatullah State Islamic University, Indonesia
Yusuf Wijaya

Keywords:

Movie Genres, Text Classification, Natural Language Processing, KNN, Naïve Bayes, SVM

Abstract

Text classification is the process of assigning a text to its correct label. In natural language processing it is a challenging task that demands high accuracy. Manual text classification tends to be inefficient because it requires a lot of time as well as expert annotators. Machine learning for automatic text classification can be a solution to this problem. KNN, Naïve Bayes, and SVM are among the most widely used algorithms for classification problems, especially text classification. In this study, we compare the KNN, Naïve Bayes, and SVM algorithms on the problem of classifying movie genres based on synopses, using datasets obtained from Kaggle.com and the IMDB dataset. Across the 12 experiments, Support Vector Machine (SVM) is the best-performing algorithm, with accuracies of 90%, 93%, 65%, and 63%. It is hoped that this research can help in choosing the best algorithm for text classification.
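
The kind of comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say which library, feature representation, or hyperparameters were used, so everything here is an assumption: scikit-learn as the toolkit, TF-IDF features, `n_neighbors=3` for KNN, a linear kernel (`LinearSVC`) for SVM, and a handful of invented toy synopses with two genres in place of the Kaggle and IMDB data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy synopses invented for illustration; NOT taken from the study's datasets.
train_texts = [
    "A detective hunts a serial killer through the dark city streets",
    "Police investigate a murder and uncover a deadly conspiracy",
    "A killer stalks a detective during a tense murder investigation",
    "Two friends fall in love during a summer wedding",
    "A shy baker finds love and romance with her neighbor",
    "A couple rekindles their romance on a wedding trip",
]
train_labels = ["thriller", "thriller", "thriller",
                "romance", "romance", "romance"]
test_texts = [
    "A detective investigates a murder in the city",
    "Two neighbors fall in love before a wedding",
]
test_labels = ["thriller", "romance"]

# Assumed hyperparameters; the abstract does not report the ones actually used.
classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
}

# Fit each classifier on the same TF-IDF features and record test accuracy.
results = {}
for name, clf in classifiers.items():
    model = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    model.fit(train_texts, train_labels)
    results[name] = accuracy_score(test_labels, model.predict(test_texts))

for name, acc in results.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the feature extraction identical across the three algorithms, so any difference in accuracy comes from the classifier alone. Accuracies on a toy set this small say nothing about the figures reported in the study.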

Author Biography

Nurhayati Buslim (SCOPUS ID: 55516191400, h-index: 4), Universitas Islam Negeri Syarif Hidayatullah

Google Scholar: https://scholar.google.co.id/citations?user=hWlz9Z0AAAAJ&hl=id
