Comparative Analysis of KNN, Naïve Bayes and SVM Algorithms for Movie Genres Classification Based on Synopsis.
Abstract
Text classification is the process of assigning a text to its correct label. In natural language processing it is a challenging task that demands high accuracy, and manual classification tends to be inefficient because it requires a great deal of time and expert effort. Machine learning for automatic text classification offers a solution to this problem. K-Nearest Neighbors (KNN), Naive Bayes, and Support Vector Machine (SVM) are among the most widely used algorithms for classification problems, especially text classification. In this study, we compare the KNN, Naive Bayes, and SVM algorithms on the task of classifying movie genres from synopses, using datasets obtained from Kaggle.com and the IMDB Dataset. Across the 12 experiments, SVM is the best-performing algorithm, with accuracies of 90%, 93%, 65%, and 63%. We hope this research helps in selecting the best algorithm for text classification.
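To make the comparison concrete, the following is a minimal sketch of how the three classifiers could be run side by side on synopsis text. It is not the authors' pipeline: the synopses are invented toy sentences (not the Kaggle or IMDB data), the features are raw word counts, the KNN variant uses cosine similarity, and the SVM is a two-class linear model trained by sub-gradient descent on the hinge loss. All function names and hyperparameters (`k`, `epochs`, `lr`, `lam`) are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Made-up toy synopses for illustration only; NOT the datasets used in the study.
TRAIN = [
    ("a detective hunts a killer through the dark city", "thriller"),
    ("the killer stalks a detective in a deadly chase", "thriller"),
    ("a deadly secret puts the spy in danger", "thriller"),
    ("two friends fall in love during a funny wedding", "comedy"),
    ("a funny family road trip ends in a wedding", "comedy"),
    ("awkward friends plan a funny surprise party", "comedy"),
]
TEST = [
    ("a spy and a detective chase the killer", "thriller"),
    ("funny friends crash a wedding party", "comedy"),
]

def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, text, k=3):
    """KNN: majority vote among the k most similar training synopses."""
    query = Counter(tokenize(text))
    scored = sorted(
        ((cosine(query, Counter(tokenize(doc))), label) for doc, label in train),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

def nb_predict(train, text):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    vocab = {t for doc, _ in train for t in tokenize(doc)}
    doc_counts = Counter(label for _, label in train)
    term_counts = defaultdict(Counter)
    for doc, label in train:
        term_counts[label].update(tokenize(doc))

    def log_posterior(label):
        total = sum(term_counts[label].values())
        score = math.log(doc_counts[label] / len(train))
        for t in tokenize(text):
            if t in vocab:  # words never seen in training are ignored
                score += math.log((term_counts[label][t] + 1) / (total + len(vocab)))
        return score

    return max(doc_counts, key=log_posterior)

def svm_train(train, positive, epochs=20, lr=0.1, lam=0.01):
    """Two-class linear SVM: sub-gradient descent on L2-regularized hinge loss."""
    w, b = defaultdict(float), 0.0
    for _ in range(epochs):
        for doc, label in train:
            x = Counter(tokenize(doc))
            y = 1.0 if label == positive else -1.0
            margin = y * (sum(w[t] * c for t, c in x.items()) + b)
            for t in w:  # shrink weights (L2 penalty); the bias is not regularized
                w[t] *= 1.0 - lr * lam
            if margin < 1.0:  # hinge loss is active: step toward the example
                for t, c in x.items():
                    w[t] += lr * y * c
                b += lr * y
    return w, b

def svm_predict(model, text, positive, negative):
    """Classify by the sign of the linear decision function."""
    w, b = model
    score = sum(w.get(t, 0.0) * c for t, c in Counter(tokenize(text)).items()) + b
    return positive if score >= 0 else negative

def accuracy(predict):
    """Fraction of TEST synopses whose predicted genre matches the label."""
    return sum(predict(doc) == label for doc, label in TEST) / len(TEST)

svm_model = svm_train(TRAIN, positive="thriller")
results = {
    "KNN": accuracy(lambda d: knn_predict(TRAIN, d)),
    "Naive Bayes": accuracy(lambda d: nb_predict(TRAIN, d)),
    "SVM": accuracy(lambda d: svm_predict(svm_model, d, "thriller", "comedy")),
}
for name, acc in results.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

The accuracies printed for this toy data say nothing about the figures reported in the study. With more than two genres, the SVM part would need a multi-class scheme such as one-vs-rest, and a practical comparison would more likely rely on an established library and weighted features such as TF-IDF.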
DOI: https://doi.org/10.15408/jti.v15i2.29302
Copyright (c) 2022 Nurhayati Buslim
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Jurnal Teknik Informatika by Prodi Teknik Informatika Universitas Islam Negeri Syarif Hidayatullah Jakarta is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://journal.uinjkt.ac.id/index.php/ti.