A Comparative Analysis of Random Forest, XGBoost, and LightGBM Algorithms for Emotion Classification in Reddit Comments

Nenny Anggraini, Syopiansyah Jaya Putra, Luh Kesuma Wardhani, Farid Dhiya Ul Arif, Nashrul Hakiem, Imam Marzuki Shofi

Abstract


This study compares the performance of three classification algorithms, Random Forest, XGBoost, and LightGBM, in classifying emotions in Reddit comments. Emotion classification in Reddit comments is a difficult problem because the comments vary widely in expression and are often ambiguous. The study uses the GoEmotions fine-grained dataset, filtered down to 7,325 Reddit comments labeled with five basic emotions. The workflow consists of data preprocessing, feature extraction using CountVectorizer and TF-IDF, and hyperparameter tuning using GridSearchCV for each algorithm. The models are then evaluated using cross-validation and a confusion matrix. The results indicate that Random Forest outperforms both other algorithms, with an accuracy of 75.38%, compared with 69.05% for XGBoost and 66.63% for LightGBM.
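The workflow named in the abstract (CountVectorizer and TF-IDF features, GridSearchCV tuning, cross-validation, confusion matrix) can be illustrated with a minimal scikit-learn sketch. It is not the authors' code: the toy comments, the three emotion labels, the parameter grid, and the fold count are all assumptions made for illustration, and only Random Forest is shown. An XGBoost or LightGBM classifier could presumably be swapped into the same pipeline slot.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for the filtered GoEmotions comments.
texts = [
    "this made my day, thank you so much",
    "i love this community, you are all great",
    "best news i have heard all week",
    "so happy for you, congratulations",
    "this is outrageous, i am furious",
    "stop spamming this garbage everywhere",
    "i hate how rude people are here",
    "that referee decision makes me so angry",
    "i miss my dog so much it hurts",
    "feeling really down after the news",
    "so sorry for your loss, this is heartbreaking",
    "i cried when i read this story",
]
labels = ["joy"] * 4 + ["anger"] * 4 + ["sadness"] * 4  # assumed label names

# Raw token counts are reweighted by TF-IDF, then passed to the classifier.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Illustrative grid only; the grid used in the study is not given in the abstract.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 10],
}

# Stratified folds keep every emotion represented in each split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
search.fit(texts, labels)

# Out-of-fold predictions from the tuned pipeline feed the confusion matrix.
# Reusing the tuning data here gives an optimistic estimate; a held-out set
# or nested cross-validation would be stricter.
class_names = sorted(set(labels))
predicted = cross_val_predict(search.best_estimator_, texts, labels, cv=cv)
matrix = confusion_matrix(labels, predicted, labels=class_names)

print("best parameters:", search.best_params_)
print("mean cross-validated accuracy:", round(search.best_score_, 3))
print(class_names)
print(matrix)
```

Rows of the printed matrix are true labels and columns are predicted labels, in the order given by `class_names`. With a corpus this small the scores say nothing about the real task; the sketch only shows how the pieces fit together.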


Keywords


Emotion Classification, XGBoost, Random Forest, LightGBM






DOI: https://doi.org/10.15408/jti.v17i1.38651



Copyright (c) 2024 Nenny Anggraini, Luh Kesuma Wardhani, Farid Dhiya Ul Arif, Syopiansyah Jaya Putra, Nashrul Hakiem

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



Creative Commons Licence
Jurnal Teknik Informatika by Prodi Teknik Informatika Universitas Islam Negeri Syarif Hidayatullah Jakarta is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://journal.uinjkt.ac.id/index.php/ti.
