Struggling Models: An Analysis of Logistic Regression and Random Forest in Predicting Repeat Buyers with Imbalanced Performance Metrics

Siska Farizah Mauludiah, Yunifa Miftachul Arif, Muhammad Faisal, Dony Darmawan Putra

Abstract


Predicting repeat buyers is essential for businesses seeking to improve customer retention and maximize profitability. This study examines how effectively logistic regression and random forest forecast repeat buyers, using an e-commerce dataset from Kaggle. Despite the theoretical strengths of these models, our results indicate significant performance challenges. Both models were evaluated on five metrics: accuracy, precision, recall, F1 score, and ROC-AUC. Both performed poorly: accuracy hovered around 50%, precision and recall were imbalanced, and ROC-AUC scores barely exceeded the level of random guessing. These metrics highlight the models' limited discriminative power in identifying repeat buyers. The analysis suggests that data quality, feature relevance, and class imbalance contribute to these shortcomings; specifically, the models struggled to learn from the data, leading to suboptimal predictions. These results underscore the need for enhanced feature engineering, better handling of class imbalance, and possibly more advanced algorithms. This study provides a critical assessment of the limitations of logistic regression and random forest for predicting repeat buyers and, in response, implements feature engineering, SMOTE, and hyperparameter tuning with RandomizedSearchCV to obtain better results.
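
As an illustration of how the five metrics named in the abstract relate to one another, the following is a minimal sketch in plain Python. The labels and scores are made-up toy values, not data or results from this study, and the helper names (confusion_counts, roc_auc, classification_metrics) are hypothetical, introduced only for this example. ROC-AUC is computed here as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 = repeat buyer."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Pairwise ROC-AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1 and ROC-AUC from scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc": roc_auc(y_true, y_score),
    }


# Toy, imbalanced example (3 repeat buyers out of 10 customers); values are invented.
toy_labels = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.6, 0.7, 0.4, 0.55, 0.3, 0.2, 0.65, 0.5, 0.45, 0.35]

if __name__ == "__main__":
    for name, value in classification_metrics(toy_labels, toy_scores).items():
        print(f"{name}: {value:.3f}")
```

On these toy values the accuracy is 0.6, which is below the 0.7 that always predicting "not a repeat buyer" would score, and the ROC-AUC is 10/21 (about 0.48), slightly worse than random ranking. This is the kind of pattern the abstract describes: under class imbalance, accuracy alone says little, and an ROC-AUC near 0.5 signals weak discriminative power.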


Keywords


E-commerce, repeat buyers, customer retention, logistic regression, random forest, imbalanced performance metrics



DOI: https://doi.org/10.15408/aism.v7i2.39326


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

EDITORIAL ADDRESS:

Department of Information Systems, Faculty of Science and Technology,
Universitas Islam Negeri (UIN) Syarif Hidayatullah Jakarta
Faculty of Science and Technology Building, 3rd Floor, 1st Campus, Universitas Islam Negeri (UIN) Syarif Hidayatullah Jakarta
Jl. Ir. H. Juanda No. 95, Ciputat Timur, Kota Tangerang Selatan, Banten 15412, Indonesia.
Tel/Fax: +62217401925/+62217493315.
E-mail: aism.journal@apps.uinjkt.ac.id, Website: https://journal.uinjkt.ac.id/index.php/aism



Applied Information System and Management (AISM) | E-ISSN: 2621-254 | P-ISSN: 2621-2536 

https://journal.uinjkt.ac.id/index.php/aism