Enhancing Repeat Buyer Classification with Multi Feature Engineering in Logistic Regression
Abstract
This study presents a novel approach to improving repeat buyer classification on e-commerce platforms that integrates Kullback-Leibler (KL) divergence with logistic regression and focused feature engineering. Repeat buyers are a critical segment for long-term revenue and customer retention, yet identifying them accurately is difficult because of class imbalance and the complexity of consumer behavior. Unlike traditional methods, this research applies KL divergence both to select informative features and to evaluate the model, with the aim of making the classifier more interpretable and more effective at identifying repeat buyers. Using a real-world Indonesian e-commerce dataset of 1,000 records, split 80% for training and 20% for testing, the study combines logistic regression with SMOTE oversampling, class weighting, and regularization to address data imbalance and overfitting. Model performance is assessed using accuracy, precision, recall, F1-score, and KL divergence. Experimental results indicate that the KL-enhanced logistic regression model significantly outperforms the baseline, especially in balancing precision and recall for the minority class of repeat buyers. The contribution of this work lies in its combined use of KL divergence in both the feature engineering and evaluation phases, offering a robust, interpretable, and data-efficient solution. For e-commerce businesses, the findings translate into improved targeting of high-value customers, better personalization of marketing efforts, and more strategic allocation of resources. This research offers practical guidance for enhancing predictive customer analytics and supports data-driven decision-making in digital commerce environments.
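The abstract does not spell out how KL divergence is used to rank features, so the following is only a minimal sketch of one common reading: bin each feature, estimate its distribution separately for repeat buyers and other customers, and score the feature by the KL divergence between the two distributions. The feature names (`purchase_count`, `noise`), the bin count, the add-one smoothing, and the synthetic data are all assumptions made for illustration; they are not taken from the study.

```python
import math
import random


def histogram(values, edges, smoothing=1.0):
    """Return a smoothed probability vector of `values` over the given bin edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Bins are half-open, except the last one, which includes its right edge.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    # Additive smoothing keeps every bin non-zero so the logarithm is defined.
    total = sum(counts) + smoothing * len(counts)
    return [(c + smoothing) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_feature_score(values, labels, n_bins=10):
    """Score one feature by KL divergence between its class-conditional histograms."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0  # a constant feature cannot separate the classes
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    repeat = [v for v, y in zip(values, labels) if y == 1]
    others = [v for v, y in zip(values, labels) if y == 0]
    return kl_divergence(histogram(repeat, edges), histogram(others, edges))


# Synthetic, imbalanced toy data (hypothetical): 1,000 records, roughly 20% repeat buyers.
random.seed(0)
labels = [1 if random.random() < 0.2 else 0 for _ in range(1000)]
features = {
    # Shifted mean for repeat buyers, so this feature carries class information.
    "purchase_count": [random.gauss(2.0 if y == 1 else 0.0, 1.0) for y in labels],
    # Same distribution for both classes, so this feature carries none.
    "noise": [random.gauss(0.0, 1.0) for _ in labels],
}

scores = {name: kl_feature_score(vals, labels) for name, vals in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Features with a higher score differ more between the two classes and would be the candidates passed on to the logistic regression. In a setup like the one the abstract describes, any oversampling such as SMOTE would normally be applied to the training split only, so that the test split keeps its original class balance.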
DOI: https://doi.org/10.15408/aism.v8i1.45025

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
EDITORIAL ADDRESS:
Department of Information Systems, Faculty of Science and Technology,
Universitas Islam Negeri (UIN) Syarif Hidayatullah Jakarta
Faculty of Science and Technology Building, 3rd Floor, 1st Campus, Universitas Islam Negeri (UIN) Syarif Hidayatullah Jakarta
Jl. Ir. H. Juanda No. 95, Ciputat Timur, Kota Tangerang Selatan, Banten 15412, Indonesia.
Tlp/Fax: +622174019 25/+62217493315.
E-mail: aism.journal@apps.uinjkt.ac.id, Website: https://journal.uinjkt.ac.id/index.php/aism
Applied Information System and Management (AISM) | E-ISSN: 2621-254 | P-ISSN: 2621-2536
https://journal.uinjkt.ac.id/index.php/aism