Evaluating The Effectiveness of Augmentation and Classifier Algorithms for Fraud Detection: Comparing CGAN and SMOTE with Random Forest and XGBoost

Sarmini Sarmini; Sunardi Sunardi; Abdul Fadlil

doi:10.15408/aism.v8i2.46308

Authors

Sarmini Sarmini, Universitas Ahmad Dahlan
Sunardi Sunardi, Universitas Ahmad Dahlan
Abdul Fadlil, Universitas Ahmad Dahlan

Keywords:

Fraud detection, SMOTE, CGAN, data augmentation, imbalanced datasets, Random Forest, XGBoost

Abstract

Fraud detection on imbalanced datasets, where fraudulent transactions make up only a small fraction of the data, is a major challenge for machine learning models. Traditional classifiers often perform poorly in this setting because they are biased toward the majority class. This study investigates how effectively two data augmentation techniques, the Synthetic Minority Over-sampling Technique (SMOTE) and Conditional Generative Adversarial Networks (CGAN), improve fraud detection performance. Both methods are applied to balance the dataset, and their impact is evaluated using two classifiers: Random Forest (RF) and XGBoost. The models are tested on three versions of the dataset: the original imbalanced data, the SMOTE-augmented data, and the CGAN-augmented data. Evaluation metrics are accuracy, precision, recall, F1 score, and ROC-AUC. The results indicate that both augmentation techniques improve the models' ability to detect fraudulent transactions compared with training on the original dataset. Notably, CGAN outperforms SMOTE on recall and F1 score, which suggests that it generates more diverse and realistic synthetic samples. SMOTE creates new samples by interpolating between existing minority samples, whereas CGAN trains a generator against a discriminator in an adversarial process, which yields more complex data representations. The study also finds that XGBoost combined with CGAN achieves the highest performance, effectively capturing intricate fraud patterns. SMOTE, though beneficial, shows limited capacity to improve recall. This research highlights the importance of advanced augmentation techniques such as CGAN for addressing class imbalance and improving fraud detection systems. It also opens pathways for future work on deep learning-based augmentation and classification methods in fraud detection.
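
To make the interpolation idea and the minority-class metrics concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names `smote_like_oversample` and `precision_recall_f1`, the neighbour count `k`, and the toy data are all assumptions introduced for illustration, and the oversampler is only a simplified SMOTE-style procedure (pick a minority sample, pick one of its nearest minority neighbours, place a new point on the segment between them).

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy minority-class points (made up for this sketch)
    fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote_like_oversample(fraud, n_new=3, k=2))
    print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Because every synthetic point lies on a segment between two existing minority samples, this kind of oversampling cannot leave the region the minority class already occupies, which is one plausible reading of why the abstract describes a generative model as producing more diverse samples. In practice the synthetic points would be added to the training split only, so that the metrics are computed on untouched test data.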
