Voice Spoofing Classification Using Residual Bidirectional Long Short Term Memory

Fatan Kasyidi; Rifaz Muhammad Sukma; Annisa Mufidah Sopian; Dhika Rizki Anbiya

doi:10.15408/jti.v18i2.43281

Authors

Fatan Kasyidi, Department of Informatics, Faculty of Science and Informatics, Jenderal Achmad Yani University, Indonesia, https://orcid.org/0009-0007-0386-5535
Rifaz Muhammad Sukma, Department of Informatics, Faculty of Science and Informatics, Jenderal Achmad Yani University, Indonesia, https://orcid.org/0009-0004-6797-6873
Annisa Mufidah Sopian, Department of Informatics, Faculty of Science and Informatics, Jenderal Achmad Yani University, Indonesia, https://orcid.org/0009-0005-8553-0266
Dhika Rizki Anbiya, Pusat Riset Kecerdasan Artifisial dan Keamanan Siber, Organisasi Riset Elektronika dan Informatika, BRIN, Indonesia, https://orcid.org/0009-0002-5825-505X

Keywords:

Voice Spoofing Attacks, Residual Bidirectional Long Short Term Memory (R-BLSTM), ASVspoof 2019, anti-voice spoofing

Abstract

Voice spoofing attacks are a major security concern for speech-based biometric systems, and detecting and classifying spoofed speech is essential for preventing unauthorized access. This study proposes a novel approach to voice spoofing classification using a Residual Bidirectional Long Short Term Memory (R-BLSTM) network. The goal is to improve the accuracy and robustness of voice spoofing detection by combining deep learning with residual connections. The proposed bidirectional LSTM with residual connections is designed to capture long-range dependencies and latent characteristics of speech signals. Experiments on the ASVspoof 2019 dataset show that the R-BLSTM model reaches an accuracy of 95.6% and outperforms classical machine learning techniques. The resulting system can be used to strengthen the security of speech-based biometric systems and to render voice spoofing attacks ineffective.
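To make the idea of a residual bidirectional LSTM concrete, the following is a minimal pure-Python sketch of a single block: one LSTM reads the frame sequence forward, a second reads it backward, their hidden states are concatenated per frame, and the input frame is added back as a skip connection. It is an illustration only, not the architecture evaluated in the study. The layer sizes, random weights, identity skip connection (which assumes the feature dimension equals twice the hidden size), and all function names are assumptions introduced here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_size, hidden_size, rng):
    """Random weights for one LSTM direction; gate order is i, f, g, o (assumed)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W": matrix(4 * hidden_size, input_size),   # input-to-gate weights
        "U": matrix(4 * hidden_size, hidden_size),  # hidden-to-gate weights
        "b": [0.0] * (4 * hidden_size),             # gate biases
        "hidden_size": hidden_size,
    }


def lstm_forward(params, sequence):
    """Run one LSTM direction over a list of feature vectors."""
    n = params["hidden_size"]
    h = [0.0] * n  # hidden state
    c = [0.0] * n  # cell state
    outputs = []
    for x in sequence:
        # Pre-activations for all four gates, stacked in one vector.
        z = [
            sum(w * xi for w, xi in zip(params["W"][k], x))
            + sum(u * hi for u, hi in zip(params["U"][k], h))
            + params["b"][k]
            for k in range(4 * n)
        ]
        i = [sigmoid(v) for v in z[0:n]]              # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]          # forget gate
        g = [math.tanh(v) for v in z[2 * n:3 * n]]    # candidate cell values
        o = [sigmoid(v) for v in z[3 * n:4 * n]]      # output gate
        c = [f[j] * c[j] + i[j] * g[j] for j in range(n)]
        h = [o[j] * math.tanh(c[j]) for j in range(n)]
        outputs.append(h)
    return outputs


def residual_bilstm_block(sequence, fwd, bwd):
    """Bidirectional LSTM followed by an identity skip connection."""
    if len(sequence[0]) != 2 * fwd["hidden_size"]:
        raise ValueError("identity skip needs feature_dim == 2 * hidden_size")
    forward_out = lstm_forward(fwd, sequence)
    # Run the second LSTM on the reversed sequence, then restore frame order.
    backward_out = lstm_forward(bwd, sequence[::-1])[::-1]
    block_out = []
    for x, hf, hb in zip(sequence, forward_out, backward_out):
        bidirectional = hf + hb  # concatenate both directions per frame
        block_out.append([xi + yi for xi, yi in zip(x, bidirectional)])
    return block_out


# Toy usage: 5 frames of 4 hypothetical acoustic features each.
rng = random.Random(0)
feature_dim, hidden = 4, 2
fwd = make_lstm_params(feature_dim, hidden, rng)
bwd = make_lstm_params(feature_dim, hidden, rng)
frames = [[rng.uniform(-1, 1) for _ in range(feature_dim)] for _ in range(5)]
out = residual_bilstm_block(frames, fwd, bwd)
```

The skip connection means the block only has to learn a correction on top of its input, which is the usual motivation for residual connections in deep recurrent stacks. A practical classifier would stack several such blocks with trained weights and add a pooling and output layer, typically using a deep learning framework rather than plain Python.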
