Impact of Wavelet Denoising on LSTM-Based Greeting Sentence Recognition Using the IndSpeech Teldialog SVCR Dataset

Shabira Zhillan; Luh Kesuma Wardhani; Nenny Anggraini; Nashrul Hakiem; Imam Marzuki Shofi

doi:10.15408/jti.v19i1.49040

Authors

Shabira Zhillan, Informatics Engineering, Faculty of Science and Technology, State Islamic University Syarif Hidayatullah Jakarta
Luh Kesuma Wardhani, State Islamic University Syarif Hidayatullah Jakarta
Nenny Anggraini, Informatics Engineering, Faculty of Science and Technology, State Islamic University Syarif Hidayatullah Jakarta
Nashrul Hakiem, Informatics Engineering, Faculty of Science and Technology, State Islamic University Syarif Hidayatullah Jakarta
Imam Marzuki Shofi, Informatics Engineering, Faculty of Science and Technology, State Islamic University Syarif Hidayatullah Jakarta

Keywords:

Denoising, Wavelet Transform, Long Short-Term Memory, Speech Recognition, MFCC

Abstract

Speech signals play a crucial role in human communication, particularly in speech recognition systems. However, recognition performance is often degraded by noise in the audio signal. This study examines the effect of wavelet denoising on greeting sentence data corrupted with artificial white noise before speech recognition is performed with a Long Short-Term Memory (LSTM) network. Mel-frequency cepstral coefficients (MFCCs) are used as the speech features. Recognition accuracy reaches 90% on clean data and drops to 51% on noisy data, a decrease of 39 percentage points. After wavelet denoising, accuracy improved for both of the two best parameter combinations: the combination with the highest signal-to-noise ratio (SNR) yielded an improvement of 18 percentage points, and the combination with the highest Perceptual Evaluation of Speech Quality (PESQ) score yielded an improvement of 13 percentage points. These findings indicate that wavelet denoising can improve the performance of LSTM-based speech recognition in noisy environments.
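The denoising step summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which wavelet family, decomposition level, or threshold rule was used, so the Haar wavelet, four levels, and the universal soft threshold below are assumptions chosen only to keep the example short and dependency-free. A synthetic sine wave stands in for a speech recording, and all function names are hypothetical.

```python
import math
import random
import statistics


def snr_db(clean, test):
    """SNR in dB of `test`, measured against the reference signal `clean`."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - t) ** 2 for c, t in zip(clean, test))
    return 10.0 * math.log10(signal_power / noise_power)


def add_white_noise(clean, target_snr_db, rng):
    """Add Gaussian white noise, scaled so the result has the requested SNR."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    signal_power = sum(c * c for c in clean)
    noise_power = sum(n * n for n in noise)
    scale = math.sqrt(signal_power / (noise_power * 10 ** (target_snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; small values become zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def wavelet_denoise(x, levels=4):
    """Haar soft-threshold denoising; len(x) must be divisible by 2**levels."""
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(detail)
    # Assumption: estimate the noise level from the finest detail
    # coefficients (median absolute value / 0.6745), then apply the
    # universal threshold sigma * sqrt(2 * ln(N)) at every level.
    sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
    threshold = sigma * math.sqrt(2.0 * math.log(len(x)))
    for detail in reversed(details):
        shrunk = [soft_threshold(d, threshold) for d in detail]
        approx = haar_inverse(approx, shrunk)
    return approx


# Synthetic stand-in for a speech recording: a slow sine wave.
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 4.0 * i / 1024) for i in range(1024)]
noisy = add_white_noise(clean, target_snr_db=5.0, rng=rng)
denoised = wavelet_denoise(noisy, levels=4)

print("SNR before denoising (dB):", round(snr_db(clean, noisy), 2))
print("SNR after denoising (dB):", round(snr_db(clean, denoised), 2))
```

In the study's setting, the denoised waveform would then go through MFCC extraction and the LSTM classifier. Comparing candidate wavelet, level, and threshold settings by their output SNR, as done here for a single setting, mirrors the SNR-based parameter selection the abstract mentions. PESQ scoring requires a dedicated implementation and is not shown.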
