Syllable-Based Javanese Speech Recognition Using MFCC and CNNs: Noise Impact Evaluation
Abstract
Javanese, a regional language of Indonesia spoken by over 100 million people, is classified as a low-resource language, which makes the development of effective speech recognition systems challenging due to limited linguistic resources and data. Noise is another significant factor that degrades the performance of speech recognition systems. This study develops a speech recognition model for Javanese using a syllable-based approach, with Mel Frequency Cepstral Coefficients (MFCC) for audio feature extraction and a Convolutional Neural Network (CNN) for classification. It also analyzes how different types of colored noise (white Gaussian, pink, and brown) added to the audio affect the model's accuracy. The results show that the proposed method reaches a peak accuracy of 81% when tested on the original audio (audio without any synthetic noise added). In noisy audio, model accuracy improves as the noise level decreases. Interestingly, with brown noise at a 20 dB SNR, accuracy rises slightly to 83%, a 2.47% improvement over the original audio. These results indicate that the proposed syllable-based method is a promising approach for real-world Javanese speech recognition, and the slight accuracy improvement under noisy conditions suggests a potential regularization effect.
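To make the described pipeline concrete, the following is a minimal Python sketch (not the authors' released code) of the two preprocessing steps named in the abstract: mixing noise into a clean recording at a target SNR and extracting MFCC features that a CNN classifier could take as input. The file name, sampling rate, and number of coefficients are illustrative assumptions; pink and brown noise would be generated analogously (for example by 1/f filtering or by integrating white noise) before the same mixing step.

import numpy as np
import librosa

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture clean + noise has the requested SNR in dB."""
    noise = noise[:len(clean)]
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    target_noise_power = p_signal / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / (p_noise + 1e-12))
    return clean + scaled_noise

# Hypothetical syllable recording; 16 kHz is an assumed sampling rate.
signal, sr = librosa.load("syllable.wav", sr=16000)

# White Gaussian noise mixed in at 20 dB SNR (one of the conditions in the study).
white = np.random.normal(0.0, 1.0, size=len(signal))
noisy = add_noise_at_snr(signal, white, snr_db=20)

# 13 MFCCs per frame; the resulting 2-D matrix is the kind of input a CNN can classify.
mfcc = librosa.feature.mfcc(y=noisy, sr=sr, n_mfcc=13)
print(mfcc.shape)  # (n_mfcc, n_frames)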
DOI: https://doi.org/10.15408/jti.v18i1.41067
Copyright (c) 2025 Hermanto, Tjong Wan Sen

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.