Continuous Sign Language Recognition Using Combination of Two Stream 3DCNN and SubUNet

Haryo Pramanto, Suharjito Suharjito

Abstract


Research on sign language recognition using deep learning has been conducted by many researchers in computer science, but achieving the expected level of accuracy remains an obstacle. Quite a few researchers who intend to work on Continuous Sign Language Recognition end up doing research on Isolated Sign Language Recognition instead. The purpose of this study was to find the best method for performing Continuous Sign Language Recognition using deep learning. The RWTH-PHOENIX-Weather 2014 dataset was used in this study; it was selected through a literature study of datasets commonly used in Continuous Sign Language Recognition research, and it was used to develop the proposed method. A combination of 3DCNN, LSTM, and CTC models forms the architecture of the proposed method. The collected dataset was also converted into optical flow frame sequences to serve as two-stream input alongside the original RGB frame sequences. Word Error Rate on the prediction results was used to assess the performance of the developed method. Through this research, the best Word Error Rate achieved is 94.1%, using the C3D BLSTM CTC model with spatio stream input.
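Word Error Rate, the metric used above, is the word-level edit distance (substitutions + deletions + insertions) between the predicted gloss sequence and the reference, normalized by the reference length. A minimal illustrative sketch is below; the function name and the example sentences are hypothetical, not taken from the paper:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One word deleted out of a four-word reference -> WER = 0.25
print(wer("heute regen im norden", "heute regen norden"))  # 0.25
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why it is reported as an error rate rather than an accuracy.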


Keywords


Continuous Sign Language Recognition, Two Stream Model, 3DCNN, LSTM, CTC






DOI: https://doi.org/10.15408/jti.v16i2.27030



Copyright (c) 2023 Haryo Pramanto, Suharjito Suharjito

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

3rd Floor, Dept. of Informatics, Faculty of Science and Technology, UIN Syarif Hidayatullah Jakarta
Jl. Ir. H. Juanda No.95, Cempaka Putih, Ciputat Timur.
Kota Tangerang Selatan, Banten 15412
Tlp/Fax: +62 21 74019 25/ +62 749 3315
Handphone: +62 8128947537
E-mail: jurnal-ti@apps.uinjkt.ac.id


Jurnal Teknik Informatika by Prodi Teknik Informatika Universitas Islam Negeri Syarif Hidayatullah Jakarta is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://journal.uinjkt.ac.id/index.php/ti.
