RNN-driven integration of spatial, temporal, features for Indian sign language recognition and video captioning
This paper presents a novel model that integrates spatial features from residual blocks and temporal features from a fast Fourier transform (FFT) with a recurrent neural network (RNN) architecture comprising bidirectional long short-term memory (BiLSTM) layers, gated recurrent unit (GRU) layers, and multi-head attention. The model achieves nearly 99% accuracy on both the WLASL and INCLUDE datasets and outperforms standard pretrained CNN models used for feature extraction. Notably, the BiLSTM and GRU combination proves superior to other combinations such as LSTM and GRU. The BLEU score analysis further validates the model's efficacy, with scores of 0.51 and 0.54 on the WLASL and INCLUDE datasets, respectively. These results affirm the model's proficiency in capturing the spatial and temporal nuances of sign language gestures, which can enhance accessibility and communication for the deaf and hard-of-hearing communities. The comparison highlights the advantage of the proposed model over standard approaches and the significance of the integrated architecture. Continued refinement and optimization hold promise for further improving the model's performance and applicability in real-world scenarios, contributing to inclusive communication environments.
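For readers unfamiliar with the captioning metric reported above, the following is a minimal sketch of sentence-level BLEU, assuming a single reference caption, uniform weights over 1- to 4-grams, and no smoothing. The abstract does not state which BLEU variant or tokenization the authors used, so the function names `ngrams` and `bleu` and the example captions are illustrative assumptions, not the paper's evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: one reference, uniform weights, no smoothing.

    Assumption: this is the textbook formulation, not necessarily the
    variant used in the paper.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical captions, chosen only to exercise the function.
    reference = "i am going to school".split()
    candidate = "i am going to".split()
    print(round(bleu(candidate, reference), 4))
```

Under this formulation a score of 1.0 means the generated caption matches the reference exactly, so values around 0.5 indicate substantial but partial n-gram overlap with the reference captions.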
BiLSTM; BLEU score evaluation; FFT-based feature extraction; Residual blocks; Sign language recognition
DOI: http://doi.org/10.11591/ijeecs.v38.i2.pp821-829
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).