End-to-end multi-modal deep learning system for hand posture recognition

Huong-Giang Doan, Ngoc-Trung Nguyen


Multi-modal or multi-view datasets are captured from various sources (e.g., RGB and Depth sensors) observing a subject at the same time. Combining different cues still faces many challenges, such as heterogeneous data and complementary information. In addition, existing methods for multi-modality recognition consist of discrete blocks: extracting features from separate data streams, combining the features, and classifying gestures. To address these challenges, we propose two novel end-to-end hand posture recognition frameworks that integrate all steps, from capturing various types of cues (RGB and Depth images) to classifying hand gesture labels, into a single convolutional neural network (CNN) system. Both frameworks use a ResNet50 backbone pretrained on the ImageNet dataset, and are named the attention convolution module (ACM) and the gated concatenation module (GCM). Both are deployed, evaluated, and compared on various multi-modality hand posture datasets. Experimental results show that our proposed method outperforms state-of-the-art (SOTA) methods.
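To make the gated fusion idea concrete, the following is a minimal NumPy sketch of a gated concatenation step: a learned gate weighs per-channel contributions of the RGB and Depth feature vectors. The shapes, weight matrix `W`, and bias `b` are hypothetical illustrations, not the authors' exact GCM parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_concat_fusion(f_rgb, f_depth, W, b):
    """Fuse two modality feature vectors with a learned gate.

    f_rgb, f_depth: (d,) feature vectors from the two backbone streams.
    W: (d, 2d) gate weights, b: (d,) gate bias (hypothetical parameters).
    Returns a (d,) fused vector: a per-channel convex mix of both modalities.
    """
    concat = np.concatenate([f_rgb, f_depth])      # (2d,) joint descriptor
    gate = sigmoid(W @ concat + b)                 # (d,) values in (0, 1)
    return gate * f_rgb + (1.0 - gate) * f_depth   # per-channel blend

# Toy usage with random features and weights
rng = np.random.default_rng(0)
d = 8
f_rgb, f_depth = rng.normal(size=d), rng.normal(size=d)
W, b = 0.1 * rng.normal(size=(d, 2 * d)), np.zeros(d)
fused = gated_concat_fusion(f_rgb, f_depth, W, b)
print(fused.shape)  # (8,)
```

Because the gate is a sigmoid, each fused channel lies between the corresponding RGB and Depth channel values, so neither modality can be over-amplified; in the full system this block would sit after the two ResNet50 feature extractors and before the classifier.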


Deep learning; End-to-end system; Hand posture recognition; Human-machine interaction; Multi-modality



DOI: http://doi.org/10.11591/ijeecs.v27.i1.pp214-221



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
