Dynamic Hand Gesture Detection using CNN-based Keypoint Estimation

International Journal of Electronics and Communication Engineering
© 2025 by SSRG - IJECE Journal
Volume 12 Issue 4
Year of Publication: 2025
Authors: Rameez Shamalik, Shital Pawar, D.B. Jadhav, Seema Hadke, Kanchan Mahajan
How to Cite?
Rameez Shamalik, Shital Pawar, D.B. Jadhav, Seema Hadke, Kanchan Mahajan, "Dynamic Hand Gesture Detection using CNN-based Keypoint Estimation," SSRG International Journal of Electronics and Communication Engineering, vol. 12, no. 4, pp. 142-147, 2025. Crossref, https://doi.org/10.14445/23488549/IJECE-V12I4P113
Abstract:
Accurate, real-time hand gesture detection is crucial for advancing Human-Computer Interaction (HCI) applications. Conventional methods, however, often struggle with dynamic gestures because of motion blur, varying lighting conditions, and complex hand shapes. This research develops a robust CNN-based hand gesture detection system to overcome these limitations, presenting a novel multi-layered CNN for accurate, real-time 3D hand pose estimation. Trained and tested on real-life static and dynamic gesture datasets, the proposed model achieves average precisions of 92.87% and 95.17%, respectively. By incorporating 3D keypoints into a CNN, the model improves on the accuracy of existing methods while maintaining real-time performance. This opens up new possibilities for gesture-based HCI applications and for more natural and intuitive interaction between humans and computers.
Keywords:
CNN, Gesture detection, Skeletal representation, Video processing, 3D estimation.
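The abstract reports results as average precision. As a reading aid, here is a minimal sketch of one common way to compute that metric, the non-interpolated form, which averages the precision at each rank where a correct detection appears. The abstract does not state which variant the authors used, so the function name `average_precision` and the choice of variant are assumptions made for illustration only.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision (an assumed variant, not
    necessarily the one used in the paper).

    scores: detector confidence per prediction (higher = more confident).
    labels: 1 if the prediction is a correct detection, else 0.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0  # no positives to retrieve; AP is defined as 0 here

    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at this rank = correct detections so far / rank.
            precision_sum += hits / rank

    return precision_sum / total_positives


if __name__ == "__main__":
    # Hypothetical detections for one gesture class (illustrative values only).
    ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(f"AP = {ap:.4f}")
```

In this toy input the correct detections sit at ranks 1 and 3, so the precisions averaged are 1/1 and 2/3. A figure such as 92.87% would typically be this quantity averaged over gesture classes, although the abstract does not say so.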