SignBridge: A Lightweight Framework For Real-Time American Sign Language Recognition Using MediaPipe And MobileNetV2

29 Apr

Authors: Vishal Singh, Sudipta Sarkar, Uddesh Raj

Abstract: American Sign Language (ASL) serves as a vital communication tool for the deaf and hard-of-hearing community, yet barriers persist due to limited familiarity among the general population. SignBridge addresses this challenge with a lightweight, real-time ASL recognition system that leverages MediaPipe for hand landmark extraction and a customized MobileNetV2-based convolutional neural network (CNN) for gesture classification, emphasizing efficient deployment on edge devices. The pipeline processes RGB video input to detect and classify static ASL alphabet signs (A-Z) and numbers (0-9), achieving high accuracy while maintaining computational efficiency suitable for edge hardware. The methodology begins with frame capture from a standard webcam, followed by MediaPipe Hands detection to extract 21 key landmarks per hand, forming a compact 42-dimensional feature vector for both hands. These features are fed into a lightweight MobileNetV2 variant, fine-tuned for ASL with knowledge distillation to reduce parameters by 40% compared to the standard model. Training uses an 80/20 train/validation split on a 27-class WLASL subset of static ASL alphabet signs, with data augmentation such as rotation and scaling to handle real-world variations. Experimental evaluation on WLASL and a custom ASL dataset demonstrates 96% top-1 accuracy for static gestures, with real-time performance at 35 FPS on consumer-grade hardware (Intel i5 CPU, no GPU). Ablation studies confirm the efficacy of the MediaPipe integration, outperforming baselines such as VGG-16 by 15% in speed without loss of accuracy.
Comparisons with state-of-the-art methods, such as Vision Transformer (ViT) models achieving 92% accuracy but only 15 FPS due to higher compute cost (Karna et al., 2021), and MobileNet for ASL alphabet recognition achieving ~99.93% accuracy (Kandukuri et al., 2023), highlight SignBridge's novelty in balancing accuracy and latency for edge efficiency. This work contributes to accessible communication by enabling seamless ASL-to-text translation in applications such as video calls and educational tools. Implications include broader societal inclusion, with potential extensions to dynamic gestures and multilingual sign languages. Limitations such as sensitivity to lighting are discussed, alongside future directions for multimodal integration. Reproducibility is supported through open-source code and dataset details, promoting further advances in inclusive technology.
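The landmark-to-feature step described in the abstract (21 MediaPipe hand landmarks flattened into a 42-dimensional vector) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `landmarks_to_features` and the wrist-relative normalization are assumptions standing in for whatever preprocessing SignBridge actually applies before the MobileNetV2 classifier.

```python
import numpy as np

def landmarks_to_features(landmarks):
    """Hypothetical helper: convert 21 (x, y) hand landmarks, as produced
    by MediaPipe Hands, into a 42-dimensional feature vector.

    Landmarks are translated so the wrist (landmark 0) sits at the origin
    and scaled by the largest absolute coordinate, giving a degree of
    translation/scale invariance before classification."""
    pts = np.asarray(landmarks, dtype=np.float32)
    assert pts.shape == (21, 2), "expected 21 (x, y) landmarks"
    pts = pts - pts[0]                 # translate: wrist at origin
    scale = float(np.abs(pts).max()) or 1.0  # guard against all-zero input
    pts = pts / scale                  # scale-normalize to [-1, 1]
    return pts.flatten()               # shape (42,)

# Example with dummy normalized image coordinates in [0, 1):
rng = np.random.default_rng(0)
features = landmarks_to_features(rng.random((21, 2)))
print(features.shape)  # (42,)
```

In a live pipeline, the resulting vector would be computed per frame from the webcam feed and passed to the classifier; the normalization shown here mirrors the rotation/scaling robustness the abstract attributes to data augmentation, but applied at inference time.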

DOI: https://doi.org/10.5281/zenodo.19884687