Intelligent Sign Language Interpretation System Using Multi-Modal Deep Learning Architectures

12 Feb

Authors: Samruddhi Vijay Wakalkar, Sanskruti Vijay Wakalkar, Siddhi Nanasaheb Hon, Shravani Kishor Mahale, Gauri Sanjay Lad

Abstract: This project presents a real-time American Sign Language (ASL) recognition system that runs on a standard webcam. Communication between deaf or hard-of-hearing individuals and the hearing community is often limited by the high cost and limited availability of professional interpreters. To address this, the proposed system uses an ensemble deep-learning approach that combines a Convolutional Neural Network (CNN) for hand-shape recognition, a Graph Neural Network (GNN) that captures relationships between fingers and joints, and a Vision Transformer (ViT) that attends to key visual regions while suppressing background noise. Fusing these complementary models improves recognition accuracy. The framework was trained and evaluated on a dataset of approximately 87,000 labeled images covering the complete ASL alphabet, plus additional gestures such as space and delete. Experimental results show an accuracy above 95%, outperforming existing methods. The system supports real-time interaction, with an average inference time of about 85 milliseconds per gesture. It is deployed through a browser-based interface and requires no specialized hardware beyond a standard webcam. This solution offers an accessible, low-cost alternative to traditional interpretation services and promotes inclusive communication in educational, healthcare, and public settings.
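The abstract does not say how the CNN, GNN, and Vision Transformer outputs are combined. One common option is late fusion: each model produces class scores for the same frame, and the ensemble takes a weighted average of their softmax probabilities. The sketch below assumes that scheme; the function names, the model keys `cnn`, `gnn`, and `vit`, and the weights are hypothetical and are not the authors' implementation.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one model's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(
    logits_per_model: dict[str, Sequence[float]],
    weights: dict[str, float],
) -> list[float]:
    """Late fusion (assumed scheme): weighted average of per-model class probabilities."""
    num_classes = len(next(iter(logits_per_model.values())))
    weight_sum = sum(weights[name] for name in logits_per_model)
    fused = [0.0] * num_classes
    for name, logits in logits_per_model.items():
        if len(logits) != num_classes:
            raise ValueError(f"{name} returned {len(logits)} classes, expected {num_classes}")
        probs = softmax(logits)
        w = weights[name] / weight_sum  # normalise so the fused vector sums to 1
        for c in range(num_classes):
            fused[c] += w * probs[c]
    return fused


def predict_label(
    logits_per_model: dict[str, Sequence[float]],
    weights: dict[str, float],
    labels: Sequence[str],
) -> tuple[str, float]:
    """Return the highest-probability label and its fused probability."""
    fused = fuse_predictions(logits_per_model, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused[best]
```

A call might look like `predict_label({"cnn": cnn_logits, "gnn": gnn_logits, "vit": vit_logits}, {"cnn": 1.0, "gnn": 1.0, "vit": 1.0}, labels)`, where each logit vector comes from one model applied to the same webcam frame. Probability averaging is only one fusion strategy; feature-level fusion or a learned fusion head are alternatives, and the abstract does not indicate which the system uses.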
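The abstract states that the GNN captures finger and joint relationships but does not describe its input graph or layer type. A minimal sketch, assuming the hand is represented as 21 keypoints (the wrist plus four joints per finger, similar to common hand-landmark detectors) and that the GNN uses a standard graph-convolution (GCN) layer, is shown below. The simplified skeleton, the function names, and the feature and weight shapes are all hypothetical.

```python
import math

# Hypothetical 21-keypoint hand skeleton: index 0 is the wrist, then 4 joints per finger.
NUM_JOINTS = 21


def hand_edges() -> list[tuple[int, int]]:
    """Simplified skeleton: wrist to each finger base, then bone-by-bone along each finger."""
    edges = []
    for base in (1, 5, 9, 13, 17):  # thumb, index, middle, ring, pinky
        edges.append((0, base))
        for j in range(base, base + 3):  # three bones per finger
            edges.append((j, j + 1))
    return edges


def normalized_adjacency(n: int, edges: list[tuple[int, int]]) -> list[list[float]]:
    """Compute D^-1/2 (A + I) D^-1/2, the symmetric normalisation used in GCN layers."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop keeps each joint's own features
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(
    features: list[list[float]],
    adj_norm: list[list[float]],
    weight: list[list[float]],
) -> list[list[float]]:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    aggregated = matmul(adj_norm, features)  # mix each joint with its neighbours
    projected = matmul(aggregated, weight)  # learnable linear projection
    return [[max(0.0, v) for v in row] for row in projected]
```

With `features` as a 21 x 3 list of (x, y, z) joint coordinates and `weight` as a 3 x d matrix, `gcn_layer` returns a 21 x d matrix in which each joint's representation depends on its connected neighbours. How many such layers the system stacks, and how it pools joints into a prediction, is not stated in the abstract.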

DOI: https://doi.org/10.5281/zenodo.18619156