About This Project - By: Lukas Wong (Updated 2025)
This project is a real-time American Sign Language (ASL) recognition system that I trained using machine learning (NOT AI). It uses your webcam to capture hand gestures, processes them with computer vision, and classifies them into ASL letters.
This is a simple passion project I built on the side, inspired by someone I knew in university who was learning ASL at the time.
NOTE: Total costs are only $3/year for DNS and $5/month for the AWS Lightsail service (prices in USD).
Current Limitations
- Alphabet Focus: The model is only trained to recognize single-hand letters (A–Z), not full ASL words or phrases.
- Simple Gestures: Many signs in ASL involve motion or two hands, which this model does not yet support.
- Accuracy Constraints: Performance may vary depending on lighting, camera quality, and how clearly the sign is made.
This project is a proof-of-concept for detecting simple alphabet letters, not a complete ASL translation tool... yet...
How It Works
- Webcam Input: The browser accesses your camera securely using getUserMedia.
- Hand Landmark Detection: MediaPipe extracts 21 keypoints (x, y, z) per frame.
- Feature Vector: These 63 values form the input to the recognition model.
- Model Inference: A TensorFlow.js model classifies the vector into 29 classes (A–Z plus space, delete, nothing).
- Output: Predictions are displayed and assembled into sentences, optionally with speech synthesis.
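To make the data flow concrete, here is a minimal Python sketch of the landmark-to-letter mapping above, written in the style of the original Python code rather than the browser port (the deployed site does the same thing in JavaScript with TensorFlow.js). The class ordering and helper names are illustrative assumptions, not the exact ones used in this project.

```python
import numpy as np

# Illustrative class ordering: 26 letters plus the three control classes.
# The actual label order used by this project's model may differ.
CLASS_NAMES = [chr(ord("A") + i) for i in range(26)] + ["space", "delete", "nothing"]

def landmarks_to_features(hand_landmarks):
    """Flatten 21 MediaPipe hand landmarks (x, y, z each) into a 63-value vector."""
    return np.array([[p.x, p.y, p.z] for p in hand_landmarks], dtype=np.float32).flatten()

def decode_prediction(probabilities):
    """Map the model's 29-way softmax output back to a letter or control token."""
    return CLASS_NAMES[int(np.argmax(probabilities))]
```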
How the Model Was Trained
The model was trained in Python using Keras and MediaPipe landmarks. Each frame was processed to extract 21 hand landmarks, flattened into 63 features. The architecture consisted of three dense layers (256, 128, and 64 neurons), each with batch normalization and dropout to reduce overfitting, followed by a softmax output layer over the 29 classes. Training used the Adam optimizer and categorical cross-entropy loss.
After ~41 epochs, the model achieved ~98% test accuracy and ~95% real-time recognition accuracy.
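For reference, a model with the architecture described above could be defined in Keras roughly as follows. The layer sizes, optimizer, and loss match the description; the dropout rate and ReLU activations are assumptions for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_classes=29, num_features=63):
    """Dense classifier over the 63 flattened landmark features."""
    model = keras.Sequential([
        layers.Input(shape=(num_features,)),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),  # rate assumed; dropout is used here to reduce overfitting
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```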
The original Execution_Code.py handled webcam input with OpenCV, extracted landmarks with MediaPipe,
and provided predictions + speech output. That logic was later ported to TensorFlow.js for browser inference.
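A stripped-down version of that original loop might look like the sketch below: OpenCV reads webcam frames, MediaPipe extracts the landmarks, and the Keras model classifies them. The model filename and the pyttsx3 text-to-speech library are assumptions made for the example; the real Execution_Code.py may differ.

```python
import cv2
import mediapipe as mp
import numpy as np
from tensorflow import keras
import pyttsx3  # assumed text-to-speech library; the actual choice isn't documented here

# Same illustrative class ordering as in the earlier sketch.
CLASS_NAMES = [chr(ord("A") + i) for i in range(26)] + ["space", "delete", "nothing"]

model = keras.models.load_model("asl_model.h5")  # hypothetical model filename
tts = pyttsx3.init()
cap = cv2.VideoCapture(0)

with mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            features = np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32).reshape(1, -1)
            probs = model.predict(features, verbose=0)[0]
            letter = CLASS_NAMES[int(np.argmax(probs))]
            if letter != "nothing":
                tts.say(letter)
                tts.runAndWait()
        cv2.imshow("ASL recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break

cap.release()
cv2.destroyAllWindows()
```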
Deployment
The app runs fully in-browser using TensorFlow.js, MediaPipe, and the Web Speech API.
It is hosted on AWS Lightsail, with AWS Route 53 used for DNS management
to connect my custom domain lukas-wong-asl.click.
HTTPS certificates were issued with Certbot and Let's Encrypt, since browsers require a secure (HTTPS) origin before granting webcam access.
Technologies Used
- TensorFlow / Keras – training the original model in Python
- OpenCV (Python) – preprocessing and webcam streaming during training
- MediaPipe (Python + JS) – extracting 21 hand landmarks
- TensorFlow.js – browser-based model inference
- MediaPipe Tasks Vision (HandLandmarker) – real-time hand tracking in the browser
- JavaScript (ES Modules) – frontend logic
- HTML / CSS – user interface
- Nginx – serving static site and models
- AWS Lightsail – hosting the application
- AWS Route 53 – domain & DNS management
- Certbot + Let’s Encrypt – HTTPS certificates