17. Wildcard Week
Machine Vision Facial Expression Interaction for Jelamp
Project Title: Facial Expression Recognition for Jelamp Using ESP32-S3 Camera and Edge Impulse
Assignment
The Wildcard Week assignment is to design and produce something with a digital process that incorporates computer-aided design and manufacturing, but is not covered in another assignment. The documentation should explain the requirements that the work meets and include enough information for someone else to reproduce it.
For this week, I chose to explore machine vision and embedded machine learning. I used an ESP32-S3 camera board and Edge Impulse to train a simple facial expression recognition model. The model recognizes basic facial expressions and changes the color of a NeoPixel LED ring based on the detected expression.
This experiment directly supports my final project, Jelamp, a 3-DOF interactive robotic desk lamp inspired by the shape of a giraffe.
What I Made
I made a small interactive machine vision system for Jelamp. The system uses a camera to capture the user's face, runs a trained machine learning model on the ESP32-S3, and changes the NeoPixel LED color based on the detected facial expression.
User face -> ESP32-S3 camera -> Edge Impulse facial expression model -> Expression classification -> NeoPixel LED color response
Example Responses
| Facial Expression | NeoPixel Response |
|---|---|
| Happy | Warm yellow / rainbow animation |
| Neutral | Soft white / calm blue |
| Sad / Surprised | Blue or purple breathing light |
Why I Chose This
Jelamp is designed to be more than a normal desk lamp. It is an expressive robotic lamp that can move, sense, and respond to the user. For Wildcard Week, I wanted to add a visual interaction feature. By using a camera and machine learning model, Jelamp can detect simple facial expressions and respond with light.
This makes the lamp feel more alive and interactive.
How This Meets the Assignment Requirement
This project meets the Wildcard Week requirement because it uses a digital process not covered in my other assignments:
- I used machine vision to capture image data.
- I used Edge Impulse to train a machine learning model.
- I deployed the trained model to an embedded microcontroller.
- I connected the digital classification result to a physical output: NeoPixel LEDs.
- I documented the process so that it can be reproduced.
Data collection -> Model training -> Embedded deployment -> Physical interaction
System Overview
```
+------------------------+
|       User Face        |
+-----------+------------+
            |
            v
+------------------------+
| ESP32-S3 Camera Board  |
| Captures image input   |
+-----------+------------+
            |
            v
+------------------------+
|   Edge Impulse Model   |
| Classifies expression  |
+-----------+------------+
            |
            v
+------------------------+
|     ESP32-S3 Logic     |
|  Chooses LED response  |
+-----------+------------+
            |
            v
+------------------------+
|   NeoPixel LED Ring    |
|  Shows color feedback  |
+------------------------+
```
Materials and Components
| Component | Purpose |
|---|---|
| ESP32-S3 camera board / XIAO ESP32S3 Sense | Camera input and model inference |
| NeoPixel ring / WS2812B LEDs | Programmable LED output |
| USB-C cable | Programming and power |
| Jumper wires | Connections |
| Computer | Edge Impulse training and Arduino programming |
| Arduino IDE | Uploading code |
| Edge Impulse | Dataset, training, and deployment |
Step-by-Step Process
Step 1: Define the Scope
The first step was to decide what kind of expressions the system should recognize. To keep the project manageable, I started with simple expression classes:
- Happy
- Neutral
- Sad / Surprised
If the model was not accurate enough, I planned to simplify it to two classes:
- Happy
- Not Happy
This helped keep the project realistic and achievable.
Step 2: Set Up the ESP32-S3 Camera
I connected the ESP32-S3 camera board to my computer and tested whether the camera could capture images correctly. The camera board was used as the input device for collecting facial expression data and later running the trained model.
- Connecting the board by USB-C
- Checking that the board was recognized by the computer
- Testing basic camera capture
- Confirming the image quality and lighting conditions
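A minimal capture-test sketch along the lines below can confirm the camera works before collecting data: it only initializes the camera and prints the size of each captured frame to the Serial Monitor. The pin numbers are taken from the XIAO ESP32S3 Sense entry (CAMERA_MODEL_XIAO_ESP32S3) in the Arduino CameraWebServer example and should be verified against the board documentation; PSRAM also needs to be enabled in the Arduino Tools menu.

```cpp
// Minimal capture test (assumption: XIAO ESP32S3 Sense).
// Pin numbers are copied from the CAMERA_MODEL_XIAO_ESP32S3 entry in the
// Arduino CameraWebServer example (camera_pins.h); verify them for your board.
#include "esp_camera.h"

void setup() {
  Serial.begin(115200);

  camera_config_t config = {};
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer   = LEDC_TIMER_0;
  config.pin_pwdn  = -1;
  config.pin_reset = -1;
  config.pin_xclk  = 10;
  config.pin_sccb_sda = 40;   // named pin_sscb_sda on older ESP32 cores
  config.pin_sccb_scl = 39;
  config.pin_d7 = 48; config.pin_d6 = 11; config.pin_d5 = 12; config.pin_d4 = 14;
  config.pin_d3 = 16; config.pin_d2 = 18; config.pin_d1 = 17; config.pin_d0 = 15;
  config.pin_vsync = 38;
  config.pin_href  = 47;
  config.pin_pclk  = 13;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;   // JPEG is enough for a capture test
  config.frame_size   = FRAMESIZE_QVGA;   // 320 x 240
  config.jpeg_quality = 12;
  config.fb_count     = 1;

  if (esp_camera_init(&config) != ESP_OK) {
    Serial.println("Camera init failed");
    return;
  }
  Serial.println("Camera ready");
}

void loop() {
  camera_fb_t *fb = esp_camera_fb_get();   // grab one frame
  if (fb) {
    Serial.printf("Captured %u bytes\n", fb->len);
    esp_camera_fb_return(fb);              // return the frame buffer
  } else {
    Serial.println("Capture failed");
  }
  delay(1000);
}
```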
Step 3: Collect the Dataset
I collected images for each expression class.
| Class | Target Number of Images | Notes |
|---|---|---|
| Happy | 30-50 | Smiling face |
| Neutral | 30-50 | Normal face |
| Sad / Surprised | 30-50 | Stronger facial expression |
To improve the dataset, I tried to vary:
- Lighting
- Face angle
- Distance from the camera
- Background
- Expression intensity
This helped the model learn more reliable patterns instead of only memorizing one image condition.
Step 4: Train the Model in Edge Impulse
After collecting the dataset, I uploaded the images to Edge Impulse and labeled them by expression. Then I created an image classification model.
- Upload image data
- Label each class
- Create an image classification impulse
- Train the model
- Check the accuracy
- Review the confusion matrix
- Test the model with new images
The confusion matrix helped me see which expressions were easy or difficult for the model to recognize.
Step 5: Deploy the Model to ESP32-S3
After training, I exported the Edge Impulse model as an Arduino library. Then I installed the library in Arduino IDE and uploaded the model inference code to the ESP32-S3 board.
The board captured an image, ran inference, and printed the predicted expression and confidence score in the Serial Monitor.
```
Prediction: happy
Confidence: 0.86
Inference time: ___ ms
```
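As a rough sketch of how this step looks in code, the fragment below runs the classifier and prints each label with its confidence. The header name `jelamp_expressions_inferencing.h` is a placeholder (Edge Impulse names it after the project), and the camera capture and resizing code that fills `features[]` is omitted here; it comes from the ESP32 camera example bundled with the exported Arduino library.

```cpp
// Inference fragment. The header name below is a placeholder generated from
// the Edge Impulse project name. features[] must already hold the camera
// frame scaled to the model's input size (see the library's camera example).
#include <jelamp_expressions_inferencing.h>

static float features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE];

// Callback the classifier uses to read slices of the input buffer.
static int get_feature_data(size_t offset, size_t length, float *out_ptr) {
  memcpy(out_ptr, features + offset, length * sizeof(float));
  return 0;
}

// Runs the classifier on features[] and returns the most likely label.
String classifyExpression() {
  signal_t signal;
  signal.total_length = EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE;
  signal.get_data = &get_feature_data;

  ei_impulse_result_t result = {};
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
    return "error";
  }

  size_t best = 0;
  for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    Serial.printf("%s: %.2f\n", result.classification[i].label,
                  result.classification[i].value);
    if (result.classification[i].value > result.classification[best].value) {
      best = i;
    }
  }
  Serial.printf("Inference time: %d ms\n", result.timing.classification);
  return String(result.classification[best].label);
}
```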
Step 6: Connect the NeoPixel Ring
Next, I connected the NeoPixel ring to the ESP32-S3.
| NeoPixel Pin | Connects To |
|---|---|
| 5V | 5V |
| GND | GND |
| DIN | ESP32-S3 GPIO pin |
The ESP32-S3 and NeoPixel must share a common ground.
Before connecting the machine learning result, I tested a few basic LED colors to confirm the wiring: Red, Green, Blue, White, and a rainbow animation.
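A minimal test sketch like the one below is enough for this check: it cycles the ring through solid colors. The data pin (GPIO 2) and pixel count (12) are placeholders for whatever matches the actual wiring.

```cpp
// NeoPixel wiring test. LED_PIN and LED_COUNT are placeholders:
// use the GPIO pin and pixel count that match the actual wiring.
#include <Adafruit_NeoPixel.h>

#define LED_PIN    2
#define LED_COUNT  12

Adafruit_NeoPixel ring(LED_COUNT, LED_PIN, NEO_GRB + NEO_KHZ800);

void showColor(uint32_t color) {
  ring.fill(color);   // same color on every pixel
  ring.show();
}

void setup() {
  ring.begin();
  ring.setBrightness(60);   // keep current draw low while testing
}

void loop() {
  showColor(ring.Color(255, 0, 0));      // red
  delay(1000);
  showColor(ring.Color(0, 255, 0));      // green
  delay(1000);
  showColor(ring.Color(0, 0, 255));      // blue
  delay(1000);
  showColor(ring.Color(255, 255, 255));  // white
  delay(1000);
}
```

Keeping the brightness low during testing also limits the current drawn over USB.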
Step 7: Map Expressions to LED Responses
After confirming that the NeoPixel worked, I connected the model result to LED behavior.
- If the expression is Happy: show warm yellow or a rainbow animation
- If the expression is Neutral: show soft white or calm blue
- If the expression is Sad or Surprised: show a blue or purple breathing light
This made the system interactive because the physical LED output changed according to the user's facial expression.
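A small helper along these lines can express that mapping in code. The label strings are assumptions and must match the class names used in Edge Impulse; `Adafruit_NeoPixel::Color()` just packs the R, G, B values into one 32-bit color.

```cpp
#include <Arduino.h>
#include <Adafruit_NeoPixel.h>

// Map the predicted label to a packed RGB color for the ring. The label
// strings are assumptions and must match the class names used in Edge Impulse.
uint32_t colorForExpression(const String &label) {
  if (label == "happy") {
    return Adafruit_NeoPixel::Color(255, 180, 0);    // warm yellow
  }
  if (label == "neutral") {
    return Adafruit_NeoPixel::Color(150, 150, 255);  // soft blue-white
  }
  return Adafruit_NeoPixel::Color(90, 0, 160);       // sad / surprised: purple
}
```

In the main loop, the label returned by the inference step is passed to this helper and the result is written to the ring with `ring.fill()` and `ring.show()`.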
Step 8: Test the Interaction
I tested the system by showing different facial expressions in front of the camera. For each test, I recorded:
- Expression shown
- Predicted label
- Confidence score
- LED response
- Whether the result was correct
| Test | Real Expression | Predicted Expression | Confidence | LED Response | Result |
|---|---|---|---|---|---|
| 1 | Happy | Happy | ___ | Yellow | Correct |
| 2 | Neutral | Neutral | ___ | White | Correct |
| 3 | Sad | Neutral | ___ | White | Needs improvement |
Problems and Solutions
| Problem | Solution |
|---|---|
| The model confused neutral and sad expressions | I added more training images with clearer expressions |
| Camera image was too dark | I improved the lighting |
| LED color did not change correctly | I checked the GPIO pin and NeoPixel wiring |
| The model was slow | I reduced the number of classes and simplified the model |
| Prediction confidence was low | I collected more balanced data for each class |
Final Result
The final system can use an ESP32-S3 camera to classify simple facial expressions and change the NeoPixel LED color based on the result. This demonstrates how machine vision can be embedded into a physical interactive object.
For my final project, this feature can become part of Jelamp's emotional response system. Jelamp can use visual input to understand the user's expression and respond through light, movement, or both.
Minimum Viable Version
If time is limited, the project can still meet the assignment requirement with two expression classes:
- Happy
- Not Happy
The output behavior can be:
- Happy -> yellow NeoPixel
- Not Happy -> blue NeoPixel
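A sketch of that fallback, assuming the model exposes a confidence score for the `happy` class; the 0.6 threshold is a placeholder to tune during testing:

```cpp
// Two-class fallback: anything that is not clearly "happy" is treated as
// "not happy". The 0.6 threshold is an assumption to tune during testing.
#include <Adafruit_NeoPixel.h>

uint32_t colorForHappyScore(float happyConfidence) {
  if (happyConfidence > 0.6f) {
    return Adafruit_NeoPixel::Color(255, 200, 0);  // Happy -> yellow
  }
  return Adafruit_NeoPixel::Color(0, 0, 255);      // Not Happy -> blue
}
```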
This still demonstrates:
- Image data collection
- Machine learning model training
- Embedded deployment
- Physical LED output
- Reproducible documentation
Files to Include
To make the project reproducible, I should include:
- Edge Impulse project screenshots
- Dataset examples
- Model training result
- Confusion matrix screenshot
- Arduino code
- Wiring diagram
- Photos of the setup
- Short demo video
- Final testing results
Reflection
This week helped me understand how machine learning can be used as part of a physical interactive product. Instead of only controlling LEDs manually, I trained a model that allows the lamp to respond to visual input.
The most important lesson was that the quality of the dataset strongly affects the quality of the result. Good lighting, clear labels, and enough examples are important for a reliable model.
This experiment is useful for Jelamp because it adds a new layer of interaction. In the future, I can combine facial expression detection with servo movement so the lamp can respond with both motion and light.