Smart Glasses Kit – System Integration Documentation
Overview
The Smart Glasses Kit is a modular, attachable device that mounts onto conventional eyeglasses to add smart capabilities. The kit supports voice interaction and visual understanding, enabling use cases such as:
- “What’s the weather today?” – The glasses respond via voice.
- “What’s in front of me?” – The glasses analyze and report what they see.
This document outlines the full system integration, including hardware, communication structure, software pipeline, and the bill of materials (BOM).
System Architecture
Hardware Components
1. Server
- A local PC acts as the processing server, running the speech-to-text (STT) and text-to-speech (TTS) models (e.g., OpenAI Whisper for STT).
- An LLM (e.g., OpenAI GPT) is used for conversational AI and image analysis.
2. Glasses Kit Modules
Component | Unit Price (RMB) | Unit Price (USD, approx) |
---|---|---|
XIAO ESP32S3 with Sense (cam + mic) | — | $14.99 |
SmartGlassesKit PCB (5 for ¥20) | ¥4 | ~$0.56 |
LED (5 for ¥5) | ¥1 | ~$0.14 |
Button (5 for ¥5) | ¥1 | ~$0.14 |
GC9107 0.85" LCD | ¥29 | ~$4.04 |
MAX98357 Audio Amplifier | ¥10 | ~$1.39 |
Small Speaker | ¥5 | ~$0.70 |
Beam Splitter Prism | ¥20 | ~$2.78 |
Li-ion Battery | ¥10 | ~$1.39 |
Custom TPU 3D-Printed Case | — | ~$1.50 (estimated) |
Total Estimated BOM Cost: ~$27.63 (sum of the USD unit prices listed above)
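The total can be cross-checked by summing the USD column of the table. The snippet below is a minimal sketch: the prices are copied from the table above, and the dictionary and variable names are chosen purely for illustration.

```python
# Approximate USD unit prices copied from the BOM table above.
# The dictionary name and keys are illustrative, not part of the project.
bom_usd = {
    "XIAO ESP32S3 with Sense": 14.99,
    "SmartGlassesKit PCB": 0.56,
    "LED": 0.14,
    "Button": 0.14,
    "GC9107 0.85in LCD": 4.04,
    "MAX98357 Audio Amplifier": 1.39,
    "Small Speaker": 0.70,
    "Beam Splitter Prism": 2.78,
    "Li-ion Battery": 1.39,
    "Custom TPU 3D-Printed Case (estimated)": 1.50,
}

# Round to cents to avoid floating-point noise in the printed total.
total_usd = round(sum(bom_usd.values()), 2)
print(f"Total estimated BOM cost: ~${total_usd:.2f}")
```

Because several parts are sold in packs of five, the actual purchase cost for a single kit will be higher than the per-unit total.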
[Insert Illustration Here – "Exploded View of Smart Glasses Kit"]
(Use the exploded-view diagram with labeled components: case, prism, speaker, MAX98357, LCD, ESP32S3, and PCB)
Hardware Integration
The components are stacked in a compact form:
- Core Controller: The XIAO ESP32S3 receives audio and video input and transmits the data to the server.
- Display: The GC9107 shows simple visual results.
- Audio Output: The MAX98357 and speaker play voice responses.
- Interaction: The button toggles modes or confirms actions.
- Prism: Reflects the display output into the user's field of view.
- 3D-Printed Case: Provides enclosure and mounting.
Software Workflow
Use Cases
- Weather Inquiry
  - User says: "What’s the weather?"
  - ESP32S3 records → sends to server → server fetches weather → responds via TTS
- Visual Scene Detection
  - User says: "What’s in front of me?"
  - ESP32S3 captures photo → server runs CV model → responds with TTS + result image
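The two flows above can be sketched as a single server-side dispatch step. This is only an illustration under stated assumptions: `Reply`, `transcribe`, `fetch_weather`, `describe_scene`, and `handle_request` are hypothetical names, the STT, weather, and vision steps are stubbed with placeholders, and routing by keyword is a simplification rather than the project's actual logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    text: str                      # sentence to be spoken via TTS
    image: Optional[bytes] = None  # optional result image for the display


def transcribe(audio: bytes) -> str:
    """Placeholder STT: a real server would run a speech-to-text model here.
    For this sketch the 'audio' is simply UTF-8 text."""
    return audio.decode("utf-8")


def fetch_weather() -> str:
    """Placeholder for a weather lookup (hypothetical fixed answer)."""
    return "It is sunny today."


def describe_scene(photo: bytes) -> str:
    """Placeholder for the vision model (hypothetical fixed answer)."""
    return "There are three people ahead of you."


def handle_request(audio: bytes, photo: Optional[bytes] = None) -> Reply:
    """Route a spoken question to the weather or vision flow."""
    question = transcribe(audio).lower()
    if "weather" in question:
        return Reply(text=fetch_weather())
    if "in front of me" in question:
        if photo is None:
            return Reply(text="I need a photo to answer that.")
        return Reply(text=describe_scene(photo), image=photo)
    return Reply(text="Sorry, I did not understand that.")


if __name__ == "__main__":
    print(handle_request(b"What's the weather?").text)
    print(handle_request(b"What's in front of me?", photo=b"\xff\xd8").text)
```

In a real deployment the keyword check would likely be replaced by the LLM deciding which flow applies, and the returned text would be passed to the TTS model before being sent back to the glasses.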
Communication Protocol
- The glasses connect to the server over Wi-Fi using MQTT or WebSocket.
- STT and CV are processed on the server side.
- Results sent back as structured JSON:
{
"image": "base64",
"text": "There are three people ahead of you."
}
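As a rough sketch of how such a result message could be built on the server and decoded by a client, the snippet below uses only the Python standard library. It assumes the image key is spelled `image` and carries base64-encoded image bytes; `build_result` and `parse_result` are hypothetical helper names, and the transport (MQTT or WebSocket) is deliberately left out.

```python
import base64
import json
from typing import Tuple


def build_result(text: str, image_bytes: bytes = b"") -> str:
    """Serialize a result as JSON; the image is base64-encoded so that
    binary data can travel inside a text message."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "text": text,
    }
    return json.dumps(payload)


def parse_result(message: str) -> Tuple[str, bytes]:
    """Decode a result message back into its text and raw image bytes."""
    payload = json.loads(message)
    image = base64.b64decode(payload.get("image", ""))
    return payload["text"], image


if __name__ == "__main__":
    msg = build_result("There are three people ahead of you.", b"\xff\xd8\xff")
    text, image = parse_result(msg)
    print(text, len(image))
```

Base64 increases the payload size by roughly a third, so for larger images a separate binary message may be preferable on a constrained device such as the ESP32S3.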
Assembly Guide
- Solder the components onto the SmartGlassesKit PCB.
- Mount the XIAO ESP32S3, LCD, and speaker.
- Connect the Sense board (camera + mic) to the XIAO ESP32S3.
- Connect the speaker to the MAX98357.
- Connect the LCD to the SmartGlassesKit PCB (designed by me).
- Connect the MAX98357 and XIAO ESP32S3 to the SmartGlassesKit PCB.
[Image: Hero shot of the assembled kit]
- Mount the beam splitter prism in line with the LCD. [TODO]
- Enclose everything in the 3D-printed TPU case. [TODO]
- Attach the kit to the glasses frame with clips. [TODO]