Smart Glasses Kit – System Integration Documentation

Overview

The Smart Glasses Kit is a modular device that can be mounted onto conventional eyeglasses to give them smart capabilities. The kit supports voice interaction and visual understanding, enabling use cases such as:

“What’s the weather today?” – The glasses respond via voice.
“What’s in front of me?” – The glasses analyze and report what they see.

This document outlines the full system integration, including hardware, communication structure, software pipeline, and the bill of materials (BOM).

System Architecture

Hardware Components

1. Server

A local PC acts as the processing server, running the speech-to-text (STT) and text-to-speech (TTS) models.
An LLM (e.g., OpenAI GPT) handles conversational AI and image analysis; speech recognition can use an STT model such as OpenAI Whisper.
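The document does not say how recorded audio reaches the STT model. A minimal sketch, assuming the ESP32S3 sends raw 16 kHz, 16-bit, mono little-endian PCM and the server wraps it in a WAV container before passing it to an STT model or API that expects an audio file. The capture format and the `pcm_to_wav` helper are assumptions, not part of the kit's documented firmware.

```python
import io
import wave

# ASSUMED capture format; match whatever the ESP32S3 firmware actually records.
SAMPLE_RATE_HZ = 16_000
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
CHANNELS = 1            # mono microphone


def pcm_to_wav(pcm: bytes,
               sample_rate: int = SAMPLE_RATE_HZ,
               sample_width: int = SAMPLE_WIDTH_BYTES,
               channels: int = CHANNELS) -> bytes:
    """Wrap raw PCM bytes in an in-memory WAV container (hypothetical helper)."""
    frame_size = sample_width * channels
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM data is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()
```

For example, `pcm_to_wav(b"\x00\x00" * 16_000)` yields one second of silence as a 32,044-byte WAV (a 44-byte header plus 32,000 bytes of samples).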

2. Glasses Kit Modules

Component	Unit Price (RMB)	Unit Price (USD, approx)
XIAO ESP32S3 with Sense (cam + mic)	—	$14.99
SmartGlassesKit PCB (5 for ¥20)	¥4	~$0.56
LED (5 for ¥5)	¥1	~$0.14
Button (5 for ¥5)	¥1	~$0.14
GC9107 0.85" LCD	¥29	~$4.04
MAX98357 Audio Amplifier	¥10	~$1.39
Small Speaker	¥5	~$0.70
Beam Splitter Prism	¥20	~$2.78
Li-ion Battery	¥10	~$1.39
Custom TPU 3D-Printed Case	—	~$1.50 (estimated)

Total Estimated BOM Cost: ~$27.63
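The total can be checked by summing the USD column. This sketch assumes one of each component (the table lists unit prices only) and uses `Decimal` so the cents add up exactly instead of accumulating floating-point error.

```python
from decimal import Decimal

# Approximate USD unit prices from the BOM table; quantity of one each is assumed.
BOM_USD = {
    "XIAO ESP32S3 with Sense (cam + mic)": Decimal("14.99"),
    "SmartGlassesKit PCB": Decimal("0.56"),
    "LED": Decimal("0.14"),
    "Button": Decimal("0.14"),
    'GC9107 0.85" LCD': Decimal("4.04"),
    "MAX98357 Audio Amplifier": Decimal("1.39"),
    "Small Speaker": Decimal("0.70"),
    "Beam Splitter Prism": Decimal("2.78"),
    "Li-ion Battery": Decimal("1.39"),
    "Custom TPU 3D-Printed Case": Decimal("1.50"),
}

# Start the sum from Decimal("0") so the result stays a Decimal.
total = sum(BOM_USD.values(), Decimal("0"))
print(f"Total estimated BOM cost: ~${total}")  # prints: Total estimated BOM cost: ~$27.63
```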

[Insert Illustration Here – "Exploded View of Smart Glasses Kit"]

(Use the exploded-view diagram with labeled components: case, prism, speaker, MAX98357, LCD, ESP32S3, and PCB)

Hardware Integration

The components are stacked into a compact assembly:

Core Controller: XIAO ESP32S3 captures audio and camera input and transmits the data to the server.
Display: GC9107 shows simple visual results.
Audio Output: MAX98357 + speaker play voice responses.
Interaction: Button toggles modes or confirms actions.
Prism: Reflects the display output into the user's field of view.
3D-Printed Case: Encloses the electronics and provides mounting.
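The MAX98357 is an I2S amplifier, so the ESP32S3 must stream it integer PCM samples. If the server's TTS model outputs floating-point samples in [-1.0, 1.0] (an assumption; the TTS model is not named here), they need converting first. A minimal sketch, assuming 16-bit signed little-endian mono PCM; the sample rate must match the ESP32S3's I2S configuration, which this document does not specify. `float_to_pcm16` is a hypothetical helper name.

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))     # clip out-of-range samples instead of overflowing
        ints.append(round(s * 32767))  # scale to the int16 range
    # "<" = little-endian, "h" = signed 16-bit integer, one per sample.
    return struct.pack(f"<{len(ints)}h", *ints)
```

Scaling by 32767 rather than 32768 keeps +1.0 inside the int16 range, at the cost of never producing -32768.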

Software Workflow

Use Cases

Weather Inquiry
- User speaks: "What’s the weather?"
- ESP32S3 records audio → sends it to the server → server transcribes it (STT) and fetches the weather → responds via TTS
Visual Scene Detection
- User says: "What’s in front of me?"
- ESP32S3 captures a photo → server runs a CV model → responds with TTS + a result image
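Both use cases begin with server-side transcription, so the server needs a routing step that picks a handler for each transcript. A minimal sketch using keyword matching; the handler names and keyword lists are hypothetical, and a production server might instead let the LLM choose a tool via function calling.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]


# Hypothetical stub handlers; real ones would call a weather API, the CV model, or the LLM.
def answer_weather(transcript: str) -> str:
    return "TODO: fetch the weather and summarize it"


def describe_scene(transcript: str) -> str:
    return "TODO: request a photo and run the CV model"


def chat(transcript: str) -> str:
    return "TODO: forward the transcript to the LLM"


# Keyword rules, checked in order; the first match wins.
ROUTES: List[Tuple[Tuple[str, ...], Handler]] = [
    (("weather", "temperature", "forecast"), answer_weather),
    (("in front of me", "what do you see", "what am i looking at"), describe_scene),
]


def route(transcript: str) -> Handler:
    """Pick a handler by case-insensitive substring match; fall back to general chat."""
    text = transcript.lower()
    for keywords, handler in ROUTES:
        if any(keyword in text for keyword in keywords):
            return handler
    return chat
```

For example, `route("What’s the weather today?")` returns `answer_weather`, and any transcript matching no rule goes to `chat`.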

Communication Protocol

The kit connects to the server over Wi-Fi using MQTT or WebSocket.
STT and CV are processed on the server side.
Results are sent back as structured JSON:

{
  "image": "base64",
  "text": "There are three people ahead of you."
}
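A minimal sketch of building and parsing that reply with Python's standard `json` and `base64` modules, assuming the field names `image` and `text` and that an empty `image` string means no image was returned (the document does not say how a text-only reply is encoded). The function names are hypothetical; on the glasses, the ESP32S3 firmware would perform the equivalent decoding.

```python
import base64
import json
from typing import Optional, Tuple


def build_response(text: str, image: Optional[bytes] = None) -> str:
    """Server side: serialize a reply, base64-encoding the optional image bytes."""
    payload = {
        "image": base64.b64encode(image).decode("ascii") if image else "",
        "text": text,
    }
    return json.dumps(payload, ensure_ascii=False)


def parse_response(message: str) -> Tuple[str, bytes]:
    """Client side: recover the text and raw image bytes (empty bytes if no image)."""
    payload = json.loads(message)
    image = base64.b64decode(payload.get("image", ""))
    return payload["text"], image
```

Base64 inflates binary data by about a third, so keeping result images small reduces the payload sent over Wi-Fi.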

Assembly Guide

Solder the components onto the SmartGlassesKit PCB.
Mount the XIAO ESP32S3, LCD, and speaker.

Connect the Sense expansion board (camera + mic) to the XIAO ESP32S3.
Connect the speaker to the MAX98357.
Connect the LCD to the SmartGlassesKit PCB (designed by me).
Connect the MAX98357 and XIAO ESP32S3 to the SmartGlassesKit PCB.
[Insert Photo Here – "Hero Shot of the Assembled Kit"]

Mount the beam splitter prism in line with the LCD.
Enclose everything in the 3D-printed TPU case.
Attach the kit to the glasses frame with clips. [TODO]
