Week 18 - Applications and Implications

  • assignment
    • Propose a final project masterpiece that integrates the range of units covered,
      • answering:
        • What will it do?
        • Who's done what beforehand?
        • What will you design?
        • What materials and components will be used?
        • Where will they come from?
        • How much will they cost?
        • What parts and systems will be made?
        • What processes will be used?
        • What questions need to be answered?
        • How will it be evaluated?
    • Your project should incorporate 2D and 3D design,
      • additive and subtractive fabrication processes,
      • electronics design and production,
      • embedded microcontroller interfacing and programming,
      • system integration and packaging
    • Where possible, you should make rather than buy
      • the parts of your project
    • Projects can be separate or joint, but need to show individual
      • mastery of the skills, and be independently operable

What will it do?

This project is a smart glasses kit that attaches to a regular pair of glasses to add intelligent features. It supports voice interaction and visual understanding. Example use cases include:

  1. Asking “What’s the weather like today?” and receiving a spoken response.
  2. Asking “What’s in front of me?” and getting a description of the scene.
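
The two use cases imply a dispatch step between the transcribed question and the feature that answers it. The following is a minimal sketch of that idea, assuming the speech has already been converted to text. The handler names and keyword rules are hypothetical and are not part of xiaozhi-esp32, which may route requests differently.

```python
from typing import Callable

def answer_weather(query: str) -> str:
    # Placeholder: a real handler would query a weather service.
    return "weather"

def describe_scene(query: str) -> str:
    # Placeholder: a real handler would capture a frame and ask a vision model.
    return "vision"

def answer_general(query: str) -> str:
    # Fallback for questions that match no keyword rule.
    return "general"

# Hypothetical keyword rules, checked in order.
ROUTES = [
    (("in front of me", "what do you see"), describe_scene),
    (("weather", "forecast", "temperature"), answer_weather),
]

def route(query: str) -> Callable[[str], str]:
    """Return the handler whose keywords appear in the query."""
    text = query.lower()
    for keywords, handler in ROUTES:
        if any(keyword in text for keyword in keywords):
            return handler
    return answer_general

handler = route("What’s in front of me?")
print(handler.__name__)
```

Keyword matching is only the simplest possible rule; a language model deciding when to call the camera would replace `route` without changing the handlers.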

Who’s done what beforehand?

Commercial smart glasses such as Google Glass and Ray-Ban Meta provide integrated AR features, but they are often expensive and closed-source. This project aims to create a low-cost, open-source alternative focused on voice and vision capabilities, tailored for prototyping and educational use.

This project is based on the xiaozhi-esp32 platform. On top of this foundation, I made the following key contributions:

  • Hardware Driver Integration: I developed and integrated drivers for the custom hardware modules I designed.
  • Hardware Adaptation: I adapted the system to ensure compatibility with the newly designed hardware components.
  • Image Recognition Functionality: I added support for image recognition, enabling the device to perform visual analysis.

These additions extend the original functionality of xiaozhi-esp32 and make it better suited to the smart glasses kit.

What will you design?

  • A custom PCB (SmartGlassesKit) with minimal components (one button and one LED) for control and feedback.
  • A 3D printed enclosure that can be attached to various glasses frames.
  • Integration of microphone, speaker, camera, display, and prism into a wearable form.

What materials and components will be used?

  • Microcontroller: XIAO ESP32S3 Sense (camera + microphone), $14.99
  • Custom SmartGlassesKit PCB: ¥ 20 for 5 pieces
  • GC9107 0.85” LCD: ¥ 29
  • MAX98357 audio amplifier: ¥ 10
  • Speaker: ¥ 5
  • LED x5: ¥ 5
  • Button x5: ¥ 5
  • Beam splitter prism: ¥ 20
  • Li-ion battery: ¥ 10
  • Enclosure: 3D printed using PLA/TPU

Where will they come from?

  • Most electronic components will be sourced from local Chinese electronics markets or online platforms like Taobao.
  • The 3D printed parts will be fabricated in-house using a personal or FabLab 3D printer.

How much will they cost?

Approximate total cost (for one unit):

  • XIAO ESP32S3 + Sense: $14.99
  • PCB and components: ¥ 35
  • LCD: ¥ 29
  • Audio and speaker: ¥ 15
  • Prism: ¥ 20
  • Enclosure (3D printed): negligible material cost

Total: around $30–35
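
Because the prices mix US dollars and Chinese yuan, the total depends on the exchange rate used. The sketch below adds up the listed items under an assumed rate of 7.2 CNY per USD; that rate is an illustrative assumption and not a figure from this page.

```python
# Prices copied from the cost list above.
USD_ITEMS = {"XIAO ESP32S3 + Sense": 14.99}
CNY_ITEMS = {
    "PCB and components": 35,
    "LCD": 29,
    "Audio and speaker": 15,
    "Prism": 20,
}
BATTERY_CNY = 10  # listed under materials, not itemized in the cost list

# ASSUMPTION: illustrative exchange rate, not taken from this page.
CNY_PER_USD = 7.2

def total_usd(usd_items, cny_items, cny_per_usd):
    """Sum USD prices directly and convert CNY prices at the given rate."""
    cny_total = sum(cny_items.values())
    return sum(usd_items.values()) + cny_total / cny_per_usd

itemized = total_usd(USD_ITEMS, CNY_ITEMS, CNY_PER_USD)
with_battery = total_usd(
    USD_ITEMS, {**CNY_ITEMS, "Battery": BATTERY_CNY}, CNY_PER_USD
)
print(f"Itemized: ${itemized:.2f}, with battery: ${with_battery:.2f}")
```

At the assumed rate, the itemized lines come to about $28.74, and including the ¥ 10 battery from the materials list brings the figure to roughly $30.13.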

What parts and systems will be made?

  • A custom PCB (SmartGlassesKit)
  • A 3D printed enclosure
  • Full system integration of voice capture, camera input, audio output, display, and control logic.

What processes will be used?

  • 2D design (GIMP)
  • 3D design (FreeCAD)
  • PCB design (JLCPCB)
  • 3D printing (additive manufacturing)
  • Soldering and electronics assembly
  • Embedded programming (ESP-IDF or Arduino)
  • Application development (Python)
  • System integration and testing

What questions need to be answered?

  • How can latency between the device and the server be reduced?
  • How can the kit fit comfortably and stay stable on different glasses frames?
  • How can power consumption and battery life be managed in a compact wearable?
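
For the latency question, a reasonable first step is to measure where the time goes in one request. The sketch below is a host-side timing helper; the stage names and the `time.sleep` calls are hypothetical stand-ins for the real speech, query, and synthesis steps.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str, store: dict):
    """Accumulate the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        store[stage] = store.get(stage, 0.0) + elapsed

def slowest_stage(store: dict) -> str:
    """Return the name of the stage with the largest accumulated time."""
    return max(store, key=store.get)

timings = {}
with timed("stt", timings):
    time.sleep(0.01)   # stand-in for speech recognition
with timed("query", timings):
    time.sleep(0.05)   # stand-in for the server round trip
with timed("tts", timings):
    time.sleep(0.01)   # stand-in for speech synthesis

for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.1f} ms")
print("slowest:", slowest_stage(timings))
```

Knowing which stage dominates indicates whether to focus on the network link, the server, or the device firmware.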

How will it be evaluated?

  • The device should independently complete the full voice loop: speech recognition (STT), query processing, and a spoken text-to-speech (TTS) response.
  • It should capture an image and describe the scene when asked.
  • All components should work in an integrated, wearable form factor.
  • The system should be reproducible and documented.
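
The first criterion can be checked with a small smoke test before the hardware is complete. The sketch below models one voice turn as three pluggable stages; the function names and the fake stages are assumptions for illustration, and the real system would substitute its own speech and language services.

```python
from typing import Callable

def run_voice_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    answer: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one question/answer turn: recorded audio in, synthesized audio out."""
    text = stt(audio)
    if not text.strip():
        raise ValueError("speech recognition returned no text")
    reply = answer(text)
    return tts(reply)

# Fake stages so the pipeline can be exercised without hardware or a server.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")

def fake_answer(text: str) -> str:
    return f"You asked: {text}"

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")

result = run_voice_turn(
    b"what is in front of me", fake_stt, fake_answer, fake_tts
)
print(result)
```

Passing the stages in as arguments keeps the test independent of any particular service, so the same check can later run against the real stages.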