Final project

Desktop AI companion device with voice interaction — working name Lucky Bot.

Concept images (Gemini)


Timeline

Final project - Lucky Bot

| Date | Work | Remark |
|---|---|---|
| 9th May | Finalize the idea; materials for the final; get the dimensions of the display | Materials: XIAO, display, mic |
| 10th May | Finish the 3D design and 3D printing | How to add the 2D design? |
| 11th-12th May | Build the prototype | |
| 13th-14th May | | |
| 15th May | Receive the PCB, solder the components, make the prototype | |

9th, May

Finalize the idea.

I won’t change the ID of the AI bot; I will make the 3D model first.

Components:

Input: microphone, INMP441

Output: display, GC9A01 1.28 inch LCD, SPI display

MCU: Xiao ESP32-C3

PCB made before

Firmware: LLM

Materials for the final

The PCB has already been produced by JLC.

Mic: INMP441, purchased from Taobao

Display: GC9A01 1.28 inch, recommended by Gemini, purchased from Taobao

10th, May

Make the 3D enclosure

Step 1: Draw a 100 x 100 mm rectangle with 20 mm fillets

Step 2: Extrude it to a height of 100 mm

Step 3: Use “Draft” to taper the sides of the cube

Step 4: Fillet the top and bottom faces

Step 5: Make a recessed plane to seat the cover, 1.5 mm deep. I will use the same 1.5 mm as the wall thickness of the enclosure when I “Shell” later

Create a smaller rectangle on the new plane for the next extrude

Step 6: Extrude the small rectangle by 0.1 mm to create a flat face, because Shell only works from a planar face. Then Shell from that face with a thickness of 1.5 mm

Step 7: Make the plate for the LCD display.

7.1 Add Part Studio 2 and use Derived to copy the plane, so the plate can be built on it directly.

Apply Use (Project/Convert) to the referenced rectangle so that it becomes editable; the lines turn black, as below

I assumed a 3D printing tolerance of about 0.1 mm. The outline of the enclosure is 91.27 mm,

so I set the cover to 91.16 mm

7.2 Design the hole for the LCD based on the size of the LCD

7.3 To seat the display properly, draw the hole and a stepped ledge, then extrude

7.4 Draw the holes on the cover plate and on 2 sides, and use Extrude to cut them. These holes are for the microphone.

7.5 The result is 2 parts

7.6 Assembly

3D Printing

Printing (video as below)

Update: I added 2 hooks to the cover plate to hold the LCD PCB.

On 11th May, I got 2 printed sets

I see 2 issues to fix in the next version:

The clearance is wrong: the cover does not fit the enclosure. The clearance should be 0.2 mm or 0.3 mm; I will try 0.3 mm first.

“Noodles” (drooping strands) on the surface. I need to make the slope steeper,

or I can print the enclosure in another orientation, like the picture below.
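The clearance trial above is simple arithmetic, and a short sketch makes the candidate cover sizes explicit. This is only an illustration: it assumes the clearance is subtracted once from the overall width (the same way 91.27 mm became 91.16 mm earlier), and `cover_width` is a hypothetical helper, not part of any CAD tool.

```python
def cover_width(opening_mm: float, clearance_mm: float) -> float:
    """Return the cover width that leaves the given total clearance."""
    return round(opening_mm - clearance_mm, 2)


OPENING_MM = 91.27  # enclosure outline measured in the first version

# Candidate clearances from the notes above.
for clearance in (0.2, 0.3):
    width = cover_width(OPENING_MM, clearance)
    print(f"clearance {clearance} mm -> cover {width} mm")
```

If the printer's error turns out to apply per side rather than in total, the clearance passed in would need to be doubled.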

Update on 13th May: I received the mic and display, as below, and then did a fit test with all the parts

Display and microphone

The PCB fits inside the enclosure.

Try to match the display and the cover plate.

White cover plate + black display looks nice.

The circular opening is slightly too small; I need to change its diameter to 35.8 mm, with a 0.02 mm margin

The limit stopper is too low, so the display cannot be inserted into the circular opening. I will raise the stopper by 1 mm first.

One more thing: I just realized I didn’t leave a hole for the Type-C cable. I will revise all parts in the second version.

0516 Make a prototype of the electronics first.

Learn the basics of the display and the pinouts of the microphone and display.

Pin of Display

Pin of the mic

Translation to English:

SCK: Serial data clock for the I²S interface.

WS: Serial data word select for the I²S interface.

L/R: Left/Right channel selection.

When set to Low level (GND), the microphone outputs signals on the left channel of the I²S frame.

When set to High level (VCC), the microphone outputs signals on the right channel of the I²S frame.

SD: Serial data output for the I²S interface.

VCC: Power input, 1.8V to 3.3V

GND: Power ground.

Wiring

ESP32-C3 I/O

1. Microphone (INMP441) ➔ Seeed XIAO ESP32-C3 Wiring

Standard I2S digital audio bus connection:

| Microphone Pin (INMP441) | Seeed XIAO ESP32-C3 Physical Pin | Corresponding Pin in Code (GPIO) | Bus Function Description |
|---|---|---|---|
| VCC | 3.3V | 3.3V | Digital power supply for the microphone. |
| GND | GND | GND | Power ground. |
| L/R | GND | GND | Grounded (Low): Outputs Left Channel for single-channel (Mono) audio. |
| SCK | D2 | GPIO 4 | I2S Serial Bit Clock line (BCLK). |
| WS | D3 | GPIO 5 | I2S Word Select / Frame Clock line (LRCK). |
| SD | D1 | GPIO 3 | I2S Serial Data Out line (audio data input to the MCU). |

2. Round Screen (GMT128-02 / GC9A01) ➔ Seeed XIAO ESP32-C3 Wiring

4-line SPI serial bus connection:

| Screen Pin (GMT128-02) | Seeed XIAO ESP32-C3 Physical Pin | Corresponding Pin in Code (GPIO) | Adjustment & Advantage Description |
|---|---|---|---|
| 1. VCC | 5V | 5V | Connects to the stable 5V rail to ensure enough power for the backlight. |
| 2. GND | GND | GND | Power ground (must share a common ground with the microphone). |
| 3. SCL | D8 | GPIO 8 | Hardware Fixed: SPI Serial Clock line (SCK). |
| 4. SDA | D10 | GPIO 10 | Hardware Fixed: SPI Serial Data Out line (MOSI). |
| 5. DC | D4 | GPIO 6 | Data/Command selection pin. |
| 6. CS | D0 | GPIO 2 | Strapping Pin Warning: Please check the crucial boot note below. |
| 7. RST | D5 | GPIO 7 | Hardware reset pin (active low). |
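Before wiring, the two tables can be cross-checked in a few lines. The sketch below copies the GPIO numbers from the tables and checks two things: that no GPIO is used twice, and which signals sit on ESP32-C3 strapping pins (GPIO 2, 8 and 9, which are sampled at boot). The helper names are hypothetical and this is a desk check, not firmware.

```python
# GPIO assignments copied from the two wiring tables above.
MIC_PINS = {"SCK": 4, "WS": 5, "SD": 3}
DISPLAY_PINS = {"SCL": 8, "SDA": 10, "DC": 6, "CS": 2, "RST": 7}

# ESP32-C3 strapping pins, sampled by the chip at boot.
STRAPPING_GPIOS = {2, 8, 9}


def find_conflicts(*pin_maps):
    """Return the GPIO numbers assigned to more than one signal."""
    seen, conflicts = set(), set()
    for pin_map in pin_maps:
        for gpio in pin_map.values():
            if gpio in seen:
                conflicts.add(gpio)
            seen.add(gpio)
    return conflicts


def strapping_signals(*pin_maps):
    """Return the signal names wired to a strapping pin, sorted by name."""
    return sorted(
        name
        for pin_map in pin_maps
        for name, gpio in pin_map.items()
        if gpio in STRAPPING_GPIOS
    )


print("conflicts:", find_conflicts(MIC_PINS, DISPLAY_PINS))
print("on strapping pins:", strapping_signals(MIC_PINS, DISPLAY_PINS))
```

Besides CS on GPIO 2, this check also lists SCL on GPIO 8, so both lines are worth watching if the board fails to boot with the display attached.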

Wired on a breadboard with jumper cables

Debug the prototype

Step 1: Connect the prototype to the laptop

Step 2: Use Cursor to debug

A. Ask Cursor to check the connections

B. Tell Cursor the requirement: XIAO ESP32-C3 + display (GC9A01) + mic (INMP441)

C. Cursor checks the pin connections; I sent it the wiring tables above as a picture.

Debug the connection with the settings from Cursor, as below.

Upon powering up, you should observe the following:

Red → Green → Blue → White (approx. 0.6 seconds each)

Black background with a cyan horizontal bar in the center

A green volume bar at the bottom (elongates when speaking into the microphone)
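The volume bar in the last check can be understood as an RMS level mapped to a pixel length. The firmware Cursor generated is not shown here, so this is only a sketch of the idea; it assumes the samples have already been scaled to signed 16-bit values, and the 200-pixel maximum bar length is a made-up number.

```python
import math

BAR_MAX_PX = 200    # assumed maximum bar length in pixels
FULL_SCALE = 32768  # full scale of signed 16-bit audio


def rms(samples):
    """Root-mean-square level of a block of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def bar_length(samples, max_px=BAR_MAX_PX):
    """Map the RMS level of a block to a bar length in pixels."""
    level = min(rms(samples) / FULL_SCALE, 1.0)
    return int(level * max_px)


quiet = [0] * 100
loud = [16384, -16384] * 50
print(bar_length(quiet), bar_length(loud))
```

A louder block gives a longer bar, which matches the bar elongating when speaking into the microphone.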

0517 Update the enclosure

I want to revise the design to be closer to the concept image.

I created a cube with a chamfer, following steps similar to before, and used Plane (Line angle) with the settings below.

Then I removed part 2; the result is much closer

Fillet with the settings below; it looks good

Revise the dimensions; finalized as below

Extrude, depth 2.5 mm

The updated 3D model is as below

Add blocks to hold the PCBA

0525 update

The updated enclosure, 3D printed

Some parts to be improved:

There is a gap between the top cover and the enclosure. I need to adjust the fillet of the cover plate: radius = 28.5 - 0.2 (clearance) = 28.3 mm

The blocks need to be taller, by 10 mm

The gap between the blocks needs to be narrowed by 0.2 mm, to 1.6 mm

The stopper needs to be 0.2 mm higher, so that the display is easy to insert

The edge is too sharp

Revise as below

The updated 3D file.

0527 update - Software requirements

Hardware platform:

Input: microphone, INMP441

Output: display, GC9A01 1.28 inch LCD, SPI display

MCU: Xiao ESP32-C3

Voice to Text

The mic collects the voice message, and the cloud platform converts it to text

Wake word - Hi Lucky

Cloud platform

Wi-Fi connection

Send the voice message to the platform and convert it to text

Text to LLM

The cloud platform sends the text to the LLM, and the LLM returns the result to the platform

The cloud platform sends the result text to the display

Display:

While I am talking to the AI bot, it shows an emoji that stands for listening

While the cloud platform is processing the message, it shows an emoji that stands for working

It then shows the text result from the cloud platform

After 10 s of silence, it goes into sleep mode and shows an emoji that stands for sleeping.
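The display requirements above amount to a small state machine. The sketch below is one way to write it down, not the actual firmware: the state names, the method names and the choice to only time out from the listening and showing states are my assumptions; the 10 s timeout comes from the requirement.

```python
from enum import Enum


class State(Enum):
    SLEEPING = "sleeping"    # sleeping emoji
    LISTENING = "listening"  # listening emoji
    WORKING = "working"      # working emoji, cloud is processing
    SHOWING = "showing"      # text result on screen


SLEEP_AFTER_S = 10.0  # silence timeout from the requirement


class DisplayStateMachine:
    def __init__(self):
        self.state = State.SLEEPING
        self.last_activity = 0.0

    def _enter(self, state, now):
        self.state = state
        self.last_activity = now

    def on_voice(self, now):
        """The user is talking to the bot."""
        self._enter(State.LISTENING, now)

    def on_upload(self, now):
        """The audio was sent to the cloud platform."""
        self._enter(State.WORKING, now)

    def on_reply(self, now):
        """The text result came back from the cloud platform."""
        self._enter(State.SHOWING, now)

    def tick(self, now):
        """Call periodically; falls asleep after 10 s without activity."""
        idle = now - self.last_activity
        if self.state in (State.LISTENING, State.SHOWING) and idle >= SLEEP_AFTER_S:
            self.state = State.SLEEPING
        return self.state
```

Passing the current time in as `now` keeps the logic easy to test on a laptop before porting it to the firmware.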

I iterated on this with Gemini and finalized the software development document.

Update on 28th, May

Coding/software: there are 4 steps.

Voice to text: ASR (Automatic Speech Recognition)

Connect the device to Wi-Fi

Provide the Wi-Fi name and password to generate the firmware, then burn the firmware

Testing: when the display shows green, the device is connected to Wi-Fi

Green: connected

Red: disconnected

Yellow: connecting

How to change the Wi-Fi configuration

Step 1: edit the Wi-Fi information in include/secrets.h

Step 2: burn the firmware again.

The mic collects the audio, FunASR on the cloud platform (here, a server on my Mac) converts it to text, and the data is saved to the DB

Steps:

2.1 Find the IP address of the Mac.

2.2 Install ASR on the Mac and start the service.

2.3 Use the ESP32-C3 to record and recognize voice.

Testing

run the command in terminal

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
python3 -m platformio device monitor

When the display shows green, the Wi-Fi connection is good.

In terminal, type r

Start testing

Light blue [REC] Recording 3s...
yellow [ASR] POST http://192.168.3.153:8765/asr...
green [ASR] Text: the content I said

There was an error: it did not return the content I said, so I sent the image above to Cursor to debug.

That was fixed, but a new problem appeared, as below: “Malloc failed”

I sent it to Cursor to debug again and learned that “Malloc failed” means a memory allocation failed: the device had run out of free RAM (heap). Cursor sent me new firmware and I ran it again.
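A quick calculation shows why a recording buffer can exhaust the memory. The 3 s duration comes from the "[REC] Recording 3s..." log line; the 16 kHz sample rate and 16-bit mono format are my assumptions, since the firmware settings are not shown here. The 400 KB figure is the nominal on-chip SRAM of the ESP32-C3, and the heap that is actually free while Wi-Fi and the display are running is considerably smaller.

```python
def pcm_buffer_bytes(seconds, sample_rate_hz, bytes_per_sample, channels=1):
    """Size in bytes of an uncompressed PCM recording buffer."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


RECORD_SECONDS = 3                # from the "[REC] Recording 3s..." log line
SAMPLE_RATE_HZ = 16_000           # assumption: common rate for speech recognition
BYTES_PER_SAMPLE = 2              # assumption: 16-bit mono samples
ESP32_C3_SRAM_BYTES = 400 * 1024  # nominal on-chip SRAM, not the free heap

buffer_bytes = pcm_buffer_bytes(RECORD_SECONDS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
share = buffer_bytes / ESP32_C3_SRAM_BYTES
print(f"buffer: {buffer_bytes} bytes, {share:.0%} of nominal SRAM")
```

Under these assumptions one recording needs a single large contiguous block, which is the kind of request that fails first when the heap is fragmented or nearly full.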

It works again. See the attached video “ASR Testing”

Connect the AI for interactive conversation and save the results to the DB. I planned to use the OpenAI API, but Cursor suggested Ollama, an open-source tool for running LLMs locally, as more suitable for development. If Ollama does not work well, I can try the OpenAI API. I followed the steps below:

3.1 Install Ollama

3.2 Run Ollama and download the Qwen2.5:3B LLM

3.3 Burn the new firmware

3.4 Check the result; there will be 2 roles in the terminal

Ollama and the Qwen2.5:3B model have been installed

Run a test to make sure this step works, following the steps below

Close the serial monitor in the terminal with Ctrl + C

Burn new firmware:

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
python3 -m platformio run -t upload

Make sure Ollama and the Lucky Bot server are running. Ollama was already open, and I checked the server status in the browser at http://127.0.0.1:8765/health
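The same health check can be scripted instead of opened in a browser. This is a minimal sketch using only the Python standard library; it assumes the /health endpoint answers a plain GET with HTTP 200 when the server is up, which I have not confirmed beyond the browser check, and `is_server_up` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def is_server_up(url, timeout_s=2.0):
    """Return True if a GET request to url is answered with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


print("Lucky Bot server up:", is_server_up("http://127.0.0.1:8765/health"))
```

Running it from the Mac checks the server itself. The ESP32 reaches the server through the Mac's Wi-Fi address instead, so a pass here does not rule out network problems between the two.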

Open serial monitor with command below,

python3 -m platformio device monitor

Talk with the Lucky Bot to check whether it works. It did not run well and showed an error

I debugged the issue with Cursor: all the connections were fine, but the router had AP isolation enabled, which stops devices on the same Wi-Fi from reaching each other.

I switched the network to my iPhone hotspot: the Mac and the ESP32 both connect to the iPhone, and I configured the new Wi-Fi on the ESP32 as above. That solved the problem. It started running, with the audio output from my Mac. Video as below (video file: AI connection + voice bot testing)

Update on 30th, May

Wake word: I will use “Hey Lucky”

Prompt: I want to set the wake word “Hey Lucky”

Cursor provided a solution:

Continuous mic monitoring + VAD

Whisper recognition, matching “Hey Lucky”

After waking up, it goes to voice chat (recording - AI conversation - TTS)
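The matching step in this solution can be sketched as a text comparison on the Whisper transcript. I do not know how Cursor's server code actually does it, so the function names and the normalization below are assumptions; the point is that "Hey, Lucky!" and "hey lucky" should count as the same phrase, while a longer word such as "luckily" should not.

```python
import re

WAKE_PHRASES = ("hey lucky",)


def normalize(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def contains_wake_word(transcript, phrases=WAKE_PHRASES):
    """Return True if any wake phrase appears as whole words in the transcript."""
    normalized = normalize(transcript)
    return any(
        re.search(rf"\b{re.escape(phrase)}\b", normalized) for phrase in phrases
    )


print(contains_wake_word("Hey, Lucky! What time is it?"))
print(contains_wake_word("That was hey luckily fine"))
```

More wake phrases can be tried by passing a longer tuple as `phrases`.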

I got new firmware from Cursor and flashed it to the ESP32 from the terminal with the commands below

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
python3 -m platformio run -t upload
python3 -m platformio device monitor

When I say “Hey Lucky”, it is detected, but there is a new problem, as below.

I sent the picture to Cursor to debug and got the reason and new firmware.

Reason: After detecting the wake word, recording starts immediately for 3 seconds, but by then you've usually already finished saying "Hey Lucky," so most of what's recorded is silence. Whisper can't recognize any content, and the server returns a 400 error.

New firmware:

/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

And restart the mac server:

"/Users/jerryrong/Fablab/Final project/Final- coding/server/start.sh"

It runs well; see the attached video (Wake-up word, Hey Lucky)

But it is not sensitive enough: sometimes I need to call it 2 or 3 times. The transition time from waking up to listening also needed improvement, so I asked Cursor to improve both.

Cursor found the reason and provided new firmware:

/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

With the new settings

I found a new issue: after the first wake-up, the bot should keep the conversation going, with no need to wake it again for every turn.

New firmware:

/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

A new issue appeared: I need a continuous conversation, but the mic picks up the AI bot's own reply, so the bot replies to itself again. It never stops talking and leaves no time for me to speak. Cursor fixed the problem with new firmware and a server update
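I did not record how Cursor fixed the self-reply loop. One common approach is a half-duplex gate: ignore the mic while the reply is playing, plus a short guard time afterwards. The sketch below shows that idea only; the class name and the 0.5 s guard time are assumptions.

```python
class MicGate:
    """Half-duplex gate: drop mic audio while the bot itself is speaking."""

    def __init__(self, tail_s=0.5):
        self.tail_s = tail_s    # assumed guard time after playback ends
        self.playing = False
        self.muted_until = 0.0

    def playback_started(self):
        """Call when the bot's reply starts playing."""
        self.playing = True

    def playback_finished(self, now):
        """Call when the bot's reply has finished playing."""
        self.playing = False
        self.muted_until = now + self.tail_s

    def accept(self, now):
        """Return True if mic audio captured at time `now` should be used."""
        return not self.playing and now >= self.muted_until


gate = MicGate()
gate.playback_started()
print(gate.accept(1.0))  # False while the reply is playing
gate.playback_finished(3.0)
print(gate.accept(4.0))  # True once the guard time has passed
```

The trade-off is that the user cannot interrupt the bot while it is speaking.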

firmware

/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

Update server:

"/Users/jerryrong/Fablab/Final project/Final- coding/server/start.sh"

Now the interaction works well; see the attached video (bot continuous communication)

I think I can add 2 more wake words: Hi Lucky and Lucky

There were too many details to improve, and I spent several hours without obvious progress. The key lesson: because I lacked experience and did not give the AI a list of requirements first, I was finding and fixing problems at random. In future, I should list the requirements first.

Web/Platform design

Basic functions:

1) Need a switch to start the Mac server in one click (Cursor to advise whether this is necessary)

2) Need a status bar to display the Wi-Fi connection status of the ESP32 and the Wi-Fi connection

3) Need a box to display the text content as mentioned, and another box to display

4) Need a status bar, speaking (display when a person is speaking), thinking (display when the AI bot is thinking)

5) Overall page style, Apple style

Gemini gave me the diagram

The first version

Restart the Mac server

lsof -tiTCP:8765 -sTCP:LISTEN | xargs kill
"/Users/jerryrong/Fablab/Final project/Final- coding/server/start.sh"

Flash New firmware

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

Web to check

Launching website: http://127.0.0.1:8764/

Web: http://127.0.0.1:8765/

Video showing how it works

Optimization:

I need a button to turn on the Mac server

Change the icon of the robot

Display UI

My Idea:

Use my own image

Text will be shown on the screen in a Chinese cursive script style

For images, I will use three images to cover all 9 states

The current setup uses some images generated by code, as below

I sent the image to Cursor and specified the states and their images, with different images for different states.

I got the initial firmware and burned it to the ESP32

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

The UI has been updated; testing is shown in the video (file “UI”)

I did some other optimization on the UI so that the states are closer to daily use. For example:

A Wi-Fi disconnected status that stays on screen until the connection is restored.

Update on 31st, May

Add voice interaction.

My Idea:

1) After powering on, once the ESP successfully connects, play an audio file located at @/Users/jerryrong/Fablab/Final project/音频/开机.m4a. The display image should remain as the first one.

2) Each time the device is awakened, play another audio file @/Users/jerryrong/Fablab/Final project/音频/干嘛.m4a to create interaction, letting users know the bot has received their input. Use the second image (previously mentioned second image) during this playback.

3) During standby mode—when there is no conversation between the user and the AI bot—the AI bot should randomly play a third audio file @/Users/jerryrong/Fablab/Final project/音频/笑声.m4a without a fixed interval. Note: this audio plays only when in standby; it should not play during active conversations. The display image remains unchanged, using the first image as before.

Finally, whether to compress the files depends on the hardware specifications.
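Idea 3, the random laugh during standby, can be sketched as a small scheduler. This is not the firmware Cursor produced: the class name and the 60 to 300 s range are made up for illustration. The sketch only covers the two rules from the prompt, a random gap with no fixed interval, and no playback during a conversation.

```python
import random


class StandbyChime:
    """Decide when to play the standby sound at random intervals."""

    def __init__(self, min_gap_s=60.0, max_gap_s=300.0, rng=None):
        self.min_gap_s = min_gap_s  # assumed shortest gap between plays
        self.max_gap_s = max_gap_s  # assumed longest gap between plays
        self.rng = rng or random.Random()
        self.next_at = None         # None means nothing is scheduled

    def _schedule(self, now):
        self.next_at = now + self.rng.uniform(self.min_gap_s, self.max_gap_s)

    def update(self, now, in_standby):
        """Return True when the standby sound should be played now."""
        if not in_standby:
            self.next_at = None     # a conversation cancels the pending play
            return False
        if self.next_at is None:
            self._schedule(now)     # just entered standby, pick a random time
            return False
        if now >= self.next_at:
            self._schedule(now)
            return True
        return False
```

Calling `update` once per loop with the current time and the standby flag is enough; leaving standby cancels any pending play.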

I sent the same prompt to Cursor but in Chinese,

Got the initial firmware

Updated Mac server

"/Users/jerryrong/Fablab/Final project/Final- coding/server/launcher.sh"

New firmware

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

It runs well, as shown in the video (+voice interaction/file)

Text to the Display

My idea, Content displayed on screen:

1) Text appears in cursive script style.

2) The speed of display matches the AI bot's response speed—whenever the AI says a character, that character appears on screen, with varying sizes creating a dynamic, pulsating effect.
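The timing idea in point 2 can be written as a reveal schedule: each character gets a time and a font size. This sketch assumes the characters are spread evenly over the length of the reply audio and that the sizes simply cycle through three values; both are my simplifications, and `reveal_schedule` is a hypothetical helper.

```python
def reveal_schedule(text, audio_seconds, sizes=(24, 32, 28)):
    """Return (time_s, character, font_size) for each non-space character."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return []
    step = audio_seconds / len(chars)  # even spacing across the audio
    return [
        (round(i * step, 3), ch, sizes[i % len(sizes)])
        for i, ch in enumerate(chars)
    ]


for time_s, ch, size in reveal_schedule("你好 Lucky", 2.1):
    print(time_s, ch, size)
```

Computing the schedule on the server and sending only the next character to the device would keep the memory use on the ESP32 small.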

I used the text above as the prompt (in Chinese) for Cursor, got new firmware, and restarted the server

Restart the server

"/Users/jerryrong/Fablab/Final project/Final- coding/server/launcher.sh"

New firmware

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
pio run -t upload

I debugged this function for many hours without success. The main reason is that the RAM on the XIAO ESP32-C3 is too small and many tasks are running at once. The reasons analyzed by Cursor are below.

TTS

Optimize the voice to make it sound closer to a human than a machine. I will clone a voice and apply it to the TTS

There are two ways to clone a voice, at the edge or in the cloud. Here is the comparison: F5-TTS (local) vs. cloud (ElevenLabs / Fish, etc.)

I would prefer the cloud for better quality, but I will run F5-TTS first to check the voice quality, as it is free and already connected to my web platform

Install F5-TTS; Cursor installed it for me directly

Raw audio and text: attach the voice sample and the text of its content

Send the text and audio to Cursor, and Cursor configures everything automatically. After around 30 minutes, it produced a voice; see the attached file, clone_preview from F5-TTS

The speech was too fast, so I asked for the speed to be set to 0.85 while keeping the same quality.

Ask Cursor to configure the voice on the server

But I found a big issue: F5-TTS is a local model running on my laptop. When the AI bot generates content, it has to send it to F5-TTS first, and it takes around 5 minutes to return the audio in the cloned voice.

Therefore, I need to go back to the cloud. After some searching, I decided to use the API from iFLYTEK

Register on iFLYTEK to get one month of free use

Send Cursor the iFLYTEK documentation on how to train a voice model

Also send Cursor the APPID / APIKey / APISecret

After the training, I got the voice and its voice ID, XFYUN_RES_ID=*c_ttsclone-ec73516e-v****

The audio is attached (voice from iFLYTEK). Testing again, see the video

Optimize on the voice interaction.

Timing

Gap between human voice and Ai

Gap between Ai and human voice

Sensitivity of the wake up word

Current status: it is hard to wake up Lucky Bot; I need to call it 4-5 times or more. The cause is the Whisper model I used, so I will use another Whisper model for wake-word detection. Cursor's analysis of the cause:

I downloaded the new Whisper Tiny model and deployed it on the Mac. There is a big improvement: it works normally with just 1–2 calls when the sound is clear.

Quality of the content: better to use a cloud API

31st, May — the best firmware version so far:

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

How to recover the current firmware version, just in case:

Approach 1: burn the backed-up firmware, as below

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
./tools/restore_firmware.sh A6n

Approach 2: build and burn from the source code

unzip -o firmware_backups/A6n-source-20260531.zip -d .
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

Change to OpenAI API instead of the Ollama running Qwen

Assemble from breadboard to PCB

I used the PCB made in Assignment 6 and assembled everything with the same wiring, as below

Burn the firmware to another XIAO ESP32-C3

Connect with Cursor

Check/find the port

Burn the firmware

cd "/Users/jerryrong/Fablab/Final project/Final- coding"
./tools/restore_firmware.sh A6n /dev/cu.usbmodemXXX

XXX stands for the port suffix of the new board

Assemble the Lucky Bot

The 3 major parts as below,

Add the PCB, display and mic in their positions, with some structures to hold them

Insert the top cover into the enclosure

Connect the USB cable and test again. It works well, the best condition so far, as in the video below:

2D

My idea is to make a small stand for the Lucky Bot. Lucky Bot is a desktop device, so I will place it on the stand, and the stand can also serve as a small storage box on my desk.

I drew the sketch, as below.

I used Onshape to make the design file. Onshape link for reference

A test file to confirm the kerf: when the male part is 20 mm, the best fit is a female part of 19.6 mm.
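The kerf test result can be turned into a rule for the other joints. From the test piece, the best-fitting slot is 0.4 mm narrower than the drawn tab. The sketch below assumes this offset stays the same for other tab widths as long as the material, thickness and laser settings do not change; `slot_width` is a hypothetical helper.

```python
# Offset measured on the test piece: 20 mm tab, 19.6 mm slot.
BEST_FIT_OFFSET_MM = round(20.0 - 19.6, 2)


def slot_width(tab_width_mm, offset_mm=BEST_FIT_OFFSET_MM):
    """Drawn slot width that gives the same fit as the test piece."""
    return round(tab_width_mm - offset_mm, 2)


for tab in (10.0, 20.0, 30.0):
    print(f"tab {tab} mm -> slot {slot_width(tab)} mm")
```

A different material or a refocused laser would need a new test piece before reusing the offset.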

I exported the files as DXF and used Laser Maker to combine them.

The combined DXF file is attached.

I used the laser cutter to make the stand/box and assembled it.

The finished final project

Presentation:

1)Slide:

2)Video: