Final project
Desktop AI companion device with voice interaction — working name Lucky Bot.
Concept images (Gemini)

Timeline
Final project - Lucky Bot
| Date | Work | Remark |
|---|---|---|
| 9th, May | Finalize the idea and the materials for the final project; get the dimensions of the display | Materials: XIAO, display, mic |
| 10th, May | Finish the 3D design and 3D printing | How to add the 2D design? |
| 11th-12th, May | Build the prototype | |
| 13th-14th, May | | |
| 15th, May | Receive the PCB, solder the components, and make the prototype | |
9th, May
Finalize the idea.
I won't change the ID of the AI bot; I will try to make the 3D model first.
Components:
Input: microphone, INMP441
Output: display, GC9A01 1.28-inch SPI LCD
MCU: XIAO ESP32-C3
PCB: made earlier
Firmware: LLM
Materials for the final project
The PCB has already been produced by JLC.

Mic: INMP441, purchased from Taobao

Display: GC9A01 1.28 inch, recommended by Gemini, purchased from Taobao

10th, May
Make the 3D enclosure
Step 1 draw a rectangle (100 x 100 mm) with a 20 mm fillet

Step 2 extrude to a height of 100 mm

Step 3 use “Draft” to taper the sides of the cube

Step 4 use Fillet to round the top and bottom faces

Step 5 make a base plane to seat the cover at a depth of 1.5 mm; I will use the same 1.5 mm as the enclosure thickness when I apply “Shell” later

Step 6.1 create a smaller rectangle on the new plane for the extrude

Step 6.2 extrude the small rectangle by 0.1 mm to make a face for Shell, as Shell only works from a face

Step 6.3 Shell from the new face, with a thickness of 1.5 mm

Step 7 make the plate for the LCD display.
7.1 Add Part Studio 2 and use Derived to copy the plane, so the plate can be made from it directly.

Use Use/Convert to make the referenced rectangle editable; as shown below, the lines turn black.

The 3D-printing tolerance is 0.1 mm, and the outline of the enclosure is 91.27 mm.

So I adjust the cover to 91.16 mm.

7.2 refer to the size of the LCD to design the hole for it


7.3 to seat the display better, I need to draw the hole and a stepped layer, and then extrude


7.4 draw the holes on the cover plate and the 2 sides, and use Extrude to cut them. The holes are for the microphone.


7.5 I finally got the 2 parts


7.6 assembly




3D Printing


Printing (video as below)
Update: on the cover plate, I added 2 hooks to hold the LCD PCB.

On 11th, May, I got 2 sets.

I see 2 issues to improve in the next version:
The tolerance is not right: the cover does not fit the enclosure. The tolerance should be 0.2 mm or 0.3 mm; I will try 0.3 mm first for testing.
“Noodles” on the surface: I need to make the slope steeper.


Or I can print the enclosure in another orientation, as in the picture below.

Update on 13th, May: I received the mic and display, shown below, and then ran a dimension test with all parts.
Display and microphone


The PCB fits inside the enclosure.

Try to match the display and the cover plate.
White cover plate + black display looks nice.

The diameter of the circle is a little too small; I need to adjust it to 35.8 mm with a 0.02 mm margin.

The limit stopper is too low, so the display cannot be inserted into the circular opening. I will raise the stopper by 1 mm first.

One more thing: I just realized I didn't leave a hole for the Type-C cable. I will revise all parts in the second version.
0516 Make a prototype of the electronics first.
Learn the basics of the display, and the pins of the microphone and the display.
Pins of the display

Pins of the mic

Translation to English:
SCK: Serial data clock for the I²S interface.
WS: Serial data word select for the I²S interface.
L/R: Left/Right channel selection.
When set to Low level (GND), the microphone outputs signals on the left channel of the I²S frame.
When set to High level (VCC), the microphone outputs signals on the right channel of the I²S frame.
SD: Serial data output for the I²S interface.
VCC: Power input, 1.8V to 3.3V
GND: Power ground.
Wiring
ESP32-C3 I/O

1. Microphone (INMP441) ➔ Seeed XIAO ESP32-C3 Wiring
Standard I2S digital audio bus connection:
| Microphone Pin (INMP441) | Seeed XIAO ESP32-C3 Physical Pin | Corresponding Pin in Code (GPIO) | Bus Function Description |
|---|---|---|---|
| VCC | 3.3V | 3.3V | Digital power supply for the microphone. |
| GND | GND | GND | Power ground. |
| L/R | GND | GND | Grounded (Low): Outputs Left Channel for single-channel (Mono) audio. |
| SCK | D2 | GPIO 4 | I2S Serial Bit Clock line (BCLK). |
| WS | D3 | GPIO 5 | I2S Word Select / Frame Clock line (LRCK). |
| SD | D1 | GPIO 3 | I2S Serial Data Out line (audio data input to the MCU). |
2. Round Screen (GMT128-02 / GC9A01) ➔ Seeed XIAO ESP32-C3 Wiring
4-line SPI serial bus connection:
| Screen Pin (GMT128-02) | Seeed XIAO ESP32-C3 Physical Pin | Corresponding Pin in Code (GPIO) | Adjustment & Advantage Description |
|---|---|---|---|
| 1. VCC | 5V | 5V | Connects to the stable 5V rail to ensure enough power for the backlight. |
| 2. GND | GND | GND | Power ground (must share a common ground with the microphone). |
| 3. SCL | D8 | GPIO 8 | Hardware Fixed: SPI Serial Clock line (SCK). |
| 4. SDA | D10 | GPIO 10 | Hardware Fixed: SPI Serial Data Out line (MOSI). |
| 5. DC | D4 | GPIO 6 | Data/Command selection pin. |
| 6. CS | D0 | GPIO 2 | Strapping Pin Warning: Please check the crucial boot note below. |
| 7. RST | D5 | GPIO 7 | Hardware reset pin (active low). |
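Before wiring, the two tables above can be cross-checked for accidental pin sharing. The following is a minimal sketch, not part of the project firmware: the dictionaries simply restate the tables, and the helper names `find_conflicts` and `to_gpio` are hypothetical.

```python
# XIAO ESP32-C3 pin label -> GPIO number, restating the two wiring tables above.
XIAO_C3_GPIO = {"D0": 2, "D1": 3, "D2": 4, "D3": 5, "D4": 6, "D5": 7, "D8": 8, "D10": 10}

# Signal -> XIAO pin, copied from the tables (power and ground omitted).
MIC_PINS = {"SCK": "D2", "WS": "D3", "SD": "D1"}
DISPLAY_PINS = {"SCL": "D8", "SDA": "D10", "DC": "D4", "CS": "D0", "RST": "D5"}


def find_conflicts(*buses):
    """Return (pin, first_signal, second_signal) for every pin used twice."""
    seen = {}
    conflicts = []
    for bus in buses:
        for signal, pin in bus.items():
            if pin in seen:
                conflicts.append((pin, seen[pin], signal))
            seen[pin] = signal
    return conflicts


def to_gpio(bus):
    """Translate a signal -> 'Dx' mapping into signal -> GPIO number."""
    return {signal: XIAO_C3_GPIO[pin] for signal, pin in bus.items()}
```

With the tables as written, `find_conflicts(MIC_PINS, DISPLAY_PINS)` returns an empty list, so the I2S and SPI buses do not share a pin.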
Wired up on a breadboard with cables



Debug the prototype
Step 1 connect the prototype to the laptop

Step 2 use Cursor to debug
A. Ask Cursor to check the connection.
B. Tell Cursor the requirement: XIAO ESP32-C3 + display (GC9A01) + mic (INMP441).

C. Cursor checks the pin connections; I sent it the settings above as a picture.
Debug the connection; the expected behavior, from Cursor, is below.
Upon powering up, you should observe the following:
Red → Green → Blue → White (approx. 0.6 seconds each)
Black background with a cyan horizontal bar in the center
A green volume bar at the bottom (elongates when speaking into the microphone)
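The volume bar in the self-test above can be understood as a loudness level mapped to a pixel width. This is a minimal sketch of that idea, not the firmware Cursor generated: it assumes the mic samples have already been converted to signed 16-bit values, and it uses the 240-pixel width of the GC9A01 as the maximum bar length. The function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of signed PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def bar_width(samples, max_width=240, full_scale=32768):
    """Map the RMS level of a block to a bar width in pixels (0..max_width)."""
    level = min(rms(samples) / full_scale, 1.0)  # clamp to full scale
    return round(level * max_width)
```

Silence gives a width of 0, and louder blocks give a longer bar, which matches the bar growing when speaking into the microphone.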
0517 Update the enclosure
I want to revise the design to be closer to the concept image.

I create a cube with a chamfer, following steps similar to before, and use Plane - Line angle with the settings below.

Then I remove part 2; the shape is much closer.

Fillet with the settings below; it looks good.

Revise the dimensions; finalized as below.


Extrude, depth 2.5 mm

The updated 3D model is as below.

Add the blocks that hold the inserted PCBA.

0525 update
The updated enclosure, 3D printed


Some parts to be improved:
There is a gap between the top cover and the enclosure. I need to adjust the fillet of the cover plate: radius = 28.5 - 0.2 (tolerance) = 28.3 mm

The height of the blocks needs to increase by 10 mm.
The distance between the blocks needs to be narrowed by 0.2 mm, to 1.6 mm.

The height of the stopper needs to increase by 0.2 mm, so the display is easy to insert.

The edge is too sharp.

Revise as below


The updated 3D file.
0527 update - Software requirements
Hardware platform:
Input: microphone, INMP441
Output: display, GC9A01 1.28-inch SPI LCD
MCU: XIAO ESP32-C3
Voice to Text
The mic collects the voice message, which is converted to text on the cloud platform
Wake-up word - Hi Lucky
Cloud platform
Wi-Fi connection
Send the voice message to the platform and convert it to text
Text to LLM
The cloud platform connects to the LLM, and the result is returned to the platform
The cloud platform sends the text to the display
Display:
When I am talking with the AI bot, it shows an emoji that stands for listening
When messages are being processed on the cloud platform, it shows an emoji that stands for working
It shows the text result from the cloud platform
After 10 s of silence, it goes to sleep mode and shows an emoji that stands for sleeping
I refined the requirements with Gemini and finalized the software development document.
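The display requirements above (listening, working, showing the text result, and sleeping after 10 s of silence) can be read as a small state machine. The sketch below is only an illustration of that reading, not the firmware: the class, the event names, and the choice not to fall asleep while a request is still being processed are all assumptions.

```python
from enum import Enum


class State(Enum):
    SLEEPING = "sleeping"
    LISTENING = "listening"
    WORKING = "working"
    SHOWING = "showing"


SLEEP_AFTER_S = 10.0  # from the requirement: sleep after 10 s of silence


class DisplayStateMachine:
    def __init__(self, now=0.0):
        self.state = State.SLEEPING
        self.last_activity = now

    def on_voice(self, now):
        """The user is speaking: show the listening emoji."""
        self.state = State.LISTENING
        self.last_activity = now

    def on_upload_done(self, now):
        """Audio was sent to the platform: show the working emoji."""
        self.state = State.WORKING
        self.last_activity = now

    def on_reply(self, now):
        """Text came back from the platform: show the text result."""
        self.state = State.SHOWING
        self.last_activity = now

    def tick(self, now):
        """Call periodically; returns the state the display should show."""
        idle = now - self.last_activity
        # Assumption: never sleep while waiting for the platform to answer.
        if self.state is not State.WORKING and idle >= SLEEP_AFTER_S:
            self.state = State.SLEEPING
        return self.state
```

Each state would map to one emoji or screen; `tick` is the only place where the 10 s timeout is applied.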
Update on 28th, May
Coding/software: there are 4 steps.
Voice to text: ASR (Automatic Speech Recognition)
Connect the device to Wi-Fi



Provide the Wi-Fi name and password to generate the firmware, then burn the firmware.

Testing: when the display shows green, the device is connected to Wi-Fi
Green: connected
Red: disconnected
Yellow: connecting

How to change the Wi-Fi configuration
Step 1: edit the Wi-Fi information in include/secrets.h
Step 2: burn the firmware again.


The mic collects the audio, FunASR on the cloud platform converts it to text, and the data is saved to the DB.
Steps:
2.1 find out the IP address of the Mac.
2.2 install ASR on the Mac and start the service
2.3 use the ESP32-C3 to record voice and recognize it

Testing
Run the commands in the terminal:
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
python3 -m platformio device monitor
When the display shows green, the Wi-Fi connection is good.
In the terminal, type r
Start testing
| Display color | Serial output |
|---|---|
| Light blue | [REC] Recording 3s... |
| Yellow | [ASR] POST http://192.168.3.153:8765/asr... |
| Green | [ASR] Text: the content I said |


There was an error: it didn't return what I said, so I sent the image above to Cursor to debug.
That was fixed, but a new problem appeared, as below: “Malloc failed”.

I sent it to Cursor to debug again and learned that “Malloc failed” means there is not enough free memory (RAM) for the allocation. Cursor gave me new firmware, and I ran it again.

It works again; see the attached video “ASR Testing”.
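To see why a 3 s recording can trigger “Malloc failed”, it helps to estimate the buffer size. This is a rough sketch under stated assumptions: the 16 kHz sample rate and 16-bit mono format are guesses (the log only confirms the 3 s length), and the free-heap figures in the usage note are hypothetical. The 400 KB figure is the total on-chip SRAM of the ESP32-C3; the heap actually available to the firmware is smaller.

```python
ESP32_C3_SRAM_BYTES = 400 * 1024  # total on-chip SRAM; usable heap is smaller


def pcm_buffer_bytes(seconds, sample_rate_hz=16000, bytes_per_sample=2, channels=1):
    """Size of an uncompressed PCM recording held entirely in memory."""
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def fits_in_heap(seconds, free_heap_bytes, **fmt):
    """True if the whole recording fits in the given amount of free heap."""
    return pcm_buffer_bytes(seconds, **fmt) <= free_heap_bytes
```

Under these assumptions a 3 s clip needs 96,000 bytes in one piece, so `fits_in_heap(3, 80_000)` is false while `fits_in_heap(3, 150_000)` is true. Once Wi-Fi, the display driver, and other tasks have taken their share, a single allocation of that size can easily fail.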

Connect AI for interactive conversation and save the result to the DB. I planned to use the OpenAI API, but Cursor suggested Ollama, an open-source tool for running LLMs locally that is more suitable for development. If Ollama doesn't run well, I can try the OpenAI API. I followed the steps below:
3.1 install Ollama
3.2 run Ollama and download the Qwen2.5:3B LLM
3.3 burn the new firmware
3.4 check the result: there will be 2 roles in the terminal

Ollama and the Qwen2.5:3B model have been installed.

Run a test to make sure the step works, following the steps below:

Close the serial monitor in the terminal with Ctrl + C
Burn the new firmware:
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
python3 -m platformio run -t upload


Make sure Ollama and the Lucky Bot server are running. Ollama was already open, and I used http://127.0.0.1:8765/health to check the status in the browser.

Open the serial monitor with the command below:
python3 -m platformio device monitor
Talk with the Lucky Bot to check that it works. It didn't run well and showed an error.


I debugged the issue with Cursor: all the connections were fine, but the router had AP isolation enabled, which stops devices on the same Wi-Fi from reaching each other.

I switched the network to my iPhone hotspot: the Mac and the ESP32 both connect to the iPhone, and I configured the hotspot's Wi-Fi on the ESP32 in the same way as above. That finally solved the problem. It started running, with the audio output from my Mac. See the video below (video file: AI connection + voice bot testing).
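A quick way to tell a network problem such as AP isolation from a server problem is to try a plain TCP connection to the server port from a second device on the same Wi-Fi. The helper below is a minimal sketch with a hypothetical name, using only the Python standard library.

```python
import socket


def can_reach(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `can_reach("192.168.3.153", 8765)` uses the address and port from the serial log earlier in this page. If it returns True on the Mac itself but False from another device on the same router, the server is fine and the network is blocking device-to-device traffic.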
Update on 30th, May
Wake-up words: I will use “Hey Lucky”.
Prompt: I want to set the keyword “Hey Lucky”.
Cursor provided a solution:
Continuous mic monitoring + VAD (voice activity detection)
Whisper recognition, matching “Hey Lucky”
After waking up, it goes to voice chat (recording - AI conversation - TTS)
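The matching step in this solution can be sketched as a comparison between the Whisper transcript and the wake phrase after normalizing case and punctuation. This is an illustration only, not the code Cursor produced; the function names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def heard_wake_phrase(transcript, phrases=("hey lucky",)):
    """Return True if any wake phrase occurs as whole words in the transcript."""
    words = f" {normalize(transcript)} "
    return any(f" {normalize(p)} " in words for p in phrases)
```

Matching on whole words means “Hey, Lucky!” is accepted while “hey unlucky” is not. More phrases can be passed in through `phrases` if additional wake-up words are wanted.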

I got new firmware from Cursor and flashed it to the ESP32 in the terminal with the commands below:
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
python3 -m platformio run -t upload
python3 -m platformio device monitor
When I say “Hey Lucky”, it is detected, but there is a new problem, as below.

I sent the picture to Cursor to debug and got the reason and new firmware.
Reason: After detecting the wake word, recording starts immediately for 3 seconds, but by then you've usually already finished saying "Hey Lucky," so most of what's recorded is silence. Whisper can't recognize any content, and the server returns a 400 error
New firmware:
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload
And restart the Mac server:
"/Users/jerryrong/Fablab/Final project/Final- coding/server/start.sh"
It runs well; see the attached video (Wake-up word, Hey Lucky).
But I feel it is not sensitive enough: sometimes I need to call it 2 or 3 times, and the transition time from waking up to listening also needs work. I asked Cursor to improve both.

Cursor found the reason and provided new firmware:
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload
With the new settings:

I found a new issue: the bot should keep the conversation going after the first wake-up, with no need to wake it again every time while we are talking.
New firmware:
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

A new issue appeared: I need a continuous conversation, but the mic picks up the AI bot's own reply, so the bot answers itself again. There is no time for you to speak, and the bot never stops talking. Cursor fixed the problem with new firmware and a server update.
Firmware:
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload
Update server:
"/Users/jerryrong/Fablab/Final project/Final- coding/server/start.sh"
I got a good interaction experience; see the attached video (bot continuous communication).
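The self-triggering problem described above (the mic picking up the bot's own reply) is commonly handled by ignoring microphone audio while the reply is playing, plus a short guard time afterwards. I don't know exactly which fix Cursor applied, so the sketch below only illustrates that general approach; the class name and the 0.5 s tail are assumptions.

```python
class MicGate:
    """Drops microphone audio while the bot is speaking, plus a short tail."""

    def __init__(self, tail_s=0.5):
        self.tail_s = tail_s            # hypothetical guard time after playback
        self.playing = False
        self.playback_ended_at = None

    def playback_started(self):
        self.playing = True

    def playback_finished(self, now):
        self.playing = False
        self.playback_ended_at = now

    def accept(self, now):
        """Return True if audio captured at time `now` should be processed."""
        if self.playing:
            return False
        if self.playback_ended_at is None:
            return True
        return now - self.playback_ended_at >= self.tail_s
```

The trade-off of this approach is that the user cannot interrupt the bot while it is talking, because the mic is effectively muted during playback.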
I feel I can add 2 more wake-up words: “Hi Lucky” and “Lucky”.

I found there were too many details to improve; I spent several hours without obvious progress. The important lesson: with no experience, I was finding problems at random, because I didn't give the AI a list of requirements first. In future, I should list the requirements first.
Web/Platform design
Basic functions:
1) Need a switch to start the Mac server in one step (Cursor to advise whether this is necessary)
2) Need a status bar to display the Wi-Fi connection status of the ESP32 and the Wi-Fi connection
3) Need a box to display the text content as mentioned, and another box to display
4) Need a status bar, speaking (display when a person is speaking), thinking (display when the AI bot is thinking)
5) Overall page style, Apple style
Gemini gave me the diagram.


The first version

Restart the Mac server:
lsof -tiTCP:8765 -sTCP:LISTEN | xargs kill
"/Users/jerryrong/Fablab/Final project/Final- coding/server/start.sh"
Flash the new firmware:
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload
Check in the browser:
Launching website: http://127.0.0.1:8764/
Web: http://127.0.0.1:8765/

The video shows how it works.
Optimization:
I need a button to turn on the Mac server
Change the icon of the robot

Display UI
My Idea:
Use my own image
Text will be displayed on the screen in a Chinese cursive script style
For images, I will use three images to cover all 9 states
The current setup uses some images generated by code, as below.


I sent the images to Cursor and specified the states and their images: different states get different images.

I got the initial firmware and burned it to the ESP32:
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload
The UI has been updated; testing is shown in the video (file “UI”).

I did some other optimization on the UI, to make the states closer to daily use. For example:
A Wi-Fi disconnected state, whose UI stays on screen until the connection is restored.
Update on 31st, May
Add voice interaction.
My Idea:
1) After powering on, once the ESP successfully connects, play an audio file located at @/Users/jerryrong/Fablab/Final project/音频/开机.m4a. The display image should remain as the first one.
2) Each time the device is awakened, play another audio file @/Users/jerryrong/Fablab/Final project/音频/干嘛.m4a to create interaction, letting users know the bot has received their input. Use the second image (previously mentioned second image) during this playback.
3) During standby mode—when there is no conversation between the user and the AI bot—the AI bot should randomly play a third audio file @/Users/jerryrong/Fablab/Final project/音频/笑声.m4a without a fixed interval. Note: this audio plays only when in standby; it should not play during active conversations. The display image remains unchanged, using the first image as before.
Finally, whether to compress the files depends on the hardware specifications.
I sent the same prompt to Cursor, but in Chinese.

Got the initial firmware
Updated Mac server
"/Users/jerryrong/Fablab/Final project/Final- coding/server/launcher.sh"
New firmware
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload

It runs well, as shown in the video (+voice interaction/file).
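The standby behavior in idea 3) above (a laugh at random times, never during a conversation) can be sketched as a scheduler that picks a random delay each time. This is an illustration only: the class name and the 60 to 300 s range are assumptions, not values from the firmware.

```python
import random


class StandbyChime:
    def __init__(self, min_gap_s=60.0, max_gap_s=300.0, rng=None):
        self.min_gap_s = min_gap_s      # hypothetical shortest wait
        self.max_gap_s = max_gap_s      # hypothetical longest wait
        self.rng = rng or random.Random()
        self.next_at = None

    def _schedule(self, now):
        self.next_at = now + self.rng.uniform(self.min_gap_s, self.max_gap_s)

    def should_play(self, now, in_standby):
        """Return True when the laugh clip should be played at time `now`."""
        if not in_standby:
            self.next_at = None         # conversation active: cancel pending laugh
            return False
        if self.next_at is None:
            self._schedule(now)         # just entered standby: pick a random delay
            return False
        if now >= self.next_at:
            self._schedule(now)         # play now and pick the next random delay
            return True
        return False
```

Calling `should_play` from the main loop keeps the rule in one place: leaving standby cancels any pending laugh, and every new standby period starts with a fresh random delay.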
Text to the Display
My idea for the content displayed on screen:
1) Text appears in cursive script style.
2) The speed of display matches the AI bot's response speed—whenever the AI says a character, that character appears on screen, with varying sizes creating a dynamic, pulsating effect.
I used the text above as the prompt (in Chinese) to Cursor, and got new firmware and instructions to restart the server.
Restart the server
"/Users/jerryrong/Fablab/Final project/Final- coding/server/launcher.sh"
New firmware
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
pio run -t upload

I debugged this function for many hours without success. The main reason is that the RAM of the XIAO ESP32-C3 is too small and there are many tasks running. The reasons analyzed by Cursor are below.



TTS
Optimize the voice to sound closer to a human than a machine: I will clone a voice and apply it to the TTS.
There are 2 ways to clone a voice, locally (at the edge) or in the cloud. Here is the comparison: F5-TTS (local) vs. cloud (ElevenLabs / Fish, etc.)


I prefer the cloud for better quality, but I will run F5-TTS first to check the voice quality, as it is free and already connected to my web platform.
Install F5-TTS; Cursor installed it for me directly.
Raw audio and text: attach the voice sample and the text of its content.

Send the text and audio to Cursor, which configures everything automatically. After around 30 minutes, it produced a voice, attached as the file clone_preview from F5-TTS.
I found the speed too fast, so I requested a speed of 0.85 with the same quality.
Ask Cursor to configure the voice on the server.
But I found a big issue: F5-TTS is a local model running on my laptop. When the AI bot generates content, it has to be sent to F5-TTS first, and it takes around 5 minutes to return the audio in the designated voice.
Therefore, I went back to the cloud. After searching, I decided to use the API from iFLYTEK.
Registering on iFLYTEK gives one month of free use.
Send Cursor the documentation from iFLYTEK about how to train a voice model.
Also send Cursor the APPID / APIKey / APISecret.
After the training, I got the voice and the voice ID, XFYUN_RES_ID=*c_ttsclone-ec73516e-v****

The audio is attached (voice from iFLYTEK). Testing again: see the video.
Optimize the voice interaction.
Timing
Gap between the human voice and the AI
Gap between the AI and the human voice
Sensitivity of the wake-up word
Current status: it is hard to wake up Lucky Bot; I need to call it 4-5 times or more. The cause is the Whisper model I used, so I will use another Whisper model for wake-word detection. Cursor's analysis of the cause:

I downloaded the new Whisper Tiny model and deployed it on the Mac. There is a big improvement — it works normally with just 1–2 calls when the sound is clear.
Quality of the content: better to use a cloud API.
31st, May — the best firmware version so far:
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload
How to recover the current firmware version, just in case:
Approach 1: burn the backed-up firmware:
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
./tools/restore_firmware.sh A6n
Approach 2: build and burn from the source code:
unzip -o firmware_backups/A6n-source-20260531.zip -d .
/Users/jerryrong/Library/Python/3.9/bin/platformio run -t upload
Switch to the OpenAI API instead of Ollama running Qwen
Assemble from breadboard to PCB
I used the PCB made in Assignment 6 and assembled the parts on it with the same wiring, as below.


Burn the firmware to another XIAO ESP32-C3
Connect with Cursor
Check/find the port
Burn the firmware:
cd "/Users/jerryrong/Fablab/Final project/Final- coding"
./tools/restore_firmware.sh A6n /dev/cu.usbmodemXXX
XXX stands for the port of the new board.
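To find the value that replaces XXX, the serial devices can be listed by name. This is a minimal sketch assuming macOS, where the board shows up under a name starting with /dev/cu.usbmodem as in the command above; the function name is hypothetical.

```python
import glob


def find_usb_ports(pattern="/dev/cu.usbmodem*"):
    """Return the serial device paths that match the pattern, sorted."""
    return sorted(glob.glob(pattern))
```

Running `find_usb_ports()` before and after plugging in the new board shows which entry is new; that path is the one to pass to the restore script.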
Assemble the Lucky Bot
The 3 major parts are as below.

Put the PCB, display, and mic in their positions, with some structures to hold them.

Insert the top cover into the enclosure.


Connect with USB cable and test again. I feel it is good — the best condition so far, as the video below:
2D
My idea is to make a small stand for the Lucky Bot. Lucky Bot is a desktop device, so I will place it on the stand, and the stand can double as a small storage box on my desk.
I drew the sketch, as below.
I used Onshape to make the design file. Onshape link for reference


A test file to confirm the kerf: when the male part is 20 mm, a female part of 19.6 mm gives the best fit.
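The test result above gives a fit allowance that can be reused for other joints. The sketch below works it out from the measured pair (20 mm male, 19.6 mm female); it assumes the same material and laser settings, and that the allowance does not depend on the tab size. The function names are hypothetical.

```python
def fit_allowance(male_mm, female_mm):
    """Difference between the drawn tab width and the drawn slot width."""
    return round(male_mm - female_mm, 3)


def slot_width(tab_mm, allowance_mm):
    """Drawn slot width for a tab of the given drawn width."""
    return round(tab_mm - allowance_mm, 3)


# From the test piece: 20 mm tab, 19.6 mm slot.
MEASURED_ALLOWANCE_MM = fit_allowance(20.0, 19.6)
```

With this allowance of 0.4 mm, a 3 mm tab would get a drawn slot of `slot_width(3.0, MEASURED_ALLOWANCE_MM)`, which is 2.6 mm. A new test piece is worth cutting whenever the material or the laser settings change.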

I exported the files as DXF and used LaserMaker to combine them.

The combined DXF file is attached.
I used the laser cutter to make the stand/box and assembled it.

The finished final project

Presentation:
1)Slide:

2)Video: