SenseCAP Watcher Adaptation for XIAOZHI AI Project
Project Overview
In this project, we adapt the XIAOZHI AI framework to the SenseCAP Watcher device.
The goal is to create a compact, intelligent agent capable of speech interaction, local AI inference, and online communication via MQTT.
Through this adaptation, the Watcher can not only act as a smart display device but also become a real-time AI assistant.
Hardware Overview
The hardware foundation of this project is the SenseCAP Watcher.
Below are the main specifications:
- MCU: ESP32-S3 (the build target used later in this guide)
- Display: 1.45" round LCD
- Audio Input: Digital microphone
- Audio Output: Speaker
- Input Device: Multi-functional scroll button
- Connectivity: Wi-Fi / BLE, with MQTT used as the messaging protocol over Wi-Fi
Software Foundation
The software side is based on XIAOZHI AI, a lightweight yet powerful AI framework specifically designed for embedded devices.
It offers the following features:
- Offline speech recognition
- Local inference and command parsing
- Integration with online large language models
- MQTT-based communication capabilities
Start by cloning the source code:
git clone https://github.com/78/xiaozhi-esp32
cd xiaozhi-esp32
Porting XIAOZHI AI to SenseCAP Watcher
This section describes how to configure and build the project for the SenseCAP Watcher hardware.
Build Configuration
First, configure the project settings:
idf.py menuconfig
In the menu:
- Go to Xiaozhi Assistant → Language Select, and choose English.
- Go to Xiaozhi Assistant → Board Type, and select SenseCAP Watcher.
Note: Ensure that all settings are correctly saved before proceeding.
Building the Firmware
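Now set the target chip and compile the firmware. Note that idf.py set-target regenerates sdkconfig, so if the target was not already set before you ran menuconfig, reopen idf.py menuconfig after set-target and confirm the language and board settings before building: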
idf.py set-target esp32s3
idf.py build
If the build completes successfully, you will see a success message in the terminal.
Flashing the Firmware
After building, flash the firmware to the device:
idf.py flash
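Make sure the device is connected via USB and recognized by your development environment. If the serial port is not detected automatically, pass it explicitly with idf.py -p PORT flash, where PORT is the device's serial port on your machine.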
Device Configuration
After flashing, configure the device's AI behavior by setting the character persona and model preferences.
In this example, we configure the device as a bilingual English teacher using Qwen RealTime.
Character configuration example:
I am an English teacher named {{assistant_name}} (Lily). I can speak both Chinese and English with a standard accent.
If you don't have an English name, I will give you one.
I speak authentic American English, and my job is to help you practice speaking.
I will use simple English vocabulary and grammar to make learning easy for you.
I will reply using a mix of Chinese and English. If you prefer, I can also reply entirely in English.
I will keep my responses short and simple each time, because I want to guide my students to speak and practice more.
If you ask questions not related to learning English, I will refuse to answer.
Sending AI Responses via MQTT
To make the AI responses available remotely, we adapt the MQTT communication module.
MQTT Communication Module Adaptation
First, configure the MQTT client parameters:
- Server Address: broker.emqx.io
- Topic: fablab/chaihuo/machine/text
- Port: 1883
- Authentication: None (no username/password required)
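The parameters above can be kept in one place and turned into the URI string that the MQTT client configuration expects. The following is a minimal sketch, not code from the XIAOZHI AI repository: the MqttSettings struct and BuildBrokerUri function are hypothetical names introduced only for illustration, and it assumes a plain (non-TLS) connection using the mqtt:// scheme.

```cpp
#include <cstdint>
#include <string>

// Hypothetical settings holder (not part of XIAOZHI AI or ESP-IDF).
// Defaults are the values listed in this guide.
struct MqttSettings {
    std::string host = "broker.emqx.io";
    std::uint16_t port = 1883;  // plain, unencrypted MQTT
    std::string topic = "fablab/chaihuo/machine/text";
};

// Build a URI of the form "mqtt://host:port", suitable for assigning to
// mqtt_cfg.broker.address.uri (via .c_str()) on the device.
std::string BuildBrokerUri(const MqttSettings& s) {
    return "mqtt://" + s.host + ":" +
           std::to_string(static_cast<unsigned>(s.port));
}
```

With the defaults, BuildBrokerUri(MqttSettings{}) yields mqtt://broker.emqx.io:1883. Keeping the string alive for as long as the client configuration is in use is left to the caller.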
Example initialization code:
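esp_mqtt_client_config_t mqtt_cfg = {};  // zero-initialize before setting fields
mqtt_cfg.broker.address.uri = "mqtt://broker.emqx.io";  // mqtt:// defaults to port 1883
mqtt_cfg.credentials.client_id = "fablab_chaihuo_glasses";
client = esp_mqtt_client_init(&mqtt_cfg);
// Route all MQTT events (connect, disconnect, publish, ...) to mqtt_event_handler
esp_mqtt_client_register_event(client, static_cast<esp_mqtt_event_id_t>(ESP_EVENT_ANY_ID), mqtt_event_handler, NULL);
esp_mqtt_client_start(client);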
Expected connection log:
MQTT Connected Successfully to broker.emqx.io
Structured AI Response Handling
Once the AI generates a reply, extract the reply string from the parsed JSON item (text) so that it can be transmitted as the message payload:
message = std::string(text->valuestring);
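The line above copies the reply text as a plain string. If subscribers should receive a JSON object instead, the text has to be escaped and wrapped before publishing. The sketch below shows one way to do this with the standard library only. EscapeJson and BuildReplyPayload are hypothetical helpers, and the {"text": ...} payload layout is an assumption rather than a format defined by XIAOZHI AI. On the device, the cJSON library could be used for the same purpose.

```cpp
#include <cstdio>
#include <string>

// Escape a string so it can be placed inside a JSON string literal.
// Bytes >= 0x20 (including UTF-8 multi-byte sequences) pass through unchanged.
std::string EscapeJson(const std::string& in) {
    std::string out;
    out.reserve(in.size() + 8);
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Hypothetical payload layout: {"text":"<reply>"}. The "text" key is an assumption.
std::string BuildReplyPayload(const std::string& reply) {
    return "{\"text\":\"" + EscapeJson(reply) + "\"}";
}
```

For example, BuildReplyPayload("Hi") produces {"text":"Hi"}, which can then be passed to the publish call in place of the raw string.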
Publishing MQTT Messages
Finally, publish the AI response to the configured MQTT topic:
esp_mqtt_client_publish(client, "fablab/chaihuo/machine/text", message.c_str(), 0, 0, 0);
Upon successful publishing, the message will be available to any subscriber of the topic.
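The publish call returns a message ID, with a negative value indicating failure, so the result is worth checking. With QoS 0, as used above, a non-negative result only means the client accepted the message for sending, since the broker does not acknowledge QoS 0 messages. The sketch below wraps that check. PublishReply and PublishFn are hypothetical names, and the publish function is passed in as a callable so the logic can be exercised without the device.

```cpp
#include <functional>
#include <string>

// Mirrors the arguments that follow `client` in esp_mqtt_client_publish:
// (topic, data, len, qos, retain) -> message id, negative on failure.
using PublishFn = std::function<int(const char*, const char*, int, int, int)>;

// Hypothetical helper: publish `message` with QoS 0 and no retain flag,
// returning true if the client accepted it.
bool PublishReply(const PublishFn& publish, const std::string& topic,
                  const std::string& message) {
    if (message.empty()) {
        return false;  // nothing to send
    }
    const int msg_id = publish(topic.c_str(), message.c_str(),
                               static_cast<int>(message.size()),
                               /*qos=*/0, /*retain=*/0);
    return msg_id >= 0;
}

// On the device, the callable could forward to the real client, e.g.:
//   PublishReply(
//       [client](const char* t, const char* d, int len, int qos, int retain) {
//           return esp_mqtt_client_publish(client, t, d, len, qos, retain);
//       },
//       "fablab/chaihuo/machine/text", message);
```

Passing the explicit length rather than 0 avoids relying on the client to compute it from the null terminator. Either form works for ordinary text payloads.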
Communication Flowchart
The following diagram illustrates the overall communication flow between the device, AI processing, and MQTT broker: