
15. Wildcard week

This week I worked on defining my final project idea and started getting used to the documentation process.

Individual assignment:
- [x] Design and produce something with a digital process (incorporating computer-aided design and manufacturing) not covered in another assignment, documenting the requirements that your assignment meets, and including everything necessary to reproduce it.

What do I want to learn?

Machine Learning on Embedded System
This time I want to learn to use the new XIAO ESP32S3.
Although I have been working at Seeed for 6 years and have used many of our products, from the earliest Wio Terminal to the latest XIAO series, and have met the Edge Impulse team members at many offline meetups, I am still not very familiar with them. My grasp of TinyML is not very deep, so I want to take this Fab Academy opportunity to learn TinyML applications.
I happen to have three cats at home, so I want to build a project that identifies cats by taking photos, and that can also record the number of each cat's daily activities.

Why do I need to learn this?

  1. Learn and compare the differences between the XIAO ESP32C3 & XIAO ESP32S3
  2. Introduction to embedded systems & machine learning
  3. Project - TinyML image classification or object detection

Learning the differences between the XIAO ESP32C3 and XIAO ESP32S3

Recalling the history of the XIAO series

To this day, I still remember the scene when XIAO first appeared inside Seeed. At that time, I was working as a product marketing manager. Since I was originally a maker, I often complained to our product manager that Seeed needed to make a microcontroller development board that was truly suitable for developers to embed in their projects. The most popular small board at that time was the Arduino Pro Mini, and for a long time nothing smaller appeared. So when the Seeeduino XIAO was developed (there was no XIAO product series yet, so this product first joined the Seeeduino series), every employee liked it very much, and when we put it on the market we received a lot of positive feedback. With the love of so many people, we can now see more and more XIAO models being released. We hope to make XIAO developers' favorite MCU platform and help every developer innovate.

Key differences between the XIAO ESP32C3 and XIAO ESP32S3

1. Processor:
- The ESP32C3 has a single-core RISC-V processor, whereas the ESP32S3 has a dual-core Xtensa LX7 processor, which provides more processing power.
2. Clock Speed:
- The ESP32S3 runs at a higher clock speed (240 MHz) compared to the ESP32C3 (160 MHz).
3. Memory:
- The ESP32S3 has significantly more flash memory (8MB vs. 4MB) and includes additional PSRAM (8MB), which is beneficial for more demanding applications.
4. Wireless Capabilities:
- Both support Wi-Fi 4 and Bluetooth 5, but the additional processing power of the ESP32S3 can handle more complex wireless tasks more efficiently.
5. Special Features:
- The XIAO ESP32S3 Sense includes a camera connector and a digital microphone, making it suitable for applications involving vision and audio processing.
6. Use Cases:
- The ESP32C3 is designed for low-power, simpler applications, while the ESP32S3 Sense is targeted towards AI and IoT applications that require more computing power and built-in sensors.

These differences make the ESP32C3 suitable for simpler, low-power applications, while the ESP32S3 Sense is better suited for more complex projects involving AI, motion sensing, and audio processing.

Introduction to Embedded Systems & Machine Learning

What is Machine Learning?

Machine learning is a branch of artificial intelligence. Machine learning algorithms build a model based on "training data" in order to make predictions or decisions without being explicitly programmed.

What is TinyML?

TinyML: The Future of Machine Learning is Bright and Tiny.
TinyML is a field of study in Machine Learning and Embedded Systems that explores machine learning on small, low-powered microcontrollers, enabling secure, low-latency, low-power and low-bandwidth machine learning inferencing on edge devices.
- Low Latency: Devices running TinyML models directly on the edge no longer have to send raw data to remote servers. This reduces the latency of data processing and output.
- Low Power Consumption: Microcontrollers consume very little power, enabling them to run on battery power for long periods of time with little to no intervention.
- Low Bandwidth: Data is sent to remote servers at lower quantities and frequencies to allow savings on internet bandwidth and related costs.
- Private & Secure: Since data storage and processing are distributed to the edge, the security risks of centralised cloud data storage can be mitigated.

What is Edge Impulse?

Edge Impulse is a leading machine learning development platform for edge devices, free for developers and trusted by enterprises. It simplifies the process of training models and deploying them to hardware like microcontrollers and IoT devices, making machine learning accessible to both developers and enterprises. The following is the most basic Edge Impulse training and deployment workflow.

Project - TinyML Image Classification vs Object Detection

Hello Kitty! What’s your name?

I need to make a cat face recognition system.

Image Classification or Object Detection?

For my final application of identifying different cats, I had two options: image classification and object detection. After comparing their basic characteristics, I chose object detection to train my model.
1. Image Classification:
- Purpose: Assigns a single label to an entire image.
- Use Case: Useful when you want to identify the main object or scene in an image without needing to know the exact location within the image.
- Example: Identifying whether an image contains a cat, dog, or car.
2. Object Detection:
- Purpose: Identifies and locates multiple objects within an image, providing bounding boxes around each detected object.
- Use Case: Essential when you need to know the presence and position of multiple objects within an image.
- Example: Detecting and locating all the cars and pedestrians in a street scene.
3. When to Use Each:
- Use image classification if your task only requires knowing the primary subject of the image without concern for its position.
- Use object detection if you need to identify multiple objects and their locations within the image.
4. Edge Impulse Capabilities:
- Edge Impulse supports both image classification and object detection, providing tools to collect data, train models, and deploy them on edge devices.
- The platform's ease of use makes it suitable for various applications, whether you are building a simple classifier or a complex detection system.

Hardware & Software Preparation

  • Seeed XIAO ESP32S3
  • Arduino IDE
  • Edge Impulse Studio
After this preparation, I will follow the Edge Impulse workflow to build my project.

Data Collection

Collecting a Dataset (Images) with the XIAO ESP32S3

Because the final model will run on the XIAO ESP32S3's camera, I needed to use that same camera to collect the images: photos of my three cats from different angles, plus shots without any cat.
1. Open the Arduino IDE, select the XIAO_ESP32S3 board, and find the serial port.
2. Open the ESP32 sample code: Examples > ESP32 > Camera > CameraWebServer.
3. Because the latest esp32 board package is not stable, I needed to install an older stable version. Then select the correct camera model define for the XIAO_ESP32S3 and fill in the Wi-Fi details.

4. After uploading the code, I opened the web link shown in the serial monitor to capture images with the default settings. I made one mistake here: I tried many times but got no Wi-Fi connection, because I had forgotten to attach the antenna to the XIAO ESP32S3.

5. I captured four kinds of images: one set with no cat, and three sets for my three cats.

Although the images are blurry, they can still be used for training. During the shoot, the cats were very reluctant, so I had to ask my friend to hold each cat facing the camera.
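For step 3 above, the edits in the CameraWebServer sketch look roughly like this (the SSID and password placeholders are mine):

```cpp
// In CameraWebServer.ino: comment out the default camera model
// and uncomment the one matching this board
// #define CAMERA_MODEL_ESP_EYE
#define CAMERA_MODEL_XIAO_ESP32S3

// Fill in your own network credentials
const char *ssid = "YOUR_SSID";
const char *password = "YOUR_PASSWORD";
```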

Edge Impulse Preparation

  1. Go to the Edge Impulse website and log in. Because I already had an account, I did not need to register a new one.

  2. Create the project and name it "Crail's Cat". In fact, you can give it any name; this name will later become the name of the downloaded model folder.
  3. Select the project details: bounding boxes (object detection) and Espressif ESP-EYE as the target device. Otherwise the labeling and modeling will not fit my MCU board.

Uploading the raw image data (or downloading a dataset from the internet)

Go to "Data acquisition" and upload all the image files I took earlier. I chose to upload all images at once by selecting the folder.

In addition, there are many websites with free downloadable datasets that can be used for practice and training. For example:
https://www.kaggle.com/

Labeling the images/dataset

I uploaded 125 images of my three cats plus some no-cat images. That is not a big labeling job, but it still took me 20 minutes to label them one by one. I didn't check whether there were any wrong labels; I trust myself that there are no mistakes. Let's see the result later. Lol…



I needed to mark each face of my three cats across more than a hundred photos. Since I had no previous experience labeling data, it took me more than half an hour to mark the photos one by one. My three cats are named 白开水 (Baikaishui), 吕66 (Lyu six six, who is actually my first cat, so I gave him my Chinese family name, lol…), and 蛋挞 (Danta), so I used BKS, L66, and DT as the three labels, which will be simpler in the final display.

Balancing the dataset and splitting Train/Test

Because I also added some images with no cat, it is better to adjust the balance of the training images. We also need to ensure that the number of samples per label is roughly uniform; otherwise the training results will be biased.

The Impulse Design

Preprocessing the dataset/images

Here you first need to select the input format of the dataset. Edge Impulse supports both image and audio, but this time I only select image, and then set the image size.

In addition, for the impulse design I chose object detection, because the final result I want is to identify which cat is which. This is not simple classification: sometimes multiple cats may appear in the camera frame at the same time, and each one needs to be marked in the results.

When the features are finally generated, you get a plot of the dataset's features. If most of each class's data points cluster together, it proves the labeling is quite successful. But obviously my result this time was a failure: a lot of data points from different classes were mixed together.

Model Design, Training, and Test

Adjusting some labels to improve the accuracy: during model training, I relabeled some images whose features did not fall in the expected cluster, and deleted the images that still fell outside it, to improve the training accuracy. Because I still collected too few samples, the final training accuracy remains limited.

Finally, you can see that my trained model reaches a confidence level of about 73%. Given my small dataset, I feel it is almost good enough to keep using.

To confirm the training results once more, I used model testing to take a closer look at what my model produces.

Since some BKS and DT photos were mixed together, I once again deleted some images and labels that jumped across a large range. Unexpectedly, I then got a model that was only 50% accurate. That night I worked on tweaking the training until three o'clock in the morning. I was really sleepy, so I finally accepted the result.

Testing the model in live classification

It would be nice to test the model before downloading it for deployment. The live classification function of Edge Impulse is very useful here. But because the XIAO ESP32S3 currently does not support a direct online connection, I could only scan the QR code and test with my mobile phone.

After I scanned the QR code with my phone and logged in, I tested with the images I had collected before. The results were pretty good: it could already identify the faces of the different cats.
Finally, I also discovered a problem: for multiple cats in the same frame, the recognition success rate is not very high. In addition, recognition is sensitive to the distance between the camera and the object. I think both problems exist because the dataset collected for model training is too small and does not cover these situations.



Deploying the Model (Arduino IDE)

Because I was too sleepy that night, I chose to generate the model directly. After selecting the Arduino library as the deployment target, I downloaded the generated library file, which includes an example sketch.

After getting this example, I first needed to set the camera pin definitions for the XIAO ESP32S3. The program takes a long time to compile. After the compilation and upload succeeded, I encountered a very difficult problem:
although the program uploaded successfully, the serial monitor kept reporting that the camera initialization had failed. I checked the camera parameter settings several times but could not find the problem.

Edge Impulse Inferencing Demo
Camera init failed with error 0xffffffff
ERR: Camera is not initialized


I finally discovered that it was because I had not enabled the PSRAM option in the Arduino IDE before compiling.
In Arduino development, PSRAM refers to external pseudo-static random access memory (Pseudo Static RAM). It is a hardware component used to expand memory capacity, usually attached to the processor or microcontroller.
The explanation of PSRAM in Arduino:
ESP32-S3 PSRAM: the ESP32-S3 is a microcontroller with integrated Wi-Fi and Bluetooth capabilities. Some modules (such as the ESP32-WROVER for the classic ESP32, and the module on the XIAO ESP32S3) come with additional PSRAM for memory expansion. PSRAM typically has a capacity of several megabytes and can be used to store data such as images and audio.
How to use it: if you use an ESP32-S3 development board in the Arduino IDE, you can manually enable the PSRAM function. You can then use the heap_caps_malloc function to allocate PSRAM memory in your code, expanding the memory available to your application.
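A minimal check looks like the following sketch fragment, assuming the standard Arduino-ESP32 APIs (this is illustrative; the Edge Impulse example does its own allocation internally):

```cpp
// Arduino sketch fragment: PSRAM must first be enabled under
// Tools > PSRAM ("OPI PSRAM" for the XIAO ESP32S3) before compiling.
#include <Arduino.h>
#include "esp_heap_caps.h"

void setup() {
    Serial.begin(115200);
    if (psramFound()) {
        // Allocate a QVGA RGB888 frame buffer in external PSRAM
        uint8_t *buf = (uint8_t *)heap_caps_malloc(320 * 240 * 3, MALLOC_CAP_SPIRAM);
        Serial.println(buf ? "PSRAM buffer allocated" : "PSRAM allocation failed");
        heap_caps_free(buf);
    } else {
        Serial.println("PSRAM not found - enable it in the Tools menu");
    }
}

void loop() {}
```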
Finally, after successful startup, the program turned on the camera and started using my model for recognition.

Hero Shot & Summary

Finally, when I used the camera to identify my different cats' faces, I could clearly see the recognition results printed out. I can now combine this with my earlier MQTT work to monitor how many times each cat eats every day. I plan to install the camera right next to the cats' food bowl to record their daily life.


In the end, I have to marvel at the iteration speed of today's AI training tools. Through tools like Edge Impulse, every developer can quickly train on the datasets they want. This gives our MCU embedded applications new possibilities for innovation.

Source Code Explanation

Workflow Summary
1. Initialization:
- Serial communication is set up.
- Camera is initialized (ei_camera_init()), ensuring correct pin configurations and sensor settings.
2. Main Loop:
- Continuously captures images (ei_camera_capture()).
- Runs classification (run_classifier()) on each captured image.
- Prints timing and classification results to the serial monitor.
3. Camera Functions:
- ei_camera_init(): Initializes the ESP32 camera with specified settings.
- ei_camera_capture(): Captures an image from the camera, converts it to RGB format, and optionally resizes/crops it.
4. Data Handling:
- ei_camera_get_data(): Retrieves RGB data from the camera buffer in RGB888 format, handling any necessary conversions.
This workflow provides a structured approach to integrating camera capture and image classification using Edge Impulse with the ESP32 and ensures proper initialization and data handling throughout the process.

#include <Crail_s_Cat_inferencing.h>
#include "edge-impulse-sdk/dsp/image/image.hpp"
#include "esp_camera.h"

// Raw frame dimensions delivered by the camera (QVGA, 3 bytes per pixel)
#define EI_CAMERA_RAW_FRAME_BUFFER_COLS 320
#define EI_CAMERA_RAW_FRAME_BUFFER_ROWS 240
#define EI_CAMERA_FRAME_BYTE_SIZE       3

static bool debug_nn = false;           // set true to see raw classifier output
static bool is_initialised = false;     // camera state guard
static uint8_t *snapshot_buf = nullptr; // RGB888 frame buffer

// Define the pin configurations for your specific camera
#define PWDN_GPIO_NUM -1
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 10
#define SIOD_GPIO_NUM 40
#define SIOC_GPIO_NUM 39
#define Y9_GPIO_NUM 48
#define Y8_GPIO_NUM 11
#define Y7_GPIO_NUM 12
#define Y6_GPIO_NUM 14
#define Y5_GPIO_NUM 16
#define Y4_GPIO_NUM 18
#define Y3_GPIO_NUM 17
#define Y2_GPIO_NUM 15
#define VSYNC_GPIO_NUM 38
#define HREF_GPIO_NUM 47
#define PCLK_GPIO_NUM 13

static camera_config_t camera_config = {
    .pin_pwdn = PWDN_GPIO_NUM,
    .pin_reset = RESET_GPIO_NUM,
    .pin_xclk = XCLK_GPIO_NUM,
    .pin_sscb_sda = SIOD_GPIO_NUM,
    .pin_sscb_scl = SIOC_GPIO_NUM,
    .pin_d7 = Y9_GPIO_NUM,
    .pin_d6 = Y8_GPIO_NUM,
    .pin_d5 = Y7_GPIO_NUM,
    .pin_d4 = Y6_GPIO_NUM,
    .pin_d3 = Y5_GPIO_NUM,
    .pin_d2 = Y4_GPIO_NUM,
    .pin_d1 = Y3_GPIO_NUM,
    .pin_d0 = Y2_GPIO_NUM,
    .pin_vsync = VSYNC_GPIO_NUM,
    .pin_href = HREF_GPIO_NUM,
    .pin_pclk = PCLK_GPIO_NUM,
    .xclk_freq_hz = 20000000,
    .ledc_timer = LEDC_TIMER_0,
    .ledc_channel = LEDC_CHANNEL_0,
    .pixel_format = PIXFORMAT_JPEG,
    .frame_size = FRAMESIZE_QVGA,
    .jpeg_quality = 12,
    .fb_count = 1,
    .fb_location = CAMERA_FB_IN_PSRAM,
    .grab_mode = CAMERA_GRAB_WHEN_EMPTY,
};

void setup() {
    Serial.begin(115200);
    while (!Serial);
    Serial.println("Edge Impulse Inferencing Demo");
    if (ei_camera_init() == false) {
        Serial.println("Failed to initialize Camera!");
    } else {
        Serial.println("Camera initialized");
    }
    Serial.println("Starting continuous inference in 2 seconds...");
    delay(2000);
}

void loop() {
    if (ei_sleep(5) != EI_IMPULSE_OK) {
        return;
    }

    snapshot_buf = (uint8_t*)malloc(EI_CAMERA_RAW_FRAME_BUFFER_COLS * EI_CAMERA_RAW_FRAME_BUFFER_ROWS * EI_CAMERA_FRAME_BYTE_SIZE);
    if (snapshot_buf == nullptr) {
        Serial.println("ERR: Failed to allocate snapshot buffer!");
        return;
    }

    ei::signal_t signal;
    signal.total_length = EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT;
    signal.get_data = &ei_camera_get_data;

    if (ei_camera_capture((size_t)EI_CLASSIFIER_INPUT_WIDTH, (size_t)EI_CLASSIFIER_INPUT_HEIGHT, snapshot_buf) == false) {
        Serial.println("Failed to capture image");
        free(snapshot_buf);
        return;
    }

    ei_impulse_result_t result = { 0 };
    EI_IMPULSE_ERROR err = run_classifier(&signal, &result, debug_nn);
    if (err != EI_IMPULSE_OK) {
        Serial.printf("ERR: Failed to run classifier (%d)\n", err);
        free(snapshot_buf);
        return;
    }

    Serial.printf("Predictions (DSP: %d ms., Classification: %d ms., Anomaly: %d ms.): \n",
                  result.timing.dsp, result.timing.classification, result.timing.anomaly);

    for (uint16_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
        Serial.printf("  %s: %.5f\n", ei_classifier_inferencing_categories[i], result.classification[i].value);
    }

    free(snapshot_buf);
}

bool ei_camera_init(void) {
    if (is_initialised) return true;

    esp_err_t err = esp_camera_init(&camera_config);
    if (err != ESP_OK) {
        Serial.printf("Camera init failed with error 0x%x\n", err);
        return false;
    }

    sensor_t *s = esp_camera_sensor_get();
    if (s->id.PID == OV3660_PID) {
        s->set_vflip(s, 1);
        s->set_brightness(s, 1);
        s->set_saturation(s, 0);
    }

    is_initialised = true;
    return true;
}

void ei_camera_deinit(void) {
    esp_err_t err = esp_camera_deinit();
    if (err != ESP_OK) {
        Serial.println("Camera deinit failed");
    }
    is_initialised = false;
}

bool ei_camera_capture(uint32_t img_width, uint32_t img_height, uint8_t *out_buf) {
    if (!is_initialised) {
        Serial.println("ERR: Camera is not initialized");
        return false;
    }

    camera_fb_t *fb = esp_camera_fb_get();
    if (!fb) {
        Serial.println("Camera capture failed");
        return false;
    }

    bool converted = fmt2rgb888(fb->buf, fb->len, PIXFORMAT_JPEG, snapshot_buf);
    esp_camera_fb_return(fb);
    if (!converted) {
        Serial.println("Conversion failed");
        return false;
    }

    bool do_resize = (img_width != EI_CAMERA_RAW_FRAME_BUFFER_COLS) || (img_height != EI_CAMERA_RAW_FRAME_BUFFER_ROWS);
    if (do_resize) {
        ei::image::processing::crop_and_interpolate_rgb888(
            out_buf,
            EI_CAMERA_RAW_FRAME_BUFFER_COLS,
            EI_CAMERA_RAW_FRAME_BUFFER_ROWS,
            out_buf,
            img_width,
            img_height);
    }

    return true;
}

static int ei_camera_get_data(size_t offset, size_t length, float *out_ptr) {
    size_t pixel_ix = offset * 3;
    size_t pixels_left = length;
    size_t out_ptr_ix = 0;

    // Pack 3 bytes per pixel (BGR byte order from fmt2rgb888) into one
    // 0xRRGGBB value per float, as the classifier input expects
    while (pixels_left != 0) {
        out_ptr[out_ptr_ix] = (snapshot_buf[pixel_ix + 2] << 16) + (snapshot_buf[pixel_ix + 1] << 8) + snapshot_buf[pixel_ix];
        out_ptr_ix++;
        pixel_ix += 3;
        pixels_left--;
    }
    return 0;
}

Source Code Download

Here is the final trained program I generated; Edge Impulse can export it for many platforms, not only the ESP32.
Download the code of Crail’s Cat Recognition