
(input) Mobile Module - Raw Audio Data to Words

info

Updated on June 10th: I am now working on the input part that Neil mentioned on presentation day. I decided not to use an SD card to transfer the data and to go with an easier way instead: Wi-Fi, which I have learned reasonably well.

The basic software idea:

From what I learned in Week 12, there are some issues with using I2S with MicroPython on the XIAO ESP32C3. So my input process has to look like this:

  1. A XIAO ESP32C3 with an INMP441 running Arduino, sending audio data (audio frames) from one IP address, like 192.168.66.123.
  2. (Optional) Another XIAO ESP32C3 with RGB running MicroPython, converting the audio frames to a WAV file and then transmitting it to the reComputer, flashing the RGB LED while receiving.
  3. The reComputer can do one of two things:
    • Read the data from the first board, process the audio frames, convert them to words (text) and send them to the LLM.
    • Receive the file from the second board, read it as words and pass them to the LLM.

Software Part 1: Output audio raw data to audio frame

The audio data from Week 12 are all numbers (raw format). They need to be converted to audio frames in a suitable format.

The specific format depends on the requirements of the speech recognition system I am using.

Here are the common ones:

  • WAV (Waveform Audio File Format): A popular uncompressed audio format that stores audio data along with metadata.
  • PCM (Pulse-Code Modulation): A raw audio format that represents the amplitude of the audio signal at each sample point.
  • FLAC (Free Lossless Audio Codec): A lossless compressed audio format that reduces file size while preserving audio quality.
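To make the difference between the first two formats concrete: a WAV file is essentially PCM data with a header in front. Here is a minimal sketch using Python's standard `wave` module, assuming mono 16-bit samples at 44100 Hz (the values used in the Arduino sketch on this page). The function name `pcm_to_wav_bytes` is my own and not taken from any reference.

```python
import io
import wave

SAMPLE_RATE = 44100  # assumed: matches .sample_rate in the Arduino sketch
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono, as with I2S_CHANNEL_FMT_ONLY_LEFT


def pcm_to_wav_bytes(pcm: bytes) -> bytes:
    """Wrap raw little-endian 16-bit PCM in a WAV container (in memory)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Writing the returned bytes to a file opened in binary mode gives a file that normal audio players should be able to open.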
info

The ESP32C3 is not powerful enough to convert the data to audio frames.

What this mobile module can do is package the raw data from the INMP441 and use the XIAO ESP32C3 to send it out over Wi-Fi.

This is the code for packaging and printing on the serial port:

/*
ESP32 I2S Microphone Sample
esp32-i2s-mic-sample.ino
Sample sound from I2S microphone, display on Serial Monitor
Requires INMP441 I2S microphone

DroneBot Workshop 2022
https://dronebotworkshop.com
*/

// Include I2S driver
#include <driver/i2s.h>

// Connections to INMP441 I2S microphone
#define I2S_WS 9
#define I2S_SD 10
#define I2S_SCK 8

// Use I2S Processor 0
#define I2S_PORT I2S_NUM_0

// Define input buffer length
#define bufferLen 64
int16_t sBuffer[bufferLen];

void i2s_install() {
  // Set up I2S Processor configuration
  const i2s_config_t i2s_config = {
    .mode = i2s_mode_t(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = 44100,
    .bits_per_sample = i2s_bits_per_sample_t(16),
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = i2s_comm_format_t(I2S_COMM_FORMAT_STAND_I2S),
    .intr_alloc_flags = 0,
    .dma_buf_count = 8,
    .dma_buf_len = bufferLen,
    .use_apll = false
  };

  i2s_driver_install(I2S_PORT, &i2s_config, 0, NULL);
}

void i2s_setpin() {
  // Set I2S pin configuration
  const i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_SCK,
    .ws_io_num = I2S_WS,
    .data_out_num = -1,
    .data_in_num = I2S_SD
  };

  i2s_set_pin(I2S_PORT, &pin_config);
}

void setup() {

  // Set up Serial Monitor
  Serial.begin(115200);
  Serial.println("ESP32 I2S Microphone Test");

  delay(1000);

  // Set up I2S
  i2s_install();
  i2s_setpin();
  i2s_start(I2S_PORT);

  delay(500);
}

void loop() {
  // Get I2S data and place in data buffer
  // (the size argument of i2s_read() is in bytes, so pass the whole buffer size)
  size_t bytesIn = 0;
  esp_err_t result = i2s_read(I2S_PORT, &sBuffer, sizeof(sBuffer), &bytesIn, portMAX_DELAY);

  if (result == ESP_OK) {
    // Each sample is one int16_t, i.e. 2 bytes
    int16_t samples_read = bytesIn / sizeof(int16_t);
    if (samples_read > 0) {
      // Print audio frame data to Serial Monitor
      Serial.print("Audio Frame: ");
      for (int16_t i = 0; i < samples_read; ++i) {
        Serial.print(sBuffer[i]);
        Serial.print(" ");
      }
      Serial.println();
    }
  }
}

Here is the output:

Software Part 2: Audio data (packaged) streaming from mobile module to reComputer

I am using Wi-Fi for this function. Both devices must be on the same network. I combined a reference for connecting the XIAO ESP32C3 to Wi-Fi with the microphone reference above.

Some parameters can be changed:

  • bufferLen: length (in samples) of the buffer that stores audio data read from the microphone; here it is 64.
  • sample rate: 44100 Hz, a rate commonly used in PyAudio examples.
  • bits per sample: 16, which corresponds to the paInt16 format in PyAudio.
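These three parameters together determine how much data has to go over Wi-Fi. A quick back-of-the-envelope calculation, assuming mono audio and one full buffer of `bufferLen` samples per transfer (the variable names are only for illustration):

```python
SAMPLE_RATE = 44100   # Hz
BITS_PER_SAMPLE = 16
BUFFER_LEN = 64       # samples per buffer (bufferLen)

bytes_per_sample = BITS_PER_SAMPLE // 8
bytes_per_second = SAMPLE_RATE * bytes_per_sample   # raw mono data rate
buffer_bytes = BUFFER_LEN * bytes_per_sample        # size of one full buffer
buffer_ms = 1000 * BUFFER_LEN / SAMPLE_RATE         # audio time held by one buffer

print(f"{bytes_per_second} bytes/s, {buffer_bytes} bytes per buffer, "
      f"{buffer_ms:.2f} ms per buffer")
```

So one buffer holds only about 1.45 ms of audio, and the stream needs roughly 88 kB/s. Lowering the sample rate reduces the load proportionally.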

Connecting to Wi-Fi and sending data to its IP

To integrate these two references, I made some changes to the code (powered by GPT, of course):

  1. The connection is still the same:

    #define I2S_WS 9
    #define I2S_SD 10
    #define I2S_SCK 8
  2. After successfully connecting to Wi-Fi, the IP address of the ESP32C3 MCU board is printed using:

    Serial.println(WiFi.localIP());
  3. A WiFiServer object named server is created, specifying the port number (12345 in this example) on which the server will listen for incoming connections.

  4. In the setup() function, the server is started using:

    server.begin();

    And a message is printed to indicate that the server has started.

  5. In the loop() function, the code checks if a client has connected using:

    server.available();

    If a client is connected, it prints a message indicating that a client has connected.

  6. The I2S audio data is read and stored in the sBuffer as before.

  7. If audio samples are successfully read, they are sent to the connected client using:

    client.write((uint8_t*)sBuffer, bytesIn);
  8. After sending the data, the client connection is closed using:

    client.stop();

    And a message is printed to indicate that the client has disconnected.

The full code:

#include <WiFi.h>
#include <driver/i2s.h>

const char* ssid = "fros_wifi";
const char* password = "66668888";

// Connections to INMP441 I2S microphone
#define I2S_WS 9
#define I2S_SD 10
#define I2S_SCK 8

// Use I2S Processor 0
#define I2S_PORT I2S_NUM_0

// Define input buffer length
#define bufferLen 64
int16_t sBuffer[bufferLen];

// Server configuration
WiFiServer server(12345);

void i2s_install() {
  // Set up I2S Processor configuration
  const i2s_config_t i2s_config = {
    .mode = i2s_mode_t(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = 44100,
    .bits_per_sample = i2s_bits_per_sample_t(16),
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = i2s_comm_format_t(I2S_COMM_FORMAT_STAND_I2S),
    .intr_alloc_flags = 0,
    .dma_buf_count = 8,
    .dma_buf_len = bufferLen,
    .use_apll = false
  };

  i2s_driver_install(I2S_PORT, &i2s_config, 0, NULL);
}

void i2s_setpin() {
  // Set I2S pin configuration
  const i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_SCK,
    .ws_io_num = I2S_WS,
    .data_out_num = -1,
    .data_in_num = I2S_SD
  };

  i2s_set_pin(I2S_PORT, &pin_config);
}

void setup() {
  // Set up Serial Monitor
  Serial.begin(115200);
  Serial.println("ESP32 I2S Microphone Test");

  delay(1000);

  // Set up I2S
  i2s_install();
  i2s_setpin();
  i2s_start(I2S_PORT);

  delay(500);

  // Connect to Wi-Fi
  Serial.println();
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(ssid);

  WiFi.begin(ssid, password);

  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.println("");
  Serial.println("WiFi connected");

  // Print the IP address of the ESP32C3 MCU board
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());

  // Start the server
  server.begin();
  Serial.println("Server started");
}

void loop() {
  // Check if a client has connected
  WiFiClient client = server.available();
  if (client) {
    Serial.println("Client connected");

    // Get I2S data and place in data buffer
    // (the size argument of i2s_read() is in bytes, so pass the whole buffer size)
    size_t bytesIn = 0;
    esp_err_t result = i2s_read(I2S_PORT, &sBuffer, sizeof(sBuffer), &bytesIn, portMAX_DELAY);

    if (result == ESP_OK) {
      // Each sample is one int16_t, i.e. 2 bytes
      int16_t samples_read = bytesIn / sizeof(int16_t);
      if (samples_read > 0) {
        // Send the audio data to the connected client
        client.write((uint8_t*)sBuffer, bytesIn);
      }
    }

    // Close the connection
    client.stop();
    Serial.println("Client disconnected");
  }
}

Connecting and checking its IP:

Reading the data from its IP

Now I need to write a script to read the data from 192.168.66.117, continuously.

  1. I create a new TCP socket (named server_socket in the code, although it acts as the client) and connect to the ESP32C3 MCU board using server_socket.connect((SERVER_IP, SERVER_PORT)), with 192.168.66.117 as the IP and 12345 as the port.
  2. Once connected, the script enters an inner while True loop where it continuously receives data from the ESP32C3 MCU board using server_socket.recv(1024).
  3. The received data is printed as hexadecimal values using data.hex().

The script was written with help from GPT, which helped me cover the different error conditions.

Here is the full code:

import socket
import time

# Address of the TCP server running on the ESP32C3
SERVER_IP = '192.168.66.117'  # IP address of the ESP32C3 MCU board
SERVER_PORT = 12345  # Must match the port given to WiFiServer

while True:
    try:
        server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server_socket.connect((SERVER_IP, SERVER_PORT))
        print(f"Connected to {SERVER_IP}:{SERVER_PORT}")

        while True:
            # Receive data from the ESP32C3 MCU board
            data = server_socket.recv(1024)
            if not data:
                break

            # Process the received data
            # Here, we print the raw data as hexadecimal values
            print("Received data:", data.hex())

        # Close the server socket
        server_socket.close()
        print("Disconnected from the server")

    except ConnectionRefusedError:
        print("Connection refused. Retrying...")
        time.sleep(1)  # Wait for a short interval before retrying
    except ConnectionResetError:
        print("Connection reset. Retrying...")
        time.sleep(1)  # Wait for a short interval before retrying
    except KeyboardInterrupt:
        print("Keyboard interrupt received. Exiting...")
        break

To print the output in the same format as the serial monitor (customized frame), each pair of bytes is decoded into a signed 16-bit sample:

import socket
import time

# Address of the TCP server running on the ESP32C3
SERVER_IP = '192.168.66.117'  # IP address of the ESP32C3 MCU board
SERVER_PORT = 12345  # Must match the port given to WiFiServer

while True:
    try:
        server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server_socket.connect((SERVER_IP, SERVER_PORT))
        print(f"Connected to {SERVER_IP}:{SERVER_PORT}")

        while True:
            # Receive data from the ESP32C3 MCU board
            data = server_socket.recv(1024)
            if not data:
                break

            # Process the received data
            # Each sample is 2 bytes (16-bit, little-endian, signed);
            # a trailing odd byte is skipped
            audio_frame = [int.from_bytes(data[i:i+2], byteorder='little', signed=True) for i in range(0, len(data) - 1, 2)]

            # Print audio frame data
            print("Audio Frame:", end=" ")
            for sample in audio_frame:
                print(sample, end=" ")
            print()

        # Close the server socket
        server_socket.close()
        print("Disconnected from the server")

    except ConnectionRefusedError:
        print("Connection refused. Retrying...")
        time.sleep(1)  # Wait for a short interval before retrying
    except ConnectionResetError:
        print("Connection reset. Retrying...")
        time.sleep(1)  # Wait for a short interval before retrying
    except KeyboardInterrupt:
        print("Keyboard interrupt received. Exiting...")
        break
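One thing to watch with the decoding step: TCP is a byte stream, so recv() can return an odd number of bytes and split a 16-bit sample across two reads. A possible way to handle this is to carry the leftover byte over to the next read. This is only a sketch; `decode_samples` is a hypothetical helper name, and it assumes little-endian signed 16-bit samples as sent by the ESP32C3.

```python
import struct


def decode_samples(chunk: bytes, leftover: bytes = b""):
    """Decode little-endian int16 samples from a stream chunk.

    Returns (samples, new_leftover); new_leftover holds at most one byte
    that belongs to a sample completed by the next chunk.
    """
    data = leftover + chunk
    usable = len(data) - (len(data) % 2)  # largest even number of bytes
    samples = list(struct.unpack(f"<{usable // 2}h", data[:usable]))
    return samples, data[usable:]
```

In the receive loop this would be called as `samples, leftover = decode_samples(data, leftover)`, with `leftover` starting as `b""` for each new connection.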

The final output is:

Software Part 3: Raw audio data to text on reComputer

pip install requests
pip install --upgrade google-cloud-speech

Python 3.11 is required.

(Optional) I might need to do the resampling and noise reduction.
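For the resampling part, a rough idea of what it could look like in plain Python, using linear interpolation. The 16000 Hz target is an assumption on my side (a rate many speech recognizers accept), and `resample_linear` is a hypothetical helper. There is no low-pass filter before downsampling here, so a proper signal-processing library would give cleaner results.

```python
def resample_linear(samples, src_rate=44100, dst_rate=16000):
    """Resample a list of integer samples by linear interpolation."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left                      # distance from the left sample
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out
```

One second of audio at 44100 Hz (44100 samples) comes out as 16000 samples.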

For applying locally, I will have

Some things still need to be considered:

  1. A button: pressing it starts the recording; without it, the module keeps sending data continuously.
  2. The start and end phase: sending a message when the recording begins and when it ends.
  3. Maybe 3 seconds: a button is connected to the GPIO2 port. Pushing the button records for 3 seconds, then packages the data and sends it.
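On the receiving side, the start/end idea could be handled by sending the clip length first, so the reComputer knows exactly when a recording is complete. The sketch below assumes a framing that does not exist in the current code: a 4-byte little-endian length header followed by the audio bytes. The helper names `recv_exact` and `recv_clip` are hypothetical.

```python
import socket
import struct

SAMPLE_RATE = 44100
BYTES_PER_SAMPLE = 2
RECORD_SECONDS = 3
# Size of a 3-second mono 16-bit clip
CLIP_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * RECORD_SECONDS


def recv_exact(sock, n):
    """Read exactly n bytes from a socket, or raise if it closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(4096, remaining))
        if not chunk:
            raise ConnectionError("socket closed before the clip was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_clip(sock):
    """Read one clip: 4-byte length header (assumed framing), then payload."""
    (length,) = struct.unpack("<I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

With these settings a 3-second clip is 264600 bytes, so the sender would either buffer that much or stream it in small pieces after the header.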

Previous

info

Updated on June 1st: I am running out of time and still busy lately, so for now I am just presenting how the module is normally used. There are several wikis I can refer to later:

Main board Design

Pressing the button starts voice recognition. The RGB light shows that the microphone is working, and when the board detects the voice input, it transfers the data to another MCU board.

I want to ensure when I push the button there will be display feedback, and orgini

Voice Recognition

These two wikis can be referred to:

Audio files store