
Final project

Kinda Intelligent, Not so Great Electronic Roommate

KINGER

Based on the character Kinger from The Amazing Digital Circus by Glitch Productions.
KINGER is a modular, voice-enabled intelligent assistant that combines embedded systems and local AI inference tools.
Inspired by commercial smart speakers, KINGER is configured as an open development environment that lets users run high-level LLMs locally on a credit-card-sized computer, completely detached from cloud dependencies.

Kinger Sketch

Project Objectives

The goal of KINGER is to integrate voice input, local artificial intelligence, audio output, physical controls, visual feedback, custom electronics, and a 3D printed mechanical enclosure into a single standalone desktop companion. The system was designed to operate entirely on local hardware, without dependence on cloud services for speech recognition, language inference, or speech synthesis.

  • Develop a completely self-hosted physical voice assistant using a standard Linux Single Board Computer.
  • Implement low-latency local execution of Large Language Models (LLMs) and acoustic processing blocks.
  • Offload low-level input/output handling from the high-level application to a dedicated peripheral co-processor.
  • Design a vertically stacked set of custom PCBs so that the electronics fit in a small footprint.

The planned hardware platform consists of:

  • Raspberry Pi 5 — main computing unit, runs speech recognition, AI inference, and speech synthesis.
  • INMP441 I2S microphones — digital audio capture.
  • MAX98357A I2S amplifier — audio output stage.
  • 3W 4Ω speaker — audio output transducer.
  • XIAO ESP32-C6 — peripheral co-processor for LEDs and buttons.
  • 16-pixel NeoPixel ring — visual feedback.
  • Physical buttons — volume control and mode switching.
  • Power distribution system — 5V supply for the Raspberry Pi and peripheral electronics.
  • Custom PCBs — designed and milled in-house (see PCB Design Strategy).
  • 3D printed enclosure — houses all internal components.

Schedule & Interactive Timeline

April 15 — Mid Term DONE
Create a Schedule

Organize the project timeline and weekly objectives.

CAD Design Explanation

Explain the CAD workflow, initial dimensions, and intended assembly process.

April 17 — Component Arrival DONE
Audio Hardware Logistics

Arrival of the digital INMP441 I2S microphones and the MAX98357A Class D I2S amplifier for testing.

April 24 — High-Level Brain Delivery DONE
Raspberry Pi 5 Arrival

Receipt of the main computing unit. Started initial system configuration on Raspberry Pi OS.

April 30 — Interface Schematics DONE
XIAO Wiring Diagram

Designed the complete electronic connection diagram for the Seeed Studio XIAO module in KiCad.

May 13 — Peripheral Prototyping DONE
Isolated XIAO Setup

Independent firmware implementation for reading buttons and controlling the 16-pixel NeoPixel ring animations.

May 22 — Local AI Deployment DONE
Ollama Framework Install

Installed Ollama on the Raspberry Pi 5. Tested different model variants to verify local compatibility, standardizing on qwen2.5:3b.

May 27 — Media Deliverables DONE
Video & Slide Upload

Upload and validation of the final project explanatory video and the project summary slide.

May 28-31 — Voice Pipeline Integration DONE
Audio Loops & Scripts

Completed full pipeline chaining: arecord → Faster-Whisper (STT) → commands.py check → Ollama (Qwen) → Piper (TTS) → aplay output.

June 01 — Communication Layer & 3D Tweaks DONE
MQTT Communication Layer

Replaced the originally planned UART link between the Raspberry Pi and the XIAO ESP32-C6 with an MQTT-based communication layer over WiFi, exchanging high-level system states and button events.

3D Enclosure Modeling

Finalize structural modifications to the 3D model shell before physical printing.

June 02 — Fabrication Print DONE
Structural Printing

Exported components to slicing software and ran the multi-hour 3D print process for KINGER's body layout.

June 03 — Project Wrap-Up DONE
Assembly & Quality Check

Stack custom PCBs inside the printed enclosure, fit structural mounts, align hardware controls, and run system validations.



System Architecture

KINGER separates time-critical I/O and user interaction from heavy AI inference by dividing the system into two cooperating layers: a high-level processing unit running on the Raspberry Pi 5, and a low-level peripheral controller running on the XIAO ESP32-C6.

High-Level Processing Unit (Raspberry Pi 5)

Acts as the central logic core of the system. It captures raw PCM audio streams from the I2S microphone bus, runs the local speech recognition and language model pipeline, evaluates the conversational stack, and renders the synthesized voice response in real time.

Low-Level Control Unit (XIAO ESP32-C6)

Acts as a physical peripheral co-processor. It drives the LED ring, reads the physical buttons, and connects to the system over WiFi/MQTT to receive system state updates and publish user-generated button events.

Item | Details | Qty | Unit Price ($) | Source / Link
Raspberry Pi 5 | Main AI processing unit | 1 | $95 | Official Raspberry Pi distributor
XIAO ESP32-C6 | Peripheral control microcontroller | 1 | $8 | Seeed Studio
INMP441 | Digital I2S microphones | 2 | $4 | Amazon / AliExpress
MAX98357A | I2S audio amplifier | 1 | $5 | Amazon / AliExpress
Speaker | Audio output system | 1 | $6 | Electronics supplier
NeoPixel Ring | 16 RGB LEDs feedback system | 1 | $10 | Adafruit / Amazon
Push Buttons | Volume and mode controls | 3 | $0.50 | Local electronics store
Battery Pack | NeoPixel and speaker power | 1 | $12 | Amazon
3D Printed Parts | External shell and modular structure | 1 set | $20 | Fab Lab Puebla
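
As a quick budget check, the table above can be totaled with a few lines of Python. This is only an illustrative sketch: it assumes the listed prices are per unit (as the column header suggests) and ignores shipping and taxes. The names BOM and bom_total are hypothetical and not part of the KINGER code.

```python
# Bill of materials as (item, quantity, unit price in USD), copied from the table.
BOM = [
    ("Raspberry Pi 5", 1, 95.00),
    ("XIAO ESP32-C6", 1, 8.00),
    ("INMP441", 2, 4.00),
    ("MAX98357A", 1, 5.00),
    ("Speaker", 1, 6.00),
    ("NeoPixel Ring", 1, 10.00),
    ("Push Buttons", 3, 0.50),
    ("Battery Pack", 1, 12.00),
    ("3D Printed Parts", 1, 20.00),
]


def bom_total(rows):
    """Sum quantity x unit price over all rows."""
    return sum(qty * price for _, qty, price in rows)


if __name__ == "__main__":
    print(f"Estimated total: ${bom_total(BOM):.2f}")
```

Under those assumptions the parts add up to about $165.50.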

Voice Processing Workflow

The complete voice pipeline executed on the Raspberry Pi follows the flow below:

Audio Capture

Stereo audio is captured from the INMP441 I2S microphones using arecord at 48 kHz, 32-bit, and converted to mono using sox for transcription.

Speech-to-Text

Faster-Whisper (tiny model, CPU, int8 quantization) transcribes the mono audio to Spanish text.

Command Analysis

The transcribed text is passed to commands.py. If a recognized command pattern is found (e.g. time, date), it is resolved directly and a response is generated immediately.

AI Response (if not a command)

If no command matches, the text is forwarded to brain.py, which queries Ollama running the Qwen 2.5 3B model with the full conversation history.

Text-to-Speech

The resulting response text is converted to a WAV file using Piper (es_MX-ald-medium voice model).

Audio Output

The WAV file is played back through the MAX98357A I2S amplifier to the speaker using aplay.

Conversational memory is handled by brain.py, which maintains an in-memory list of conversation turns (system prompt, user messages, and assistant responses) that is sent to Ollama on every call. This allows Qwen 2.5 3B to generate responses with context from previous exchanges in the same session.
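
Because the whole list is resent on every call, the prompt grows with each exchange. One possible way to bound it, shown here only as a sketch, is to keep the system prompt plus the most recent turns. The helper trim_history and its max_turns parameter are hypothetical; the source does not describe any trimming in brain.py.

```python
def trim_history(conversation, max_turns=6):
    """Return the system prompt(s) plus the last `max_turns` user/assistant pairs.

    `conversation` is a list of {"role": ..., "content": ...} dicts, in the
    same shape that brain.py sends to Ollama.
    """
    system = [m for m in conversation if m["role"] == "system"]
    dialogue = [m for m in conversation if m["role"] != "system"]
    if max_turns <= 0:
        return system
    # Each turn is one user message followed by one assistant message.
    return system + dialogue[-2 * max_turns:]


if __name__ == "__main__":
    history = [{"role": "system", "content": "Responde conciso."}]
    for i in range(10):
        history.append({"role": "user", "content": f"pregunta {i}"})
        history.append({"role": "assistant", "content": f"respuesta {i}"})
    print(len(trim_history(history, max_turns=2)))
```

A function like this could be applied to the list just before the chat call, at the cost of the model forgetting older exchanges.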

Communication Architecture

The original system architecture considered a UART connection between the Raspberry Pi and the XIAO peripheral controller, with the Pi sending state strings (IDLE, LISTENING, THINKING, SPEAKING) and the XIAO returning button events over a serial link at 115200 bps.

During integration testing, reliability and synchronization issues appeared when handling simultaneous audio processing on the Raspberry Pi and interface events on the XIAO. The serial link required both devices to remain tightly synchronized in real time, and the audio capture/playback load on the Raspberry Pi made consistent UART servicing difficult to guarantee.

To improve modularity, scalability, and maintainability, the architecture evolved toward an MQTT-based communication layer over WiFi. This decouples the two subsystems in time: each device publishes and subscribes to topics independently, without requiring a synchronous serial handshake.

Raspberry Pi → XIAO (kinger/state)

The Raspberry Pi publishes high-level system states to the MQTT broker:

  • IDLE
  • LISTENING
  • THINKING
  • SPEAKING
  • MINECRAFT

XIAO → Raspberry Pi (xiao/boton)

The XIAO publishes numeric event codes corresponding to button presses and mode changes:

  • 1 — Volume up
  • 2 — Volume down
  • 3 — Thinking mode
  • 4 — Gaming mode
  • 5 — Listening mode
  • 6 — Error mode

Only high-level states and discrete events are exchanged between the two subsystems. This reduces coupling: the Raspberry Pi does not need to know how the LED ring renders a given state, and the XIAO does not need to know how the AI pipeline produces a response. Each device can be updated, restarted, or reflashed independently as long as the MQTT topic contract is respected.
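
The topic contract described above can be captured in a small lookup table. The following Python sketch is illustrative only: the event names on the right-hand side, and the helpers decode_event and is_valid_state, are hypothetical labels chosen for readability, while the numeric codes and state strings are the ones listed above.

```python
# Numeric event codes published by the XIAO, as listed above.
EVENT_NAMES = {
    "1": "VOLUME_UP",
    "2": "VOLUME_DOWN",
    "3": "THINKING_MODE",
    "4": "GAMING_MODE",
    "5": "LISTENING_MODE",
    "6": "ERROR_MODE",
}

# High-level states published by the Raspberry Pi.
VALID_STATES = {"IDLE", "LISTENING", "THINKING", "SPEAKING", "MINECRAFT"}


def decode_event(payload: bytes):
    """Map a raw MQTT payload to an event name, or None if it is not recognized."""
    try:
        code = payload.decode("utf-8").strip()
    except UnicodeDecodeError:
        return None
    return EVENT_NAMES.get(code)


def is_valid_state(state: str) -> bool:
    """Check a state string against the contract before publishing it."""
    return state in VALID_STATES


if __name__ == "__main__":
    print(decode_event(b"4"), is_valid_state("THINKING"))
```

Validating payloads at the boundary keeps malformed or unexpected messages, which are possible on a shared public broker, from reaching the rest of the pipeline.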

Hardware Integration Layers

Audio capture and output avoid the noise and complexity of analog conversion stages by using the I2S digital bus. The Raspberry Pi 5 acts as the bus master, providing the bit clock and word select lines:

  • Digital Capture (INMP441): Transmits serialized audio data over GPIO pins configured for I2S. Uses three signals: bit clock (SCK), word select (WS), and serial data (SD).
  • Power Amplification (MAX98357A): A high-efficiency Class D amplifier with an integrated DAC that decodes I2S audio directly from dedicated Pi pins (GPIO18 → BCLK, GPIO19 → LRCLK, GPIO21 → DIN) and delivers up to 3W to a 4Ω speaker.
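
To give a sense of the data rates on the bus, the bit clock and the size of a recording can be worked out from the capture settings used in this project (48 kHz, 32-bit, stereo, 5-second recordings). The sketch below assumes the standard I2S framing of two slots per sample period; the function names are hypothetical.

```python
def i2s_bit_clock_hz(sample_rate_hz, bits_per_slot, channels=2):
    """Bit clock frequency: one bit per clock, for every slot of every frame."""
    return sample_rate_hz * bits_per_slot * channels


def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of the raw PCM payload for a recording (WAV header not included)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds


if __name__ == "__main__":
    print(i2s_bit_clock_hz(48_000, 32))   # 3072000 Hz, i.e. 3.072 MHz
    print(pcm_bytes(48_000, 32, 2, 5))    # 1920000 bytes for a 5 s capture
```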

XIAO ESP32-C6 Firmware

The XIAO ESP32-C6 firmware is responsible for the LED ring and physical interaction. On boot, it follows a fixed initialization sequence:

Boot Sequence

Power On → NeoPixel ring initialization (16 pixels, brightness 60) → WiFi connection (blocking, retries until connected) → MQTT broker connection (broker.emqx.io, randomized client ID) → Subscribe to xiao/boton → Enter main loop, default mode: LISTENING

For more information, see the Week 11 page.

The firmware does not run any AI or speech processing. Its only responsibilities are reading incoming MQTT messages and driving the NeoPixel ring according to the current mode. Five animation modes are implemented:

MODE_LISTENING

Default mode. A breathing animation cycles the brightness of all 16 pixels in a blue tone, increasing and decreasing brightness in steps of 4 every 20 ms.

MODE_THINKING

A four-pixel comet trail rotates around the ring every 50 ms, with decreasing blue intensity along the trailing pixels, indicating that the AI is processing a response.

MODE_ERROR

All 16 pixels blink red, toggling fully on and off every 200 ms, used to indicate an error or fault state.

MODE_GAMING

A continuous rainbow animation using HSV color space, where the hue offset of each pixel is advanced by 350 every 15 ms, used for the "Minecraft mode" visual identity.

MODE_VOLUME

A volume-bar animation: the number of lit pixels (in blue) corresponds to the current volume level (0–16). After 5 seconds with no further volume events, the ring automatically returns to the previous mode.
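
The per-frame logic of two of these modes can be modeled in Python for illustration. This is a sketch, not the firmware itself (which runs on the XIAO): the tail intensity values are assumptions, and the function names volume_bar and comet_frame are hypothetical.

```python
NUM_PIXELS = 16


def volume_bar(level, num_pixels=NUM_PIXELS):
    """Return one boolean per pixel: True for each pixel lit by the volume bar."""
    level = max(0, min(num_pixels, level))  # clamp to the 0-16 range
    return [i < level for i in range(num_pixels)]


def comet_frame(head, tail_levels=(255, 128, 64, 32), num_pixels=NUM_PIXELS):
    """Blue intensity per pixel for a four-pixel comet whose head is at `head`.

    The tail intensities are assumed values; they only need to decrease
    along the trailing pixels.
    """
    frame = [0] * num_pixels
    for offset, level in enumerate(tail_levels):
        frame[(head - offset) % num_pixels] = level  # wrap around the ring
    return frame


if __name__ == "__main__":
    print(volume_bar(5))
    print(comet_frame(1))
```

Advancing head by one on every 50 ms tick, modulo 16, would produce the rotating trail described for MODE_THINKING.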

Mode transitions are driven entirely by single-character MQTT messages received on xiao/boton. Codes 1 and 2 adjust the volume level and temporarily switch to MODE_VOLUME, after which the firmware restores the previously active mode (previousMode). Codes 3 through 6 directly switch between Thinking, Gaming, Listening, and Error modes.

Volume State Restoration

When a volume event is received, currentMode is set to MODE_VOLUME and a timestamp, volumeStartTime, is recorded. On every loop iteration, the firmware checks whether more than 5000 ms have elapsed since that timestamp; if so, currentMode is reset to previousMode, restoring the LED ring to whatever animation was active before the volume change (e.g. Listening or Thinking).
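
The mode handling described above amounts to a small state machine. The Python model below is a sketch written for illustration, not the firmware source: the class name ModeController, the starting volume of 8, and the choice to keep the saved mode when several volume events arrive in a row are all assumptions.

```python
VOLUME_TIMEOUT_MS = 5000
MODE_CODES = {"3": "THINKING", "4": "GAMING", "5": "LISTENING", "6": "ERROR"}


class ModeController:
    """Python model of the mode switching and volume timeout logic."""

    def __init__(self, initial_mode="LISTENING"):
        self.current_mode = initial_mode
        self.previous_mode = initial_mode
        self.volume = 8            # assumed starting level (range 0-16)
        self.volume_start_ms = 0

    def on_code(self, code, now_ms):
        """Handle one single-character event code."""
        if code in ("1", "2"):
            # Save the active mode only when entering VOLUME, so repeated
            # volume presses do not overwrite it.
            if self.current_mode != "VOLUME":
                self.previous_mode = self.current_mode
            step = 1 if code == "1" else -1
            self.volume = max(0, min(16, self.volume + step))
            self.current_mode = "VOLUME"
            self.volume_start_ms = now_ms
        elif code in MODE_CODES:
            self.current_mode = MODE_CODES[code]

    def tick(self, now_ms):
        """Called on every loop iteration: leave VOLUME after the timeout."""
        if (self.current_mode == "VOLUME"
                and now_ms - self.volume_start_ms > VOLUME_TIMEOUT_MS):
            self.current_mode = self.previous_mode


if __name__ == "__main__":
    ctrl = ModeController()
    ctrl.on_code("3", now_ms=0)
    ctrl.on_code("1", now_ms=1000)
    ctrl.tick(now_ms=6001)
    print(ctrl.current_mode, ctrl.volume)
```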

Software Pipeline & Deployment

To run the complete conversational engine of KINGER locally on the Raspberry Pi 5 under Raspberry Pi OS, the following deployment steps are followed.

1. Isolated Environment Setup

Due to the externally-managed-environment policy of modern Python distributions on Linux, a virtual environment (venv) is required to avoid conflicts with system-wide packages:

# Create the project directory and required subfolders
mkdir -p ~/kinger_ai/voices ~/kinger_ai/models
cd ~/kinger_ai

# Create the isolated Python virtual environment
python3 -m venv venv

# Activate the virtual environment (must be done before any install or script run)
source venv/bin/activate

2. Installing Critical Dependencies

With the environment active, the speech pipeline layers and system utilities are installed:

# Install system utilities for audio capture and manipulation
sudo apt-get update && sudo apt-get install -y alsa-utils sox python3-pip

# Install AI engines inside the active venv
pip install --upgrade pip
pip install faster-whisper piper-tts ollama paho-mqtt

3. Deploying Ollama and Qwen

Ollama manages the LLM weights and accelerates inference on ARM64 architectures. The system daemon is installed and the quantized Qwen model is downloaded:

# Download and install the Ollama engine binary
curl -fsSL https://ollama.com/install.sh | sh

# Verify the service is running in the background, or start it manually:
# ollama serve

# Download the Qwen 2.5 3B parameter model locally
ollama pull qwen2.5:3b

4. Python Module Structure

The system logic is split into dedicated modules coordinated by a main dispatcher. The implementations below reflect the actual code running on KINGER:

brain.py

Maintains the conversation history in a structured in-memory list and sends chat requests to the local Ollama daemon.

from ollama import chat

conversation = [
    {"role": "system", "content": "Tu nombre es KINGER, eres el asistente de voz físico del cuarto. Responde conciso."}
]

def ask_ai(question):
    conversation.append({"role": "user", "content": question})
    response = chat(model="qwen2.5:3b", messages=conversation)
    answer = response["message"]["content"]
    conversation.append({"role": "assistant", "content": answer})
    return answer
                        
speak.py

Takes plain text, synthesizes it using Piper (Mexican Spanish voice model), and plays the resulting WAV file directly to the I2S amplifier through ALSA.

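import os
import subprocess

# Make sure the ONNX model has been downloaded to this path.
# "~" is expanded explicitly because no shell is involved below.
MODEL_PATH = os.path.expanduser("~/kinger_ai/voices/es_MX-ald-medium.onnx")

def speak(text):
    clean_text = text.replace('"', '').replace('\n', ' ')

    # Feed the text to Piper on stdin and write the resulting WAV.
    # Passing an argument list avoids shell quoting problems with LLM output.
    subprocess.run(
        ["piper", "--model", MODEL_PATH, "--output_file", "response.wav"],
        input=clean_text,
        text=True,
    )

    # Direct playback to I2S hardware via ALSA
    subprocess.run(["aplay", "-D", "plughw:0,0", "response.wav"])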
                        
commands.py

Pre-LLM interception filter. If the transcribed text matches a known pattern, an immediate response is returned without calling Ollama.

import datetime

def check_commands(text):
    text_clean = text.lower().strip()
    if "hora" in text_clean:
        now = datetime.datetime.now().strftime("%H:%M")
        return f"Son las {now}."
    if "fecha" in text_clean:
        today = datetime.datetime.now().strftime("%A %d de %B")
        return f"Hoy es {today}."
    return None # Not a command, must be handled by Qwen
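
One detail worth noting about the date command: strftime("%A") and strftime("%B") follow the process locale, so on a system with an English or default C locale the spoken date would contain English day and month names. A locale-independent alternative is sketched below. The helper format_date_es and its lookup lists are hypothetical and not part of the KINGER code.

```python
import datetime

# Spanish names indexed by date.weekday() (Monday = 0) and by month - 1.
DAYS_ES = ["lunes", "martes", "miércoles", "jueves", "viernes", "sábado", "domingo"]
MONTHS_ES = [
    "enero", "febrero", "marzo", "abril", "mayo", "junio",
    "julio", "agosto", "septiembre", "octubre", "noviembre", "diciembre",
]


def format_date_es(d):
    """Format a date as e.g. 'martes 3 de junio' without relying on the locale."""
    return f"{DAYS_ES[d.weekday()]} {d.day} de {MONTHS_ES[d.month - 1]}"


if __name__ == "__main__":
    print(f"Hoy es {format_date_es(datetime.date.today())}.")
```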
                        
mqtt.py

Wraps the paho-mqtt client. Connects to the public broker, subscribes to kinger/event for incoming XIAO events, and exposes shortcut functions used by main.py to publish system states to kinger/state.

import paho.mqtt.client as mqtt

BROKER = "broker.emqx.io"
PORT = 1883
TOPIC_STATE = "kinger/state"
TOPIC_EVENT = "kinger/event"

client = mqtt.Client()
last_event = None

def on_connect(client, userdata, flags, rc):
    print(f"[MQTT] Connected ({rc})")
    client.subscribe(TOPIC_EVENT)

def on_message(client, userdata, msg):
    global last_event
    try:
        payload = msg.payload.decode().strip()
        last_event = payload
        print(f"[MQTT EVENT] {payload}")
    except Exception as e:
        print(f"[MQTT ERROR] {e}")

client.on_connect = on_connect
client.on_message = on_message

client.connect(BROKER, PORT, 60)
client.loop_start()

def send_state(state):
    client.publish(TOPIC_STATE, state)
    print(f"[MQTT TX] {state}")

def set_idle(): send_state("IDLE")
def set_listening(): send_state("LISTENING")
def set_thinking(): send_state("THINKING")
def set_speaking(): send_state("SPEAKING")
def set_minecraft(): send_state("MINECRAFT")

def get_event():
    global last_event
    event = last_event
    last_event = None
    return event
                        
main.py (Central Dispatcher)

The continuous loop that ties together I2S audio capture, Faster-Whisper transcription, command evaluation, Ollama inference, MQTT state updates, and the final audio output stage.

import os
from faster_whisper import WhisperModel
import brain
import speak
import commands
from mqtt import (
    set_idle,
    set_listening,
    set_thinking,
    set_speaking,
    get_event
)

# Initialize quantized Whisper model for efficient execution on ARM CPU
stt_model = WhisperModel("tiny", device="cpu", compute_type="int8")

def run_pipeline():
    print("Capturando audio por I2S...")
    # 5-second stereo recording at the hardware's native 48kHz
    os.system('arecord -D hw:0,0 -f S32_LE -r 48000 -c 2 -d 5 raw.wav')
    # Convert to mono for optimal Faster-Whisper processing
    os.system('sox raw.wav mono.wav remix 1')
    
    # Local transcription
    segments, _ = stt_model.transcribe("mono.wav", language="es")
    prompt = "".join([seg.text for seg in segments]).strip()
    
    if not prompt:
        print("No se detectó audio inteligible.")
        return

    print(f"Transcripción: {prompt}")
    
    # 1. Check for fast control commands
    response = commands.check_commands(prompt)
    
    # 2. If not a command, send to the local LLM
    if response is None:
        response = brain.ask_ai(prompt)
        
    print(f"Respuesta de KINGER: {response}")
    
    # 3. Synthesis and physical output to the speaker
    speak.speak(response)

if __name__ == "__main__":
    # KINGER's main operating loop
    while True:
        # MQTT events from the XIAO button interface are integrated here
        run_pipeline()
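
The listing above imports the state helpers from mqtt.py but does not show where they are called. One way the state updates could be interleaved with the pipeline stages is sketched below. This is an assumption about the structure, not the project's actual code: run_with_states is a hypothetical function, and the stages are passed in as callables so the sketch can run without audio hardware or a broker.

```python
def run_with_states(record, transcribe, answer, speak, publish):
    """Run one voice interaction, publishing a state before each stage.

    record()          -> captured audio (any object)
    transcribe(audio) -> transcribed text, empty if nothing was understood
    answer(text)      -> response text
    speak(reply)      -> plays the response
    publish(state)    -> sends a state string such as "THINKING"
    """
    try:
        publish("LISTENING")
        audio = record()

        publish("THINKING")
        text = transcribe(audio)
        if not text:
            return None
        reply = answer(text)

        publish("SPEAKING")
        speak(reply)
        return reply
    finally:
        # Always return the LED ring to IDLE, even after an error.
        publish("IDLE")


if __name__ == "__main__":
    run_with_states(
        record=lambda: b"",
        transcribe=lambda audio: "hola",
        answer=lambda text: "Hola, soy KINGER.",
        speak=print,
        publish=lambda state: print(f"[STATE] {state}"),
    )
```

In main.py the callables would wrap the arecord/sox capture, the Faster-Whisper transcription, the commands.py and brain.py lookup, the speak function, and send_state from mqtt.py.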
                        

PCB Design Strategy

To consolidate KINGER's electronics inside a compact cylindrical chassis, a modular system was structured around the vertical stacking of three independent boards, milled in-house on the SRM-20. The full schematic design, PCB layout, routing decisions, 3D verification, and manufacturing workflow (KiCad → Gerber export → toolpath generation in MODS → MonoFab milling → soldering) are documented in detail on the Week 15 page.

PCB 1 — Raspberry Pi Interface Board

The top board. Connects directly to the Raspberry Pi 5's 40-pin header. Its main purpose is to route the I2S audio bus cleanly between the header and the audio components below, keeping these traces short and isolated from other signal lines.

PCB 2 — Common Distribution Board

The central power board. Centralizes the common ground reference and 5V/3.3V power distribution between PCB 1 and PCB 3, avoiding redundant wiring and reducing the number of separate power connections inside the enclosure.

PCB 3 — XIAO Interface Board

The bottom board. Hosts the XIAO ESP32-C6 module and provides connection points for the NeoPixel ring data line and the physical buttons.

The three boards are assembled as a vertical stack using spacers and screws, reducing the internal volume occupied by the electronics and centralizing all interconnections in a single assembly that can be removed or serviced as a unit.

Initial Testing

3D Design

Body Design

For the main body structure, the design was inspired by the lower section of an hourglass. The first step was creating two circular sketches that define the main proportions of the enclosure: a bottom circle with a diameter of 120 mm and an upper circle with a diameter of 104 mm. This difference in dimensions creates the characteristic tapered shape while providing enough internal space for the electronic components.
Continuing with the mechanical design, the lower section was planned to house the main processing unit and the custom electronic boards. Since the Raspberry Pi is the most critical component inside the system, it was necessary to secure it directly to the base to prevent cable disconnections caused by movement or accidental impacts. To achieve this, mounting supports were extruded according to the exact position of the mounting holes located on the Raspberry Pi board, allowing it to be fixed using screws and ensuring a stable mechanical connection.
Another important structural element is the connection between the body and the head section. For this interface, a circular cut was created to form a 4 mm thick mounting disk. From this base, four cylindrical supports were extruded to act as connection points between both parts. These supports were designed with an inner diameter of 3 mm and an external diameter of 5 mm, allowing enough space for screw fastening while maintaining structural strength. Additionally, fillets were applied to the supports to improve stress distribution and reduce the concentration of forces during assembly or mechanical impacts.
Finally, to connect both circular sections and create the curved external shape of the body, two additional sketches were created: one positioned on the upper section and another on the lower section. In each sketch, rectangular profiles were drawn to define the transition geometry between both parts. The Sweep feature was then used, selecting both profiles as references, generating the continuous curved surfaces that complete the final body shape of the enclosure. This approach allowed the design to maintain the hourglass-inspired appearance while optimizing the available internal volume for the electronic integration.
The head starts from a revolved profile that leaves a hollow shape, so that components can be housed inside it.
Next come the details: a cross must be added to the top of the head to emulate the character's silhouette, and the edges are softened using the fillet tool.
An extruded plane is created at 88 mm, aiming to stay as close as possible to the upper limit.
From here, a sketch is needed where two circles will be drawn and extruded, creating structural supports to screw the button PCB in place. Using that same sketch, slots are also created aligned with the buttons on the PCB, allowing them to be pressed in the future. This feature is patterned around the center of the head so that all four buttons end up correctly aligned.
Cylinders are created running from the inner face through to the outer face, in order to house the screws and washers that hold the eyes in place using magnets.
To join the head to the body, interlocking features with matching diameters are required on both parts. In this case the diameter is 87 mm. These are extruded and arranged in a circular pattern every 90°, resulting in four connectors total.
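
For reference, the centers of a circular pattern like this one can be computed directly. The sketch below assumes the four connectors sit on the 87 mm circle itself and that the first one lies on the X axis; both are assumptions, since the source only gives the diameter and the 90° spacing. The function name circular_pattern is hypothetical.

```python
import math


def circular_pattern(diameter_mm, count, start_deg=0.0):
    """Return (x, y) centers, in mm, of `count` features evenly spaced on a circle."""
    radius = diameter_mm / 2
    step_deg = 360.0 / count
    points = []
    for i in range(count):
        angle = math.radians(start_deg + i * step_deg)
        # Round to 3 decimals to hide floating-point noise around zero.
        points.append((round(radius * math.cos(angle), 3),
                       round(radius * math.sin(angle), 3)))
    return points


if __name__ == "__main__":
    for x, y in circular_pattern(87, 4):
        print(f"x = {x:+.1f} mm, y = {y:+.1f} mm")
```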
For ventilation, holes are made on the back of the head. To give it personality, the hole pattern is inspired by the Braille spelling of KINGER:

K
I
N
G
E
R
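
The dot patterns for these six letters follow the standard Braille alphabet, where dots 1 to 3 form the left column and dots 4 to 6 the right column of each cell. The sketch below generates the cells in Python, which could be used to double-check a hole layout. It is illustrative only, and the helper names are hypothetical.

```python
# Raised dots for each letter in the standard Braille alphabet.
BRAILLE_DOTS = {
    "K": (1, 3),
    "I": (2, 4),
    "N": (1, 3, 4, 5),
    "G": (1, 2, 4, 5),
    "E": (1, 5),
    "R": (1, 2, 3, 5),
}


def braille_char(dots):
    """Build the Unicode Braille character (U+2800 block) for a set of dots."""
    code = 0x2800
    for dot in dots:
        code |= 1 << (dot - 1)  # dot n maps to bit n-1
    return chr(code)


def braille_grid(dots):
    """Return the cell as 3 rows x 2 columns of booleans (True = hole)."""
    return [[(row + 1) in dots, (row + 4) in dots] for row in range(3)]


if __name__ == "__main__":
    print("".join(braille_char(BRAILLE_DOTS[c]) for c in "KINGER"))
    for row in braille_grid(BRAILLE_DOTS["K"]):
        print("".join("o" if hole else "." for hole in row))
```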
Four additional connectors are created to screw the 4 Ω speaker in place. Its mounting holes are also spaced 90° apart, resulting in four mounting points total.
Finally, a cutting template is created to assist with the cape fabric. The outline is downloaded from the internet and extruded 3 mm, then exported as a DXF file for cutting.
Export

Once the template geometry is complete, save it as .dxf for use with a laser cutter or vinyl plotter.

PrusaSlicer

To prepare the 3D models for fabrication, the STL files were processed using PrusaSlicer, the slicing software provided by Prusa Research. This software converts 3D models into machine instructions (G-code) that can be interpreted by the printer.

1- Selecting the Printer

Once the application is open, the first step is to select or add a printer.
In this project, the Original Prusa MK4S with a 0.4 mm nozzle was used.
To select the printer:
  • Navigate to the top toolbar.
  • Locate the Printer dropdown menu.
  • Select Original Prusa MK4S.
Using the correct printer profile ensures that the generated G-code matches the capabilities and dimensions of the target machine.

2- Importing the STL File

After setting up the printer:
  • Click File → Import → Import STL.
  • The model will automatically appear on the virtual build plate.
At this stage, the object can be moved, rotated, scaled, or duplicated if required.

3- Workspace

The main workspace provides several tools used to prepare the model for printing. Visible menus and functions include:
  • Move Tool: Repositions the model on the build plate.
  • Rotate Tool: Adjusts the orientation of the object.
  • Scale Tool: Changes the dimensions of the model.
  • Cut Tool: Splits the model into separate sections.
  • Support Settings: Allows support generation for overhanging features.
  • Object List: Displays all loaded models.
  • Print Settings Panel: Contains quality and slicing parameters.
  • Filament Settings Panel: Defines material-specific parameters.
  • Printer Settings Panel: Contains machine configuration options.
  • 3D Viewport: Provides a real-time visualization of the build plate and model placement.
These tools allow the user to optimize the model before generating the final toolpath.

Parameters that I use:

Parameter | Value
Material | PLA
Nozzle Diameter | 0.4 mm
Layer Height | 0.20 mm
Nozzle Temperature | 215 °C
Bed Temperature | 60 °C
Infill | 15–20%
Perimeters | 2–3
Supports | As required
Cooling Fan | Enabled
Print Speed | Default MK4S profile

4- Slicing and Preview

Once the parameters are configured:
  • Click the Slice Now button.
  • PrusaSlicer generates the toolpath and calculates the estimated print duration.
After slicing, the Preview Mode can be used to inspect the generated paths.
The preview allows verification of:
  • Layer distribution.
  • Support structures.
  • Infill patterns.
  • Potential printing issues.
  • Estimated print duration.
For this project, an Original Prusa MK4S 3D printer was used.

Loading the G-code

After generating the G-code file:
  • Save the file to the USB drive provided with the printer.
  • Safely eject the USB drive from the computer.
  • Insert the USB drive into the USB port of the MK4S.
The printer automatically detects the storage device and makes the file available through its interface.

Final Integrated System

After completing the enclosure and confirming that the AI pipeline was operating correctly with Faster-Whisper for speech recognition and Piper for speech synthesis, a decision was made to replace the original voice. Since the final Fab Academy presentation required a voice that more closely matched the character from the series, a voice model based on Kinger was selected from Fish Audio.

The objective was to preserve the existing architecture while replacing only the speech synthesis stage. The new voice model was integrated into the response pipeline so that every answer generated by the language model would be spoken using the custom Kinger voice instead of the default Piper voice.

fish_tts.py (Fish Audio Voice Synthesis Module)

This module sends the generated response to Fish Audio, downloads the synthesized speech, and automatically plays it through KINGER's speaker system.

import subprocess
import tempfile

from fish_audio_sdk import Session, TTSRequest

API_KEY = "YOUR_API_KEY"
VOICE_ID = "KINGER_VOICE_ID"

session = Session(API_KEY)

def speak(text):
    # Reserve a temporary .mp3 path; delete=False keeps the file for playback
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
        audio_path = f.name

    # Write the synthesized audio from Fish Audio to the file chunk by chunk
    with open(audio_path, "wb") as file:
        for chunk in session.tts(TTSRequest(text=text, reference_id=VOICE_ID)):
            file.write(chunk)

    # Play the result without opening a window and exit when playback ends
    subprocess.run(["ffplay", "-nodisp", "-autoexit", audio_path])
    
main.py Modification

The original Piper call is replaced with the Fish Audio synthesis function while preserving the rest of the processing pipeline.

# Original

import speak

...

speak.speak(response)

# New version

from fish_tts import speak

...

speak(response)
    

This modification significantly improved character consistency during demonstrations by providing a voice that more closely resembled the original Kinger personality while preserving the local speech recognition and AI processing architecture developed throughout the project.

KINGER is a locally operated physical AI assistant capable of listening to voice commands through its I2S microphone array, executing local actions through commands.py, generating contextual AI responses through Ollama and Qwen 2.5 3B when no direct command applies, and speaking the result through the MAX98357A amplifier, using either the Piper voice or the Fish Audio Kinger voice adopted for the final presentation. A WiFi/MQTT communication layer connects the Raspberry Pi to the XIAO ESP32-C6, which provides visual feedback through a 16-pixel NeoPixel ring and reads physical button input for volume and mode control. All of these subsystems (electronics, software, and the 3D printed mechanical enclosure) are integrated into a single standalone desktop product.

Download files

To download the 3D models and other files, click on the dancing KINGER.