Week 04: Embedded Programming

This week was our first real interaction with electronics. Honestly, it was much more fun than I thought it would be. I have been programming things for a while and have dabbled with the Arduino Uno before, but this week I got to learn A LOT more about electronics in general, and how we interact with them through code.

I will be structuring this week's documentation by explaining some theoretical aspects of embedded architectures (including our group assignment) and then showcasing the multiple projects I built on different architectures.

Fun Note

You can see I had A LOT to learn this week from the length of this documentation. Feel free to navigate it through the nav bar on your left :)

Jump to this week's checklist


Group Assignment: Embedded Architectures & Toolchains Comparison

I will start off this section with a quote by Neil:

"Asking which processor to use is like asking who you should date."

Why is that, though?

To answer this, I need to explain what embedded architectures are from the very beginning.

Note

This section will explore different elements mentioned throughout our global session that I wanted to get a deeper insight into. Here's the outline that Neil follows during our sessions.

Tip

You can directly skip to my choice of hardware for my final project (for now), by pressing here.

The group assignment this week was to compare the different architectures and toolchains available to us in the lab, putting seven boards side by side.

Before I get into our comparison (here's a link to the assignment), I want to use this section to build the foundation: what are these architectures, what do the terms actually mean, and why do the differences matter?

Note

That context is what makes the board-by-board comparison in the group assignment legible rather than just a table of numbers.

What Even Is an Embedded System?

Let's start with the most fundamental question. Your laptop runs Chrome, Fusion360, a code editor, and fifteen browser tabs at once. It's a general-purpose computer; it doesn't care what you throw at it, it just runs it. An embedded system is the opposite of that.

Image generated through ChatGPT based on this section's text

It is a computing system designed to do one specific thing, and to do it well, reliably, and usually in real-time.

The thermostat on your wall is an embedded system. So is the controller inside your washing machine, the chip in your car's ABS brakes, and the sensor in a hospital's IV drip monitor.

None of them are running a browser. All of them are doing exactly one job, and they are expected to do it without crashing, without lag, and often without anyone paying attention to them at all.

This is the world we are stepping into with embedded programming.

Real-Time vs Asynchronous

One of the defining characteristics of most embedded systems is that they operate in real-time. That phrase sounds vague but has a precise meaning: the system must respond to events within a guaranteed time window.

If an ABS sensor detects wheel lock-up, the response cannot arrive 200ms late. That latency could mean an accident. The system must act now, within a hard deadline.

This is fundamentally different from the asynchronous world of web servers or desktop apps, where a 200ms delay is barely noticeable. In embedded systems, timing is correctness.

The Architecture Choices: Building Up From Nothing

To understand why Neil's dating analogy is so accurate, you need to understand that choosing a processor is really a cascade of smaller choices, each one narrowing the field. Let me walk through them.

Von Neumann vs. Harvard

Image generated through ChatGPT based on this section's text

At the most fundamental level, there are two ways to organize memory inside a computer.

Von Neumann architecture uses a single shared memory space for both instructions (your program) and data (your variables). The processor fetches from the same bus for both. Simple, elegant, but there's a bottleneck. You can't fetch the next instruction while reading data at the same time.

This is known as the Von Neumann bottleneck, and it's a real performance ceiling at the kind of tight loop speeds embedded systems need.

Harvard architecture separates them. Instructions live in one memory space, data in another, and they have independent buses. The processor can fetch the next instruction while simultaneously reading data from the previous one. It's more complex to design, but faster in practice for the kind of repetitive, deterministic tasks that embedded systems run.

Every chip we compared in the group assignment (AVR, ARM, Xtensa, RISC-V) uses Harvard architecture.

CISC vs. RISC

The next fork in the road is about the instruction set: the vocabulary of operations the processor can execute.

CISC (Complex Instruction Set Computing) chips like x86 (the architecture inside most laptops) have hundreds of instructions, many of them very powerful. A single instruction might do what would take ten simpler instructions elsewhere. The chip is more complex to build and more power-hungry, but programs can be more compact. The Intel chip in an older MacBook is CISC; Apple Silicon Macs have since moved to ARM, which is RISC.

RISC (Reduced Instruction Set Computing) chips deliberately limit themselves to a small set of simple, fast instructions. Each one executes quickly and predictably, usually in a single clock cycle. The programs might be longer, but execution is faster, more predictable, and the chip consumes significantly less power.

For embedded systems, especially battery-powered or cost-sensitive ones, RISC wins on almost every axis. Lower power, faster execution of simple operations, easier to reason about timing, and simpler silicon that costs less to manufacture. Every architecture we studied this week is RISC.

Microprocessor vs. Microcontroller

A microprocessor is just the processing core. It needs external chips for memory, for I/O, for everything. Your laptop's CPU is a microprocessor; the RAM is a separate stick, the storage is a separate drive, the GPU is a separate chip. Each communicates over a bus.

A microcontroller (MCU) integrates the processor, memory (both flash for your program and SRAM for runtime data), and peripherals (timers, ADCs, communication interfaces, GPIO) all onto a single chip. Everything you need, in one package.

Image generated through ChatGPT based on this section's text

This integration is what makes microcontrollers so powerful for embedded work: lower cost, dramatically smaller footprint, lower power consumption, and no complex PCB routing between a CPU and external memory chips.

The whole system (the thing that would have filled a room in the 1960s) fits on a chip smaller than your fingernail. Crazy, is it not?!

So the path we follow is Harvard → RISC → microcontroller, since we are optimizing for the constraints of embedded systems: efficiency, real-time performance, low cost, and small size.

Every board in the group assignment lives at this intersection.

The Memory Hierarchy

Image generated through ChatGPT based on this section's text

Once you're inside a microcontroller, there are several types of memory, and each has a distinct purpose and a distinct contract with the programmer:

Registers: Tiny, ultra-fast storage built directly into the processor core. This is where actual arithmetic happens. Think of them as the processor's scratch pad. You load a value from memory into a register, operate on it, write it back. There are usually only a handful (8, 16, or 32 of them), each holding one word of data.

SRAM (Static RAM): Fast working memory. This is where your variables live at runtime, where the call stack grows and shrinks, where temporary data lives. It's fast and readable/writable at will, but it vanishes the moment power is cut. The XIAO RP2040 has 264 KB of it; the ESP32-S3 has 512 KB on-chip plus 8 MB PSRAM, a difference that turns out to matter enormously for ML applications.

DRAM (Dynamic RAM): Slower and denser than SRAM. Used in bigger systems like the Raspberry Pi (which runs Linux and needs gigabytes) but rarely found in small MCUs because it requires periodic refresh cycles and external controller logic.

Flash: Non-volatile memory where your program code lives. When you upload code to a microcontroller, it gets written to Flash. It persists through power cycles, but writing to it is slow and it has a finite write cycle count (usually 10,000–100,000 writes). The XIAO boards we used ranged from 2 MB to 8 MB of Flash.

EEPROM: Electrically Erasable Programmable Read-Only Memory. Also non-volatile, but byte-addressable. That means the program itself can write small amounts of persistent data at runtime. Useful for storing calibration values or configuration that needs to survive a reset.

Fuse bits: Special configuration bits that set fundamental chip behavior: clock source, bootloader enable, brown-out detection threshold. They are written by a programmer, not by your running program, and they are easy to set incorrectly in ways that temporarily brick a chip. AVR chips are notorious for this.
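
To make the EEPROM contract above concrete, here is a minimal Arduino-style sketch of the kind of thing it is used for: persisting a calibration value across resets. The address, magic byte, and Calibration struct are illustrative (not from any project this week); on AVR boards the classic EEPROM library works as-is, while the ESP32's flash-emulated EEPROM additionally needs EEPROM.begin() and EEPROM.commit().

#include <Arduino.h>
#include <EEPROM.h>

// Illustrative layout: a magic byte lets us tell "never written" apart from real data.
constexpr int     CAL_ADDR  = 0;
constexpr uint8_t CAL_MAGIC = 0xA5;

struct Calibration {
  uint8_t magic;
  float   adcOffset;   // e.g. a sensor zero-point measured once
};

Calibration loadCalibration() {
  Calibration cal;
  EEPROM.get(CAL_ADDR, cal);          // read the struct back from non-volatile memory
  if (cal.magic != CAL_MAGIC) {       // first boot: nothing stored yet, fall back to defaults
    cal = {CAL_MAGIC, 0.0f};
  }
  return cal;
}

void saveCalibration(float adcOffset) {
  Calibration cal{CAL_MAGIC, adcOffset};
  EEPROM.put(CAL_ADDR, cal);          // writes are slow and wear the cells, so do this rarely
}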

Peripherals (Chips Within the Chip)

The processor is almost the least interesting part of a microcontroller. The interesting parts (the parts that take up most of the datasheet, sometimes 1000+ pages) are the peripherals.

Image generated through ChatGPT based on this section's text

They are specialized hardware modules integrated onto the same die, each doing one job very efficiently without stealing CPU time:

GPIO (General Purpose Input/Output): The pins you directly control in code. Set high (3.3V or 5V) or low (0V), or read as input. The most basic interaction with the physical world.

ADC (Analog-to-Digital Converter): Reads an analog voltage and converts it to a digital number. A 12-bit ADC maps 0–3.3V to values 0–4095. This is how you read a microphone, a light sensor, a potentiometer, a battery voltage.

DAC (Digital-to-Analog Converter): Takes a number and outputs a corresponding voltage, the opposite of ADC. Useful for audio output or setting analog reference levels.

Timers/Counters: Hardware modules that count clock cycles entirely independently of the CPU. Used for precise timing, measuring pulse widths, generating PWM signals, and triggering interrupts at exact intervals without any CPU polling.

PWM (Pulse Width Modulation): A technique of rapidly switching a digital pin on and off to simulate an analog output level. Control the duty cycle and you control LED brightness, motor speed, servo angle. Fun fact: all 11 GPIO pins on the XIAO are PWM-capable.

USART/UART: Universal Synchronous/Asynchronous Receiver-Transmitter. The old reliable serial communication workhorse. Two wires (TX, RX), configurable baud rate. Still used everywhere.

I2C: Two wires (SDA data, SCL clock), multiple devices sharing the same bus, each identified by a 7-bit address. Slower than SPI, but extremely convenient for sensors and displays where pin count matters.

SPI: Four wires (MOSI, MISO, SCK, CS), one chip select per device, much faster than I2C. Used for displays, SD cards, and high-speed sensors where throughput matters.

USB: Some MCUs implement USB entirely in hardware, allowing them to enumerate as a serial port, HID keyboard/mouse, MIDI device, or mass storage (all without an additional USB-to-serial chip). The SAMD21 and RP2040 do this; the ATmega328P on the Uno R3 needs a dedicated ATmega16U2 chip to do it.
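
To tie a couple of these peripherals together (the ADC and a PWM-capable timer), here is a minimal sketch of the classic read-a-knob, dim-an-LED pattern. The pin numbers are illustrative, and it leans on the Arduino framework's analogRead()/analogWrite() wrappers rather than any specific chip's registers.

#include <Arduino.h>

constexpr int POT_PIN = A0;   // illustrative: any ADC-capable pin
constexpr int LED_PIN = 5;    // illustrative: any PWM-capable pin

void setup() {
  pinMode(LED_PIN, OUTPUT);
}

void loop() {
  int raw  = analogRead(POT_PIN);          // ADC: 0-1023 on a 10-bit AVR, 0-4095 on a 12-bit chip
  int duty = map(raw, 0, 1023, 0, 255);    // scale to the 8-bit PWM duty cycle (adjust for 12-bit ADCs)
  analogWrite(LED_PIN, duty);              // PWM: the timer peripheral does the switching, not the CPU
  delay(10);
}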


As Neil put it:

"You can't understand the software until you understand the hardware. That's why you read the datasheet."

Every peripheral has its own chapter. Every register has a name and a function. Every bit has a consequence. You don't necessarily need to read the full datasheet, but you need to learn how to skim through it and navigate it.

Processor Families

Each family represents a different set of decisions made by its designers. Each has different trade-offs between performance, power, cost, complexity, and ecosystem.

Processor family comparison diagram

AVR: Created by two Norwegian students at the Norwegian Institute of Technology in 1996, AVR is the architecture that powers the original Arduino. 8-bit, 5V, simple, and extraordinarily well-documented. The ATmega328P datasheet is 650 pages and covers every register in exhaustive detail. Some chips in this family are literally the size of a grain of rice. Slower than modern 32-bit chips by every metric, but there is nowhere to hide: when something goes wrong, the simplicity of the architecture makes it debuggable. That's underrated.

ARM Cortex-M: The dominant family in modern embedded work. ARM Holdings designs the core and licenses it to manufacturers (Microchip, STMicroelectronics, Renesas, Nordic, Raspberry Pi) so the same instruction set runs on very different silicon at very different price points. The Cortex-M0+ (used in the RP2040 and SAMD21) is minimal and low-power. The Cortex-M4 (used in the Arduino Uno R4's Renesas RA4M1) adds a hardware floating-point unit for DSP workloads. 32-bit, runs at 48–500 MHz depending on variant and manufacturer.

Xtensa LX7: A proprietary 32-bit RISC architecture from Cadence, used exclusively in Espressif's ESP32 chips. Designed with signal processing in mind: wider SIMD instructions, fast floating-point, hardware acceleration for common DSP operations. The ESP32-S3 runs two LX7 cores at up to 240 MHz each — and they can be assigned different tasks independently.

RISC-V: An open-source instruction set architecture. No company owns it; no royalties are paid. The ESP32-C6 uses two RISC-V cores (one high-power at 160 MHz, one ultra-low-power at 20 MHz for background tasks). The ecosystem is less mature than ARM, but growing fast.

Go to the Group Assignment

The group assignment takes all of this and puts it into practice. We compared seven boards (some of those available at our lab) and two toolchains, wrote example Arduino code, and collected actual comparison data on everything from setup friction to debugging capability.

→ Read the full Group Assignment: Embedded Architectures & Toolchains Comparison


Learnings From the Global Session

Well, the section above was pretty information-heavy, but think of this less as notes and more as the honest residue of three hours with Neil Gershenfeld moving fast through a lot of ground.

For the full session outline and reference materials, here's the link.

Understanding Toolchains

Neil insisted that the goal is not to program a microcontroller. It's to understand the toolchain. Those sound similar but are different things.

Programming a microcontroller means pressing Upload in the Arduino IDE and watching the LED blink. Understanding the toolchain means knowing that pressing Upload invokes avr-gcc (or xtensa-esp32s3-elf-gcc, or arm-none-eabi-gcc depending on your chip), which compiles your source into machine code, which the linker combines with the Arduino core libraries into an ELF binary, which avrdude (or esptool.py, or openocd) then flashes over the appropriate protocol (USB-serial, UPDI, JTAG) into the chip's flash memory.

The IDE hides all of this. It's still happening.

Why does this matter? Because the moment something breaks (trust me, it will) you need to know where in that chain to look. A chip that won't enumerate on USB is a different problem from a chip that flashes but doesn't run, which is a different problem from code that compiles but produces wrong behavior at runtime. Each failure mode lives at a different point in the chain. Knowing the chain is how you debug rather than guess.

Arduino is Six Things at Once

Arduino is not one thing. It is six things that happen to share a name and that often get conflated:

  1. Boards: Physical hardware you can buy (Uno, Nano, MKR, etc.) or design yourself.
  2. Toolchain: GCC compiler, avrdude programmer, linker, libraries. Arduino brings these in automatically.
  3. Libraries: Some excellent, some terrible. Wire.h for I2C is fine. Some community libraries are memory-inefficient abstractions over abstractions.
  4. IDE: Functional for quick tests. Not great for large projects. Most serious embedded developers move to PlatformIO or direct command-line toolchains.
  5. Bootloader: The small program that lives in the chip's upper flash and listens for incoming code uploads. Costs 512 bytes of flash.
  6. Cores: The hardware abstraction layer that tells the Arduino framework how to talk to a specific chip. Without the right core, digitalWrite doesn't know which registers to write.
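
Point 6 is easiest to see with a concrete example. On an Uno (ATmega328P), digitalWrite(13, HIGH) ultimately resolves, through the AVR core's pin-mapping tables, to setting one bit in one register. A rough sketch of what the core is doing for you (Uno pin 13 is bit 5 of port B):

#include <Arduino.h>

void setup() {
  // What you write with the Arduino core:
  pinMode(13, OUTPUT);
  digitalWrite(13, HIGH);

  // Roughly what that becomes on an ATmega328P:
  DDRB  |= (1 << DDB5);    // data direction register: make PB5 an output
  PORTB |= (1 << PORTB5);  // port output register: drive PB5 high
}

void loop() {}

Swap in a different chip without the matching core, and those register names simply don't exist; that is what the core abstracts away.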

Deciding the Processor Family

Each processor family exists for reasons that are as much political and economic as technical.

AVR won through approachability, not performance. Arduino formed around it and made being easy to learn a self-reinforcing advantage. ARM is a licensing story. RISC-V is the reaction to that: open-source, no royalties, no single owner. Xtensa is the licensing route again, just from a different licensor: Espressif licenses the core from Cadence rather than ARM.

Breadboards

Neil hates breadboards, not aesthetically, but because intermittent connections are indistinguishable from software bugs. You can debug for hours and the code is fine; the wire is just barely touching the rail.

For this week though, breadboards are fine. I used one as a display prop (explanation in the Ramadan Kareem Project below).

Programming Languages

C is dangerous not because it's hard, but because it trusts you. Buffer overflows, null pointer dereferences, and use-after-free bugs are the leading cause of security vulnerabilities in deployed firmware, and C just lets them happen. It's still dominant because nothing matched it for performance until recently.

Rust is changing that. no_std Rust runs on bare metal ARM, RISC-V and increasingly Xtensa, and the borrow checker is actually the point in embedded, where there's no stack trace and failures are silent.

MicroPython / CircuitPython are more useful than I expected. For I/O-heavy work (sensors, displays, WiFi) where raw speed isn't the bottleneck, Python's iteration speed could be worth the performance tradeoff.

Simulation

Two tools worth knowing: Wokwi is a browser-based circuit + code simulator covering Arduino, ESP32, STM32 and others, good for testing logic before touching hardware, though it won't warn you about missing current-limiting resistors.

AVR8js is lower-level: an open-source, instruction-level AVR simulator (the engine underneath Wokwi's AVR boards). Because it models the chip cycle by cycle its timing is more faithful, but like Wokwi it simulates logic, not the analog electronics around the chip.

Also, over the weekend, I discovered SCHEMATIK, "Cursor for Hardware."

RTOSes and Multitasking

I hadn't thought about embedded systems running operating systems before this. Not Linux, but lightweight RTOSes like FreeRTOS, which handles task scheduling, priorities, and synchronization.

The ESP32-S3 actually runs FreeRTOS under the hood even in Arduino mode. The dual-core architecture means you can pin tasks to specific cores.
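
Here is a minimal sketch of what task pinning looks like in the Arduino-ESP32 environment, using FreeRTOS's xTaskCreatePinnedToCore(). The task names, stack sizes, and priorities are illustrative, not tuned values.

#include <Arduino.h>

// Illustrative tasks: one pretends to acquire sensor data, the other pretends to run inference.
void acquisitionTask(void *param) {
  for (;;) {
    // ... read sensor / fill buffer ...
    vTaskDelay(pdMS_TO_TICKS(4));     // ~256 Hz pacing
  }
}

void inferenceTask(void *param) {
  for (;;) {
    // ... consume buffer / run model ...
    vTaskDelay(pdMS_TO_TICKS(100));
  }
}

void setup() {
  // Pin acquisition to core 1 and the heavier work to core 0 (where the radio stacks also live).
  xTaskCreatePinnedToCore(acquisitionTask, "acquire", 4096, nullptr, 2, nullptr, 1);
  xTaskCreatePinnedToCore(inferenceTask,   "infer",   8192, nullptr, 1, nullptr, 0);
}

void loop() { vTaskDelay(pdMS_TO_TICKS(1000)); }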


Why We're Using the XIAO ESP32-S3 for the Final Project

Everything above was background I needed before I could answer one specific question clearly: what chip is right for the final project?

The project is AR glasses with an integrated EEG-based brain-computer interface, processed through neuromorphic(-like) computing systems. The idea is to move beyond the two dominant failure modes of on-device AI (always-on inference that drains the battery in hours, and cloud-dependent processing that introduces latency and breaks without connectivity) while combining two emergent technologies, AR and EEG.

Our system needs to:

  1. Acquire and process EEG signals continuously in real-time
  2. Run neural inference on-device, with no cloud dependency and no latency budget
  3. Drive an optical output for the AR display layer
  4. Capture visual input via an onboard camera for scene context and AR/AI assistant
  5. Maintain BLE connectivity for configuration and data logging
  6. Run on battery for hours, in a form factor that fits inside a glasses frame
  7. Survive being worn on a human head

Now map those requirements against everything we've covered:

  • Real-time EEG acquisition → needs a fast core with hardware floating-point and enough ADC resolution/speed
  • On-device ML inference → needs enough RAM to hold a model. This is the hard constraint that eliminates most MCUs immediately.
  • Camera input → needs a dedicated camera interface
  • BLE → needs integrated radio, not a separate module that adds size and power draw
  • Battery-powered option and small in size

So far, every one of those constraints pointed to the same chip: the XIAO ESP32-S3 Sense.

XIAO ESP32-S3 Sense overview from datasheet

Overview (link to documentation, link to datasheet)

Dual Xtensa LX7 cores at 240 MHz: The ESP32-S3 runs FreeRTOS with full dual-core task pinning. In practice for this project: EEG signal acquisition and preprocessing pinned to Core 1, inference and BLE stack on Core 0 (parallel, non-blocking, no juggling).

8 MB PSRAM: This is the decisive factor. Running TinyML inference requires holding the model weights in RAM alongside the activations from each layer. The RP2040 has 264 KB total SRAM. The ESP32-C6 has 512 KB. The ESP32-S3's 8 MB PSRAM changes the category entirely: EEG classification models, attention detection networks, spike sorting algorithms — all fit.

Integrated camera interface (DVP): The Sense variant includes a board-to-board connector carrying the DVP (Digital Video Port) camera interface to an OV2640 camera module. 2MP, configurable resolution, JPEG compression in hardware. The camera interface is native to the chip.

Integrated digital microphone: Also on the Sense variant. I2S interface, 360° pickup, noise-cancelling design. Relevant for voice command input and audio-context awareness in the AR layer.

WiFi + BLE 5.0 integrated: No separate radio module. BLE 5.0 supports longer range and higher throughput than BLE 4.2, meaningful for streaming EEG data to a companion device for visualization. Relevant for Project 06 below.

8 MB Flash: Model weights can be stored in flash and loaded into PSRAM at inference time.

14 μA deep sleep current: When the glasses are idle, the chip can sleep at 14 microamps, helping with battery longevity.
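
As a reference for how little code that takes in the Arduino-ESP32 environment, here is a hedged sketch; the timer wake source and interval are illustrative, and the real glasses would more likely wake on an external interrupt.

#include <Arduino.h>
#include <esp_sleep.h>

void goToSleepFor(uint64_t seconds) {
  esp_sleep_enable_timer_wakeup(seconds * 1000000ULL);  // wake-up timer is specified in microseconds
  esp_deep_sleep_start();                               // execution stops here; the chip reboots on wake
}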

21 × 17.5 mm form factor: Small enough to actually fit. The XIAO would fit into any custom PCB I'd create later.

The alternatives that came closest: the XIAO RP2040 has excellent PIO for custom peripheral timing and a beautiful drag-and-drop bootloader, but no wireless, no camera interface, and 264 KB SRAM makes inference infeasible. The ESP32-C6 has dual RISC-V cores and the best wireless protocol stack in the XIAO lineup (WiFi 6, Zigbee, Thread), but the Xtensa LX7 outperforms it in raw compute, and the 8 MB PSRAM is not available on the C6 variant.


What I Built This Week

Based on my past (very brief) experience with programming an Arduino Uno, I wanted to move past just "blinking an LED" and calling it a day.

I wanted to experiment with things that'd later be relevant to my final project: first on the Arduino, and then on the XIAO ESP32-S3 with its Grove Shield (yes, I got to actually solder something this week), since it is my first time using that board.

Thus, here's a list of the things I managed to finish:

  1. Arduino stepper motor controlled by detecting blinks from EEG signals
  2. Soldering the XIAO ESP32-S3 to a Grove Shield
  3. Controlling an RGB light using a "rotary angle sensor" on the XIAO ESP32-S3
  4. Creating a Ramadan Kareem animation on an OLED display triggered by a button
  5. Streaming the feed from the camera on the XIAO ESP32-S3 to my computer
  6. Connecting the MUSE EEG Headband: live EEG waveform, blink counter, and frequency bands, all on device

I was also experimenting with loading "MiniClaw," basically an AI assistant based on the viral OpenClaw open-source agent (recently acquired by OpenAI), to the ESP32-S3, but didn't get the chance to finish it yet.

To be honest, the hardest part of this week was getting the MUSE headband to reliably relay data to the ESP32-S3 and processing it on-device.

I used PlatformIO on VSCode for all projects, so I'll only explain the general process once. Here's a quick walkthrough from this week's group assignment:

PlatformIO is a professional embedded development platform that runs as an extension inside Visual Studio Code. Where Arduino IDE hides the toolchain, PlatformIO exposes it, and gives you control over it.

The key difference in practice is not the feature list but the debugging. Arduino IDE's only debugging mechanism is printing to the serial monitor. PlatformIO supports hardware debugging with breakpoints, variable inspection, and step-through execution on supported boards.

Prerequisites: VS Code, Python 3.5+. On Linux, also python3-venv.

Install VS Code: Download from code.visualstudio.com

Install PlatformIO IDE extension: Open VS Code → Extensions → search PlatformIO IDE → Install. PlatformIO appears as an ant icon in the sidebar. On first launch it downloads compilers and frameworks automatically.

PlatformIO IDE interface

Create a project: PlatformIO Home → New Project → select board → select framework (Arduino) → Finish.

Project structure:

project/
├── src/
│   └── main.cpp
├── lib/
├── include/
└── platformio.ini

Configure platformio.ini (example for XIAO ESP32S3):

[env:seeed_xiao_esp32s3]
platform = espressif32
board = seeed_xiao_esp32s3
framework = arduino
monitor_speed = 115200

Build: ✓ in the bottom toolbar | Upload: → in the bottom toolbar | Monitor: the plug icon

PlatformIO build/upload/monitor toolbar

All three are also available as pio command-line commands, which is useful for automation or scripting.


Project 01: Arduino Stepper Motor Controlled by EEG Blink Detection

The Muse 2 EEG headband streams brainwave data at 256 samples/second over Bluetooth. A Python script on the laptop picks up that stream via LSL, watches the frontal electrode readings for the sharp voltage spike an eye blink creates, and whenever one is detected it sends a BLINK\n command over USB serial to an Arduino Uno.

The Arduino then turns a 28BYJ-48 stepper motor a quarter revolution. A momentary push button wired externally to Arduino pin 2 (using INPUT_PULLUP) lets you pause and resume detection without touching the laptop.

Here is an early sneak peek (there is an important addition later; this was the first thing I built this week, and I was too excited about it):

Imports and dependencies

from pylsl import StreamInlet, resolve_byprop
import serial, serial.tools.list_ports
import numpy as np
from collections import deque

pylsl connects to the Muse EEG stream broadcast by muselsl stream. pyserial handles the USB serial link to the Arduino. numpy provides the array math for the blink detection, and deque(maxlen=WINDOW_SIZE) gives a fixed-size sliding window that automatically discards old samples as new ones arrive.

Auto-detecting the Arduino port

ports = serial.tools.list_ports.comports()  # every serial device currently attached
for p in ports:
    desc = (p.description + " " + (p.manufacturer or "")).lower()
    if any(k in desc for k in ("arduino", "ch340", "cp210", "ftdi", "usbmodem", "usbserial")):
        port = p.device
        break

Rather than hardcoding /dev/cu.usbmodem..., the script scans all connected serial ports and matches against the USB-to-serial chip names that Arduino boards use. If nothing matches, it prints the available ports and asks the user to pick manually.

Config constants

SAMPLE_RATE        = 256     # Muse 2 EEG sample rate (Hz)
WINDOW_SIZE        = 50      # ~200 ms sliding window
BLINK_THRESHOLD    = 100     # spike amplitude in microvolts
MIN_BLINK_INTERVAL = 0.5     # seconds between accepted blinks

The 100 µV threshold sits well above background EEG noise (10–30 µV) but below the peak of a real blink (200–500 µV). The 0.5 s minimum interval prevents one long blink event from being counted twice.

The detection loop

for ch in [1, 2]:  # AF7, AF8 — frontal channels
    eeg = np.array(buffers[ch])
    # A blink creates a large, sharp amplitude swing.
    # We check both peak amplitude AND variance to avoid triggering
    # on slow baseline drift, which has high amplitude but low variance.
    if np.max(np.abs(eeg)) > BLINK_THRESHOLD and np.std(eeg) > 30:
        if now - last_blink_time > MIN_BLINK_INTERVAL:
            blink_detected = True
            break

Only AF7 and AF8 are checked. The frontal electrodes pick up eye-movement artifacts most strongly. The dual condition (large peak AND high standard deviation) is important: slow drift in the baseline can produce high absolute values but stays at low variance, while a genuine blink spike is sharp and creates high variance over the 200 ms window.

Motor wiring and constants

#define STEPS_PER_REV 2048   // 28BYJ-48 full revolution in half-step mode
#define ROTATION_STEPS 512   // quarter turn (90 degrees)

// Note: ULN2003 wiring interleaves coil pins -> pass as 8,10,9,11
Stepper motor(STEPS_PER_REV, 8, 10, 9, 11);

The 28BYJ-48 is a gear-reduced unipolar stepper. In half-step mode it takes 2048 steps per revolution (64 internal steps × 32:1 gear ratio = 2048). The pin order 8, 10, 9, 11 is intentional — it maps to the ULN2003 driver's IN1–IN4 inputs in the correct coil energisation sequence so the motor actually rotates instead of vibrating. Passing them in the wrong order is a common mistake.

Arduino + stepper motor + ULN2003 driver setup

Button debouncing

The debouncer works in two stages. First, any change is recorded with a timestamp. Second, after 40 ms of stable reading, the change is accepted. This prevents the mechanical bouncing of the button contacts (which can generate dozens of false edges in a few milliseconds) from being interpreted as multiple presses.
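
The debounce code itself isn't shown above, so here is a hedged sketch of the two-stage pattern just described; the pin number and 40 ms window come from the description, while the variable names are mine.

#include <Arduino.h>

constexpr uint8_t  BUTTON_PIN  = 2;     // momentary button to GND, using the internal pull-up
constexpr uint32_t DEBOUNCE_MS = 40;    // how long the reading must stay stable before we accept it

bool     stableState    = HIGH;         // last accepted (debounced) state; HIGH = not pressed
bool     lastRawState   = HIGH;
uint32_t lastChangeTime = 0;

void setup() {
  pinMode(BUTTON_PIN, INPUT_PULLUP);
}

void loop() {
  const bool raw = digitalRead(BUTTON_PIN);

  // Stage 1: any change in the raw reading just restarts the timer.
  if (raw != lastRawState) {
    lastChangeTime = millis();
    lastRawState   = raw;
  }

  // Stage 2: only after 40 ms of stability do we accept the new state.
  if (millis() - lastChangeTime > DEBOUNCE_MS && raw != stableState) {
    stableState = raw;
    if (stableState == LOW) {
      // falling edge = accepted button press: toggle pause/resume here
    }
  }
}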

Setup with motor and Arduino

3D Printing a Part to Demonstrate Rotation

During our regional review, it was mentioned to me that the rotation of the motor wasn't very visible, so I took the measurements of the motor and designed a piece that would fit on it.

3D printed part design for motor shaft

3D printed part fitted on motor

And it worked!

Download Project 01 Source Code


Project 02: Soldering the XIAO ESP32-S3 to a Grove Shield

Since I wanted to try the XIAO ESP32-S3 this week, Andre, our instructor, suggested I solder the pins and attach it to a Grove Shield, since that makes testing much easier.

The Grove Shield breaks out all the XIAO's pads into standard 4-pin Grove connectors so sensors plug in without breadboards or loose wires.

Process:

  1. Align the XIAO on the shield's footprint so every castellated pad sits over the corresponding shield pad.
  2. Tack two opposite corner pads first to lock the alignment.
  3. Solder the remaining pads with a fine tip and minimal solder.

Soldered XIAO ESP32-S3 on Grove Shield


Project 03: RGB LED Controlled by a Rotary Angle Sensor (XIAO ESP32-S3)

Turning the knob cycles the Grove Chainable RGB LED through the full HSV color wheel at constant brightness. The LED uses the P9813 protocol, which we drive directly via bit-banging, no library needed.

RGB LED + rotary angle sensor setup

ADC averaging

int readRotaryRawAveraged() {
    long sum = 0;
    for (int i = 0; i < 8; ++i) sum += analogRead(PIN_ROTARY);
    return static_cast<int>(sum / 8);
}

The ESP32-S3's ADC has measurable noise — a single reading can jitter by ±20–30 counts even with the knob held still. Averaging 8 readings reduces that noise by a factor of √8 ≈ 2.8.

Low-pass filtering and hysteresis

// Exponential moving average: filtered = filtered*7/8 + raw*1/8
static int filtered = raw;
filtered = (filtered * 7 + raw) / 8;

// Only commit when change is large enough to be intentional
// 12 ADC counts out of 4095 is roughly 0.3% of full scale (about 1 degree of knob rotation)
static int lastCommitted = filtered;
if (abs(filtered - lastCommitted) >= 12) {
    lastCommitted = filtered;
}

The exponential moving average (EMA) smooths out jitter while the knob is idle. The time constant is 8 loop iterations (8 × 15 ms = 120 ms). On top of that, hysteresis (12-count dead zone) prevents the colour from flickering between two adjacent hue values when the knob sits between them.

HSV to RGB conversion

HSV (Hue, Saturation, Value) is a perceptual color space. By holding saturation at 255 (fully saturated) and value at 210 (fixed brightness) and varying only hue 0–255, we get a smooth perceptual sweep around the color wheel without the color appearing to get brighter or dimmer as the knob turns.
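
The project's exact conversion isn't reproduced here, but a typical integer implementation of that mapping looks like the sketch below (assuming 8-bit hue, saturation, and value as described above; names are mine).

#include <stdint.h>

// Classic integer HSV -> RGB: hue 0-255 sweeps the colour wheel in six regions of ~43 steps.
void hsvToRgb(uint8_t hue, uint8_t sat, uint8_t val,
              uint8_t &r, uint8_t &g, uint8_t &b) {
  const uint8_t region    = hue / 43;
  const uint8_t remainder = (hue - region * 43) * 6;

  const uint8_t p = (val * (255 - sat)) >> 8;
  const uint8_t q = (val * (255 - ((sat * remainder) >> 8))) >> 8;
  const uint8_t t = (val * (255 - ((sat * (255 - remainder)) >> 8))) >> 8;

  switch (region) {
    case 0:  r = val; g = t;   b = p;   break;
    case 1:  r = q;   g = val; b = p;   break;
    case 2:  r = p;   g = val; b = t;   break;
    case 3:  r = p;   g = q;   b = val; break;
    case 4:  r = t;   g = p;   b = val; break;
    default: r = val; g = p;   b = q;   break;
  }
}

In the project's terms, hue comes from mapping the committed ADC reading (0–4095) down to 0–255, with sat fixed at 255 and val at 210.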

Download Project 03 Source Code


Project 04: Ramadan Kareem Animation on OLED (XIAO ESP32-S3)

Button D2 triggers a 7-second animation: a crescent moon slides in from the left, stars twinkle, then "RAMADAN KAREEM" types itself letter by letter. The final card holds until the next button press.

(The breadboard was just for propping purposes)

Ramadan Kareem OLED animation setup

State machine

enum class SceneState : uint8_t { Idle, Animating, Finished };

Three states keep loop() simple. Idle = static crescent shown. Animating = 7-second timed sequence running. Finished = final card held.

Crescent moon drawing

void drawCrescent(int16_t cx, int16_t cy, int16_t radius) {
    display.fillCircle(cx, cy, radius, SSD1306_WHITE);         // full disc
    display.fillCircle(cx + radius/2, cy - radius/5, radius, SSD1306_BLACK);  // carve out
}

The crescent is a clever trick with the monochrome OLED: draw a white filled circle, then draw a black filled circle of the same radius but offset up and to the right. The black circle erases part of the white one, leaving only the crescent-shaped overlap.

Typewriter text reveal

void drawRevealedWord(int16_t x, int16_t y, const char* text, size_t visibleChars) {
    const size_t toShow = min(visibleChars, strlen(text));
    display.setCursor(x, y);
    for (size_t i = 0; i < toShow; ++i) display.print(text[i]);
}

The total reveal window is 2300 ms. The number of visible characters is proportional to elapsed time within that window. Text starts at t=2.2 s and is fully revealed by t=4.5 s.
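
The mapping from elapsed time to visibleChars is just a linear ramp. A small sketch of it, assuming the 2.2 s start and 2300 ms window mentioned above (function name is mine):

#include <Arduino.h>

size_t visibleCharsAt(uint32_t elapsedMs, const char* text) {
  constexpr uint32_t REVEAL_START_MS  = 2200;
  constexpr uint32_t REVEAL_WINDOW_MS = 2300;
  if (elapsedMs <= REVEAL_START_MS) return 0;
  const uint32_t t = min(elapsedMs - REVEAL_START_MS, REVEAL_WINDOW_MS);
  return (strlen(text) * t) / REVEAL_WINDOW_MS;   // proportional reveal, fully shown at ~4.5 s
}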

Here's how the final result looks:

Download Project 04 Source Code


Project 05: Camera Feed Streaming to Computer (XIAO ESP32-S3)

The XIAO ESP32-S3 Sense captures live 640×480 JPEG frames from the OV2640 camera and streams them over USB serial at 3 Mbaud to a Python viewer on the laptop.

Camera module attached to XIAO ESP32-S3

Why 3 Mbaud?

At the default 115200 baud you get ~14 KB/s, barely one small frame per second. A VGA JPEG at quality 12 is roughly 15–40 KB. At 3 Mbaud (~375 KB/s usable) you can sustain 6 fps of VGA JPEG comfortably.

Camera initialization

config.pixel_format = PIXFORMAT_JPEG;     // hardware JPEG compression
config.frame_size   = FRAMESIZE_VGA;      // 640x480
config.jpeg_quality = 12;                 // 0=best, 63=worst
config.fb_count     = 1;
config.grab_mode    = CAMERA_GRAB_WHEN_EMPTY;

PIXFORMAT_JPEG tells the OV2640 to compress each frame to JPEG in its own hardware before DMAing the result to the ESP32-S3's SRAM. CAMERA_GRAB_WHEN_EMPTY means a new frame is grabbed only when the buffer is empty.

Frame header for sync

struct FrameHeader {
    char     magic[4];    // always "JPG0" — frame boundary marker
    uint32_t jpegLen;     // JPEG bytes to follow
    uint16_t width;
    uint16_t height;
};

The 12-byte header precedes every JPEG payload. The "JPG0" magic lets the Python receiver re-synchronize after any lost bytes.

Python viewer: framing recovery

def sync_to_magic(ser):
    window = bytearray()
    while True:
        b = ser.read(1)
        window += b
        if len(window) > 4: window.pop(0)
        if bytes(window) == b"JPG0":
            rest = read_exact(ser, HEADER_SIZE - 4)
            return b"JPG0" + rest

If the viewer starts mid-stream, or if USB drops a byte, it just keeps scanning until the magic bytes appear.

Camera streaming to Python viewer on laptop

Download Project 05 Source Code


Project 06: Connecting the MUSE EEG Headband (On-Device EEG Processing)

This is the most complex project of the week. The XIAO ESP32-S3 connects directly to the Muse 2 headband via Bluetooth LE, decodes raw EEG packets, detects blinks in real time, computes frequency-band power, and drives the OLED — no computer involved at all (only for power).

XIAO ESP32-S3 + OLED running Muse EEG visualization

Three UI modes, cycled with the D2 button:

  • Blink Counter: large digit count of blinks detected so far.
  • Calibrating: measures quiet-noise baseline for 2.5+ seconds, then sets the blink threshold.
  • Waveform: live scrolling AF7 and AF8 EEG traces with frequency-band percentages.

BLE UUIDs and connection parameters

constexpr const char* MUSE_SERVICE_UUID = "0000fe8d-0000-1000-8000-00805f9b34fb";
constexpr const char* MUSE_AF7_UUID     = "273e0004-4c4d-454d-96be-f03bac821358";
constexpr const char* MUSE_AF8_UUID     = "273e0005-4c4d-454d-96be-f03bac821358";

constexpr uint16_t MUSE_CONN_ITVL_MIN  = 24;   // 30 ms
constexpr uint16_t MUSE_CONN_ITVL_MAX  = 40;   // 50 ms

The Muse BLE service and characteristic UUIDs come from the open-source Muse SDK documentation. The connection interval parameters control the BLE radio duty cycle: 30–50 ms intervals balance throughput against power consumption.

ISR-safe ring buffer architecture

BLE notifications arrive in a FreeRTOS BLE task running on Core 0. That task cannot safely call the OLED driver, run the blink detector, or do floating-point DSP, because those aren't thread-safe and could block the BLE stack.

The solution is a classic producer-consumer ring buffer. The BLE notify callback is the producer — it enqueues the 20-byte raw packet under a portMUX critical section (an ISR-safe spinlock on the ESP32). The main loop is the consumer, draining the queue up to 40 packets per call.
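
Here is a minimal sketch of that producer-consumer pattern on the ESP32; the names (RawPacket, QUEUE_LEN, the function names) are illustrative rather than the project's exact code.

#include <Arduino.h>
#include <string.h>

constexpr size_t PACKET_LEN = 20;
constexpr size_t QUEUE_LEN  = 64;

struct RawPacket { uint8_t data[PACKET_LEN]; };

static RawPacket gQueue[QUEUE_LEN];
static volatile size_t gHead = 0, gTail = 0;
static portMUX_TYPE gQueueMux = portMUX_INITIALIZER_UNLOCKED;

// Producer: called from the BLE notify callback on Core 0.
void enqueuePacket(const uint8_t *data, size_t len) {
  if (len != PACKET_LEN) return;
  portENTER_CRITICAL(&gQueueMux);
  const size_t next = (gHead + 1) % QUEUE_LEN;
  if (next != gTail) {                       // drop the packet if the queue is full
    memcpy(gQueue[gHead].data, data, PACKET_LEN);
    gHead = next;
  }
  portEXIT_CRITICAL(&gQueueMux);
}

// Consumer: called from loop(); returns false when the queue is empty.
bool dequeuePacket(RawPacket &out) {
  bool ok = false;
  portENTER_CRITICAL(&gQueueMux);
  if (gTail != gHead) {
    out   = gQueue[gTail];
    gTail = (gTail + 1) % QUEUE_LEN;
    ok    = true;
  }
  portEXIT_CRITICAL(&gQueueMux);
  return ok;
}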

BLE scanning and connecting

BLE scanning for Muse headband

class MuseAdvertisedDeviceCallbacks : public NimBLEAdvertisedDeviceCallbacks {
    void onResult(NimBLEAdvertisedDevice* advertisedDevice) override {
        if (advertisedDevice->getName().rfind("Muse", 0) == 0) {
            gTargetDevice = new NimBLEAdvertisedDevice(*advertisedDevice);
            NimBLEDevice::getScan()->stop();
        }
    }
};

Once found, the firmware discovers the Muse GATT service, subscribes to AF7 and AF8 notifications, then sends two commands to start the data stream: sendMusePreset21() (raw EEG mode) and writeMuseCmd("d") (start data).

12-bit packet decoding

Each 20-byte BLE notification from the Muse contains a 16-bit sequence number followed by 12 EEG samples, packed at 12 bits each (12 × 12 = 144 bits = 18 bytes). The 12-bit ADC has range 0–4095, centered at 2048 (representing 0 V). Subtracting 2048 and multiplying by 0.488 µV/count converts to microvolts.
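
A sketch of that unpacking step, assuming the big-endian 12-bit packing used by the open-source Muse tooling (not a verbatim copy of the project code):

#include <Arduino.h>

// pkt: the 20-byte notification. microvolts: output array of 12 converted samples.
void decodeMusePacket(const uint8_t *pkt, float *microvolts) {
  // Bytes 0-1 hold the 16-bit sequence number; bytes 2-19 hold 12 packed 12-bit samples.
  const uint8_t *p = pkt + 2;
  for (int i = 0; i < 12; i += 2) {
    const uint16_t s0 = (uint16_t(p[0]) << 4) | (p[1] >> 4);        // first 12-bit sample
    const uint16_t s1 = (uint16_t(p[1] & 0x0F) << 8) | p[2];        // second 12-bit sample
    microvolts[i]     = (int(s0) - 2048) * 0.488f;                  // centre at 0 V, convert to µV
    microvolts[i + 1] = (int(s1) - 2048) * 0.488f;
    p += 3;                                                         // 2 samples per 3 bytes -> 18 bytes total
  }
}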

High-pass filtering (baseline removal)

*baseline = (1.0f - HP_BASELINE_ALPHA) * (*baseline) + HP_BASELINE_ALPHA * uv;
const float hp = uv - *baseline;  // high-pass residual

HP_BASELINE_ALPHA = 0.005 gives a time constant of ~0.8 s. The baseline follows slow DC drift caused by electrode movement and skin potentials. Subtracting it leaves only the fast EEG signals and blink spikes — the signals we care about.

Muse headband connected, blink counter mode

// Adaptive threshold: rises with noise floor
const float threshold         = fmaxf(gBlinkThresholdUv, noiseMean * BLINK_NOISE_MULT);
const float secondaryThreshold = threshold * 0.60f;

// Two-channel confirmation: bilateral spike = genuine blink
const bool trigger = (absA7 >= threshold && absA8 >= secondaryThreshold) ||
                     (absA8 >= threshold && absA7 >= secondaryThreshold);

The adaptive threshold is the key insight: in a noisy environment the noise floor rises, and the threshold rises with it, preventing false positives without manual adjustment. Two-channel confirmation means the spike must appear on both AF7 and AF8 simultaneously.

Band power via windowed DFT

for (int k = 1; k <= 22; ++k) {  // DFT bins 1–22 cover 2–44 Hz
    float re = 0, im = 0;
    for (int n = 0; n < 128; ++n) {
        const float w   = 0.5f - 0.5f * cosf((2.0f * PI * n) / 127.0f);  // Hann window
        const float x   = gBandBuf[(gBandWrite + n) % 128] * w;
        const float ang = (2.0f * PI * k * n) / 128.0f;
        re += x * cosf(ang);
        im -= x * sinf(ang);
    }
    const float p = re*re + im*im;
    // bin to band (delta < 4 Hz, theta 4–8, alpha 8–13, beta 13–30, gamma 30+)
}

This is a direct DFT (not FFT) over 128 samples at 256 Hz — giving 2 Hz resolution per bin. The Hann window tapers the signal to zero at both ends, preventing spectral leakage. We only compute bins 1–22 (2–44 Hz) because that covers all 5 EEG bands. Bands are shown as percentages of total power on the waveform UI screen.

Waveform rendering

XIAO OLED showing live EEG waveform

The 128-sample ring buffer maps directly to the 128 horizontal pixels. Drawing drawLine between adjacent samples gives a continuous trace. Auto-scaling adapts slowly: scale = scale * 0.999 + |hp| * 6 * 0.001 — small when the signal is quiet, larger when blinks appear, keeping the trace from clipping.

Alpha, beta and gamma percentages are shown in the top-right corner. The OLED is split into two graph regions: AF7 occupies rows 12–36, AF8 occupies rows 38–62, separated by horizontal divider lines.

OLED displaying EEG band data

Download Project 06 Source Code


Source Code

Download the source files for each project below:


This Week's Checklist

  • Linked to the group assignment page
  • Browsed and documented some information from a microcontroller's datasheet
  • Programmed a board to interact and communicate
  • Described the programming process(es) you used
  • Included your source code
  • Included ‘hero shot(s)’