Week 17 — Wildcard week

Fab Academy’s Wildcard week asks for a digital process not already covered elsewhere—documented workflows, problems, fixes, and reproducible files. I used the free axis to upgrade the voice stack for my Forest Spirit / 森之精灵 plant companion: move from ASRPRO’s fixed offline phrase table (Week 15, Week 16) toward open-ended speech via the 小智 (XiaoZhi) module, which ships a ready-made ASR pipeline so I do not have to build recognizers from scratch. The bench work this week was mostly rewiring, burn-in isolation, and I²C power-domain hygiene before the new firmware could sit on the same ILI9341 + FT6336 display island I vibe-coded earlier.

Individual assignment

1) Task and motivation

The wildcard axis was mine to allocate, and the gap I cared about was voice—not another fabrication process. On the ASRPRO path I could wake 灵葭 and hit maybe twenty Mandarin command slots, but I could not say arbitrary sentences and expect text back. That ceiling showed up whenever I wanted the plant companion to feel conversational instead of menu-driven.

I looked at the full chain I actually need for the final project: microphone → ASR → language model → TFT UI. Building cloud-grade ASR on an ESP32 hub was never realistic; even tuning a vendor offline table in Tianwen already ate weeks. When I remembered 小智, a module that bundles wake, streaming recognition, and host hooks, I decided to shift the ASR role onto XiaoZhi and keep my existing XIAO ESP32‑S3 hub + ESP32‑WROOM display stack as the integration spine. Recognition should get faster (an expectation to confirm once the end-to-end path runs), and I stop pretending the ESP side is a speech lab.

2) Target architecture (voice → ASR → LLM → screen)

The diagram below is the workflow I am migrating toward. It extends the three-board sketch from Week 16 but swaps the ASRPRO mailbox for XiaoZhi’s ASR output and keeps the LLM hop on the S3 when WiFi is up (today: DeepSeek over HTTPS, same as final/s3-hub/).

  ┌─────────────┐     audio / events      ┌──────────────────┐
  │  Microphone │ ───────────────────────▶│ 小智 XiaoZhi ASR │
  │  (on module)│                         │  wake + streaming │
  └─────────────┘                         │  text / intents   │
                                          └────────┬─────────┘
                                                   │ UART / I²C / WiFi
                                                   ▼
                                          ┌──────────────────┐
                                          │ XIAO ESP32‑S3    │
                                          │ integration hub  │
                                          │ sensors + WiFi   │
                                          └────────┬─────────┘
                                                   │ HTTPS (when online)
                                                   ▼
                                          ┌──────────────────┐
                                          │ Cloud LLM        │
                                          │ (DeepSeek API)   │
                                          └────────┬─────────┘
                                                   │ assistant text / UI cmds
                                                   ▼
                                          ┌──────────────────┐
                                          │ ESP32‑WROOM      │
                                          │ ILI9341 + FT6336 │
                                          │ five-page UI     │
                                          └──────────────────┘
            
Wildcard voice upgrade — data path: XiaoZhi owns ASR (replacing ASRPRO’s fixed snid table); the S3 hub owns dialogue + sensor fusion; the WROOM module still owns 320×240 TFT rendering over the 0x55 I²C TLV link documented in DATA_FLOW.md.

Why this is “wildcard” rather than Week 15 interface work: Week 15 proved TFT + touch + a closed offline voice table. This week applies a different digital computation process—vendor ASR module integration and LLM routing—for open speech, which the earlier assignment deliberately did not cover.

What I refreshed while planning: ASRPRO publishes three-byte mailbox events (flag + snid); XiaoZhi instead exposes streaming text or intent callbacks that the S3 must parse. The display side still expects chunked UTF‑8 over I²C, so the hub remains the protocol translator between “whatever XiaoZhi emits” and “whatever the WROOM UI can paint.”
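
The "chunked UTF‑8" requirement is the part that is easy to get wrong with Chinese text, so here is a minimal host-side Python sketch of it. It is not the hub firmware. The TLV layout (one tag byte, one length byte, then the value), the tag value, and the 28-byte payload limit are all placeholders I made up for illustration; the real framing is whatever DATA_FLOW.md specifies. What the sketch shows is splitting on character boundaries so a three-byte CJK character is never cut across two I²C writes.

```python
HYPOTHETICAL_CHAT_TAG = 0x01  # placeholder tag, NOT taken from DATA_FLOW.md


def utf8_chunks(text: str, max_payload: int) -> list[bytes]:
    """Split text into UTF-8 chunks of at most max_payload bytes,
    never cutting inside a multi-byte character."""
    if max_payload < 4:
        raise ValueError("max_payload must fit one 4-byte UTF-8 character")
    chunks: list[bytes] = []
    current = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        # Start a new chunk if this character would overflow the current one.
        if len(current) + len(encoded) > max_payload:
            chunks.append(bytes(current))
            current = bytearray()
        current += encoded
    if current:
        chunks.append(bytes(current))
    return chunks


def tlv_frame(tag: int, value: bytes) -> bytes:
    """Assumed layout: 1-byte tag, 1-byte length, then the value bytes."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must fit in one byte")
    if len(value) > 0xFF:
        raise ValueError("value too long for a 1-byte length field")
    return bytes([tag, len(value)]) + value


def frames_for_text(text: str, max_payload: int = 28) -> list[bytes]:
    """Turn recognized text into a list of TLV frames, one per chunk."""
    return [tlv_frame(HYPOTHETICAL_CHAT_TAG, c)
            for c in utf8_chunks(text, max_payload)]


if __name__ == "__main__":
    for frame in frames_for_text("森之精灵 hello", max_payload=8):
        print(frame.hex(" "))
```

Each frame's payload decodes on its own, so the display side could paint a chunk as soon as it arrives instead of waiting for the whole sentence.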

3) Plan

  1. Power down and rewire the bench cleanly—shared GND, separate concerns for display power vs. I²C pull-up reference.
  2. Bring XiaoZhi online on the breadboard beside the existing S3 + WROOM stack; confirm UART or bus traffic before touching LLM calls.
  3. Regression-test the TFT with my older screen-test firmware and the new vibe-coded UI sketch to separate panel defects from bus wiring defects.
  4. Fix I²C pull-up topology if the scope or stripes say the bus reference is wrong.
  5. Re-run end-to-end: spoken phrase → XiaoZhi ASR → S3 → (optional) DeepSeek → TFT chat panel.
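
Step 5 can be sketched as a single routing function, again in host-side Python rather than firmware. The function name, the `ask_llm` callable, and the offline behaviour (showing the recognized text unchanged when WiFi is down or the request fails) are assumptions for illustration; the plan only says the DeepSeek hop is optional.

```python
from typing import Callable, Optional, Tuple


def route_utterance(
    text: str,
    wifi_up: bool,
    ask_llm: Optional[Callable[[str], str]],
) -> Tuple[str, str]:
    """Decide what the TFT chat panel should show for one recognized phrase.

    Returns (source, reply) where source is "llm", "local", or "ignored".
    ask_llm is a hypothetical stand-in for the hub's HTTPS request.
    """
    utterance = text.strip()
    if not utterance:
        return ("ignored", "")
    if wifi_up and ask_llm is not None:
        try:
            return ("llm", ask_llm(utterance))
        except Exception:
            # Sketch only: any LLM failure falls back to the local path.
            pass
    # Assumed offline behaviour: show the recognized text as-is.
    return ("local", utterance)


if __name__ == "__main__":
    fake_llm = lambda prompt: "(reply to) " + prompt
    print(route_utterance("今天需要浇水吗", True, fake_llm))
    print(route_utterance("今天需要浇水吗", False, fake_llm))
```

Passing the LLM call in as a parameter keeps the routing testable on a laptop without WiFi or an API key.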

4) Build & debug diary

I started by powering off, disconnecting, and re-seating every jumper—the unglamorous reset you do when a stack has been moved between desks. Then I flashed the Cursor-assisted UI program I had been iterating for the plant companion. The panel lit, but vertical tree-like stripes crawled across the whole image. That did not look like a font bug; it looked like the display or its bus was unhappy.

Bench setup while re-debugging ILI9341 capacitive touch and display wiring after reconnection
Touch + display re-debug (2026‑05‑25): after the full power cycle I went back to FT6336 touch checks and display wiring—trying to learn whether the stripes were a software regression or something physical on the breadboard.

To narrow the blame I reflashed the older screen-test module from Week 15—the one that only exercises ILI9341 patterns and touch readbacks. The stripes still appeared, which told me the new vibe-coded UI was probably not the root cause. My working hypothesis shifted to panel persistence / burn-in: this IPS module had sat on bright fields for long demos. I turned off the harsh white backlight for a while and let the panel rest; the ghosting faded, which matched a residual image story more than a fresh SPI bug.

Stripes with the legacy screen-test firmware: same artifact after reflashing the known-good test sketch—enough evidence to suspect panel persistence rather than a one-off UI draw bug. Letting the backlight stay off cleared most of the ghost image.

Burn-in mitigation helped, but when I returned to the XiaoZhi firmware path the bench still felt fragile—touch misses, odd refreshes, and that uneasy sense that the bus was marginal. At that point I stopped trusting “maybe software” and metered the wiring.

TFT showing vertical line artifacts; caption notes I2C communication suspected and wiring was adjusted
Vertical lines after another flash pass: I initially read this as pure comms noise. The pattern pushed me to re-route I²C and compare against a clean display test once the panel rested.

The mistake was subtle and embarrassing once I saw it: I had tied 3.3 V to the display module supply and then reused that same rail as the I²C pull-up reference for SDA/SCL. On a quiet breadboard that can look fine in still photos; under load the display’s current spikes droop the pull-up reference, which distorts I²C highs, which in turn confuses both FT6336 touch reads and whatever voice-side traffic shares the harness. Recognition glitches and TFT garbage can then show up together even though the “software” never changed.
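
A rough way to put numbers on that droop, as a Python sketch. The I²C-bus specification puts the input-high threshold at 0.7 × VDD of the receiving device. The rest is assumption: that the receiver (for example the S3 hub) sits on its own steady 3.3 V, that the idle-high bus level equals the pull-up rail, and the droop values themselves, which are illustrative and not measured on my bench.

```python
I2C_VIH_FRACTION = 0.7  # I2C-bus spec: V_IH(min) = 0.7 x VDD of the receiver


def high_level_margin(v_pullup_rail: float, v_receiver_vdd: float = 3.3) -> float:
    """Volts between the idle-high bus level and the receiver's V_IH threshold.

    Assumes the bus idles at the pull-up rail voltage (leakage ignored).
    A negative result means a 'high' may no longer be read as high.
    """
    return v_pullup_rail - I2C_VIH_FRACTION * v_receiver_vdd


if __name__ == "__main__":
    # Illustrative droop values only; none of these were measured.
    for droop in (0.0, 0.3, 0.6, 0.9, 1.2):
        rail = 3.3 - droop
        margin = high_level_margin(rail)
        status = "ok" if margin > 0 else "high level not guaranteed"
        print(f"rail {rail:.2f} V  margin {margin:+.2f} V  {status}")
```

With a clean 3.3 V rail the margin is about 1 V; every bit of droop on a shared rail comes straight out of that margin, before noise and slow edges are even counted.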

Updated breadboard wiring diagram: separate 3.3 V domains for display power and I²C pull-ups
Corrected wiring (2026‑05‑26): display VCC and I²C pull-ups no longer share one abused 3.3 V tap; each has a disciplined path to the regulator, common GND, and discrete pull-ups on SDA/SCL. After this change the stripes and touch oddities stopped reproducing on demand.

Breadboard with XiaoZhi voice module connected alongside ESP32 display and hub boards
XiaoZhi on the bench (2026‑05‑25): module wired beside the S3 hub and WROOM display stack—the starting point for migrating ASR off ASRPRO. Stable I²C reference was a prerequisite before trusting any streaming recognition logs.
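
For choosing the discrete pull-ups, the I²C-bus specification gives a usable window, sketched below in Python. The lower bound comes from the 3 mA sink current at V_OL = 0.4 V; the upper bound comes from the maximum rise time, using t_r = 0.8473 × R × C. The 100 pF bus capacitance is my guess for a breadboard with jumper wires, not a measurement, and the page does not record the resistor values actually fitted.

```python
def pullup_range(
    vdd: float,
    bus_capacitance_f: float,
    rise_time_max_s: float,
    vol_max: float = 0.4,   # I2C-bus spec: V_OL(max) at 3 mA sink
    iol: float = 3e-3,      # I2C-bus spec: sink current for standard/fast mode
) -> tuple[float, float]:
    """Return (r_min, r_max) in ohms for one I2C pull-up resistor."""
    r_min = (vdd - vol_max) / iol
    r_max = rise_time_max_s / (0.8473 * bus_capacitance_f)
    return r_min, r_max


if __name__ == "__main__":
    assumed_cb = 100e-12  # assumed bus capacitance, not measured
    for name, t_rise in (("standard mode, 100 kHz", 1000e-9),
                         ("fast mode, 400 kHz", 300e-9)):
        lo, hi = pullup_range(3.3, assumed_cb, t_rise)
        print(f"{name}: {lo:.0f} ohm <= Rp <= {hi:.0f} ohm")
```

Under these assumptions a common 2.2 kΩ or 3.3 kΩ part sits inside the fast-mode window; a longer harness adds capacitance and pulls the upper bound down.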

What changed in practice: once the pull-up reference was clean, the WROOM UI sketch and the XiaoZhi bring-up could coexist on the same afternoon without the TFT turning into a barcode. I still treat long static white screens as a panel health issue—rest the backlight between demos—but the recurring vertical comb pattern traced to power-domain wiring, not mysterious firmware.

5) Evidence & reproducibility

Per the wildcard checklist I am keeping photos, the stripe video, and the prior firmware tree in-repo so the story is auditable.

XiaoZhi-specific firmware is still landing on the hub side; the architectural bet is documented here so I can diff behavior against the ASRPRO mailbox without losing the wiring lessons.

6) Conclusion

Wildcard week, for me, was less about learning a new machine and more about admitting where the Forest Spirit stack had to evolve: open speech needs a real ASR module, and XiaoZhi is the shortcut I chose instead of hand-rolling recognizers on the ESP32. The surprise work was entirely analog-adjacent—display persistence fooled me once, but the durable fix was separating display supply from I²C pull-up reference so voice, touch, and TFT updates stop fighting the same droopy 3.3 V node.

Next hooks: finalize the XiaoZhi → S3 parser, map streaming text into the existing 0x55 chat TLV path, and retire the ASRPRO snid table once parity tests pass. The flow diagram above stays the acceptance target—when I can say a sentence that is not in a fixed menu and still see a sensible LLM reply on the TFT, this wildcard axis closes back into final-project packaging.
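
One possible shape for the XiaoZhi → S3 parser, as a host-side Python sketch. It assumes the module delivers recognized text as newline-terminated UTF‑8 over a serial link; that framing is a guess on my part, since the final output format is not settled on this page, and the class name and 512-byte limit are likewise hypothetical. The point it illustrates is buffering raw bytes until a full line arrives, so a read that ends in the middle of a multi-byte character does not corrupt the text.

```python
class UtteranceAssembler:
    """Reassemble newline-terminated UTF-8 utterances from arbitrary byte reads.

    Assumed framing (not confirmed for XiaoZhi): one utterance per line.
    """

    def __init__(self, max_line_bytes: int = 512) -> None:
        self._buf = bytearray()
        self._max = max_line_bytes

    def feed(self, data: bytes) -> list[str]:
        """Add newly received bytes; return any utterances completed by them."""
        completed: list[str] = []
        self._buf += data
        while True:
            idx = self._buf.find(b"\n")
            if idx < 0:
                break
            raw = bytes(self._buf[:idx])
            del self._buf[: idx + 1]
            # Decode only whole lines; strip() also removes a trailing "\r".
            text = raw.decode("utf-8", errors="replace").strip()
            if text:
                completed.append(text)
        if len(self._buf) > self._max:
            self._buf.clear()  # drop a runaway partial line
        return completed


if __name__ == "__main__":
    asm = UtteranceAssembler()
    stream = "今天需要浇水吗\r\n".encode("utf-8")
    print(asm.feed(stream[:4]))   # read ends inside the second character
    print(asm.feed(stream[4:]))   # rest of the line completes the utterance
```

Each completed string could then be handed to the chat path toward the display, which keeps serial framing and display framing as separate concerns.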