Week 17 — Wildcard week
Fab Academy’s Wildcard week asks for a digital process not already covered elsewhere—documented workflows, problems, fixes, and reproducible files. I used the free axis to upgrade the voice stack for my Forest Spirit / 森之精灵 plant companion: move from ASRPRO’s fixed offline phrase table (Week 15, Week 16) toward open-ended speech via the 小智 (XiaoZhi) module, which ships a ready-made ASR pipeline so I do not have to build recognizers from scratch. The bench work this week was mostly rewiring, burn-in isolation, and I²C power-domain hygiene before the new firmware could sit on the same ILI9341 + FT6336 display island I vibe-coded earlier.
Individual assignment
1) Task and motivation
The wildcard axis was mine to allocate, and the gap I cared about was voice—not another fabrication process. On the ASRPRO path I could wake 灵葭 and hit maybe twenty Mandarin command slots, but I could not say arbitrary sentences and expect text back. That ceiling showed up whenever I wanted the plant companion to feel conversational instead of menu-driven.
I looked at the full chain I actually need for the final project: microphone → ASR → language model → TFT UI. Building cloud-grade ASR on an ESP32 hub was never realistic; even tuning a vendor offline table in Tianwen already ate weeks. When I remembered 小智—a module that bundles wake, streaming recognition, and host hooks—I decided to shift the ASR role onto XiaoZhi and keep my existing XIAO ESP32‑S3 hub + ESP32‑WROOM display stack as the integration spine. Recognition should get faster, and I stop pretending the ESP side is a speech lab.
2) Target architecture (voice → ASR → LLM → screen)
The diagram below is the workflow I am migrating toward. It extends the three-board sketch from Week 16 but swaps the ASRPRO mailbox for XiaoZhi’s ASR output and keeps the LLM hop on the S3 when WiFi is up (today: DeepSeek over HTTPS, same as final/s3-hub/).
┌─────────────┐     audio / events      ┌──────────────────┐
│ Microphone  │ ───────────────────────▶│ 小智 XiaoZhi ASR │
│ (on module) │                         │ wake + streaming │
└─────────────┘                         │ text / intents   │
                                        └────────┬─────────┘
                                                 │ UART / I²C / WiFi
                                                 ▼
                                        ┌──────────────────┐
                                        │ XIAO ESP32‑S3    │
                                        │ integration hub  │
                                        │ sensors + WiFi   │
                                        └────────┬─────────┘
                                                 │ HTTPS (when online)
                                                 ▼
                                        ┌──────────────────┐
                                        │ Cloud LLM        │
                                        │ (DeepSeek API)   │
                                        └────────┬─────────┘
                                                 │ assistant text / UI cmds
                                                 ▼
                                        ┌──────────────────┐
                                        │ ESP32‑WROOM      │
                                        │ ILI9341 + FT6336 │
                                        │ five-page UI     │
                                        └──────────────────┘
Division of roles: XiaoZhi takes over wake and ASR (replacing the ASRPRO snid table); the S3 hub owns dialogue + sensor fusion; the WROOM module still owns 320×240 TFT rendering over the 0x55 I²C TLV link documented in DATA_FLOW.md.
Why this is “wildcard” rather than Week 15 interface work: Week 15 proved TFT + touch + a closed offline voice table. This week applies a different digital computation process—vendor ASR module integration and LLM routing—for open speech, which the earlier assignment deliberately did not cover.
What I refreshed while planning: ASRPRO publishes three-byte mailbox events (flag + snid); XiaoZhi instead exposes streaming text or intent callbacks that the S3 must parse. The display side still expects chunked UTF‑8 over I²C, so the hub remains the protocol translator between “whatever XiaoZhi emits” and “whatever the WROOM UI can paint.”
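The translator role can be prototyped on a desk machine before it is ported to the hub firmware. The sketch below is a minimal model, not the real protocol: the type code `TYPE_CHAT_TEXT`, the 30-byte payload cap, and the one-byte type + one-byte length frame layout are all assumptions for illustration (the actual layout lives in DATA_FLOW.md). What it does show is the part that tends to bite with Mandarin text: cutting a UTF‑8 stream into fixed-size chunks without splitting a multi-byte character.

```python
from typing import Iterator, List

TYPE_CHAT_TEXT = 0x01   # hypothetical TLV type code, not taken from DATA_FLOW.md
MAX_VALUE_LEN = 30      # hypothetical payload cap per I2C frame


def utf8_chunks(text: str, max_len: int = MAX_VALUE_LEN) -> Iterator[bytes]:
    """Yield UTF-8 chunks of at most max_len bytes, never splitting a character."""
    if max_len < 4:
        raise ValueError("max_len must fit one full UTF-8 character (4 bytes)")
    data = text.encode("utf-8")
    start = 0
    while start < len(data):
        end = min(start + max_len, len(data))
        # UTF-8 continuation bytes match 0b10xxxxxx. If the byte just past the
        # cut is a continuation byte, the cut is mid-character, so back up.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        yield data[start:end]
        start = end


def to_tlv_frames(text: str) -> List[bytes]:
    """Wrap each chunk as [type, length, value...] (assumed frame layout)."""
    return [bytes([TYPE_CHAT_TEXT, len(chunk)]) + chunk
            for chunk in utf8_chunks(text)]


if __name__ == "__main__":
    for frame in to_tlv_frames("你好，森之精灵 is listening"):
        print(frame.hex(" "))
```

Because every chunk ends on a character boundary, the display side can decode and paint each frame as it arrives instead of buffering partial glyphs.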
3) Plan
- Power down and rewire the bench cleanly—shared GND, separate concerns for display power vs. I²C pull-up reference.
- Bring XiaoZhi online on the breadboard beside the existing S3 + WROOM stack; confirm UART or bus traffic before touching LLM calls.
- Regression-test the TFT with my older screen-test firmware and the new vibe-coded UI sketch to separate panel defects from bus wiring defects.
- Fix I²C pull-up topology if the scope or stripes say the bus reference is wrong.
- Re-run end-to-end: spoken phrase → XiaoZhi ASR → S3 → (optional) DeepSeek → TFT chat panel.
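The last step of the plan can be modelled as one small routing function, which makes the "(optional) DeepSeek" branch explicit. This is a sketch under stated assumptions: `route_utterance`, `ask_llm`, and `show` are hypothetical names, and echoing the raw transcript when the hub is offline (or the LLM call fails) is my assumed fallback, not behaviour confirmed by the current firmware.

```python
from typing import Callable


def route_utterance(text: str,
                    online: bool,
                    ask_llm: Callable[[str], str],
                    show: Callable[[str], None]) -> str:
    """Route one recognized phrase toward the TFT chat panel.

    ask_llm and show are injected so the flow can be exercised on a desk
    machine with stubs before any network or I2C code exists.
    """
    cleaned = text.strip()
    if not cleaned:
        return ""              # ignore empty ASR results, paint nothing
    reply = cleaned            # assumed offline fallback: echo the transcript
    if online:
        try:
            reply = ask_llm(cleaned)
        except Exception:
            reply = cleaned    # LLM hop failed, fall back to the transcript
    show(reply)
    return reply


if __name__ == "__main__":
    # Stub LLM and a print-based "display" stand in for DeepSeek and the WROOM.
    route_utterance("how is the soil today", True,
                    lambda t: "LLM would answer: " + t, print)
```

Keeping the LLM and display behind callables means each stage in the plan (ASR only, ASR + display, ASR + LLM + display) can be tested separately.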
4) Build & debug diary
I started by powering off, disconnecting, and re-seating every jumper—the unglamorous reset you do when a stack has been moved between desks. Then I flashed the Cursor-assisted UI program I had been iterating for the plant companion. The panel lit, but vertical tree-like stripes crawled across the whole image. That did not look like a font bug; it looked like the display or its bus was unhappy.
To narrow the blame I reflashed the older screen-test module from Week 15—the one that only exercises ILI9341 patterns and touch readbacks. The stripes still appeared, which told me the new vibe-coded UI was probably not the root cause. My working hypothesis shifted to panel persistence / burn-in: this IPS module had sat on bright fields for long demos. I turned off the harsh white backlight for a while and let the panel rest; the ghosting faded, which matched a residual image story more than a fresh SPI bug.
Burn-in mitigation helped, but when I returned to the XiaoZhi firmware path the bench still felt fragile—touch misses, odd refreshes, and that uneasy sense that the bus was marginal. At that point I stopped trusting “maybe software” and metered the wiring.
The mistake was subtle and embarrassing once I saw it: I had tied 3.3 V to the display module supply and then reused that same rail as the I²C pull-up reference for SDA/SCL. On a quiet breadboard that can look fine in still photos; under load the display’s current spikes droop the pull-up reference, which distorts I²C highs, which in turn confuses both FT6336 touch reads and whatever voice-side traffic shares the harness. Recognition glitches and TFT garbage can then show up together even though the “software” never changed.
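A back-of-envelope check shows why a drooping pull-up node matters. The I²C-bus specification sets the minimum input-high level at 0.7 × VDD of the receiving device, so a receiver on a stiff 3.3 V supply needs roughly 2.31 V to read a "1". The sketch below computes the remaining margin; the droop values are illustrative assumptions, not measurements from my bench.

```python
def input_high_threshold(receiver_vdd: float) -> float:
    """Minimum voltage read as logic high (I2C-bus spec: V_IH = 0.7 * VDD)."""
    return 0.7 * receiver_vdd


def high_margin(pullup_rail_v: float, receiver_vdd: float = 3.3) -> float:
    """Volts of headroom between the idle-high bus level and V_IH.

    An idle I2C line sits at the pull-up rail voltage, so any droop on that
    rail comes straight out of the margin. Negative means highs can be missed.
    """
    return pullup_rail_v - input_high_threshold(receiver_vdd)


if __name__ == "__main__":
    # Hypothetical rail voltages during a display current spike.
    for rail in (3.3, 2.8, 2.4, 2.2):
        margin = high_margin(rail)
        state = "ok" if margin > 0 else "HIGH NOT GUARANTEED"
        print(f"pull-up rail {rail:.1f} V -> margin {margin:+.2f} V  {state}")
```

The point of the arithmetic is that the margin is under 1 V even on a healthy rail, so a momentary sag on a shared node does not need to be dramatic before a receiver on a cleaner supply stops seeing valid highs.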
What changed in practice: I separated the I²C pull-up reference from the display supply node. Once the pull-up reference was clean, the WROOM UI sketch and the XiaoZhi bring-up could coexist on the same afternoon without the TFT turning into a barcode. I still treat long static white screens as a panel health issue (rest the backlight between demos), but the recurring vertical comb pattern traced to power-domain wiring, not mysterious firmware.
5) Evidence & reproducibility
Per the wildcard checklist I am keeping photos, the stripe video, and the prior firmware tree in-repo so the story is auditable:
- Baseline three-board integration (ASRPRO era): code/week15-individual/final/
- I²C + voice mailbox notes: DATA_FLOW.md, Week 16 ASRPRO subsection
- Display + touch bring-up references: Week 15 ILI9341 / FT6336 sections
- This week’s media: images/week17-individual/ (five files: four photos + one MOV)
XiaoZhi-specific firmware is still landing on the hub side; the architectural bet is documented here so I can diff behavior against the ASRPRO mailbox without losing the wiring lessons.
6) Conclusion
Wildcard week, for me, was less about learning a new machine and more about admitting where the Forest Spirit stack had to evolve: open speech needs a real ASR module, and XiaoZhi is the shortcut I chose instead of hand-rolling recognizers on the ESP32. The surprise work was entirely analog-adjacent—display persistence fooled me once, but the durable fix was separating display supply from I²C pull-up reference so voice, touch, and TFT updates stop fighting the same droopy 3.3 V node.
Next hooks: finalize the XiaoZhi → S3 parser, map streaming text into the existing 0x55 chat TLV path, and retire the ASRPRO snid table once parity tests pass. The flow diagram above stays the acceptance target: when I can say a sentence that is not in a fixed menu and still see a sensible LLM reply on the TFT, this wildcard axis closes back into final-project packaging.