Individual Assignment

For this week's assignment, I am planning to create a music player interface that lets users play audio files stored on an SD card using an ESP32-S3 microcontroller. The interface sends commands to the ESP32-S3 over Wi-Fi, and the ESP32-S3 reads the audio files and plays them through a speaker integrated into a custom-designed PCB.

The user interface will include a song selection section and buttons for play, pause, next track, and previous track. I am also planning to create a jammer, as in a DJ setup, where users can mix and manipulate the audio in real time using a touch interface on a smartphone or tablet. The ESP32-S3 will handle the audio processing and output, while the user interface will provide an intuitive way for users to interact with their music collection.

The following is the skeleton for the user interface that I have designed in Figma:

Building the UI

First, I asked Claude to generate the HTML, CSS, and JavaScript for the interface based on the website skeleton I made.
This is the prompt I used, along with the image of the UI skeleton: "I am currently trying to run or make a UI interface for the same, in which I will attach the skeleton of the website. All I need is for you to code. How it's done. I will also attach the reference pinterest link for it:Pinterest Reference. i want the disc to rotate like the one in this."

I did not like the UI that Claude generated, so I tried giving it different references from Pinterest and different prompts. Some of the iterations:

Then I decided to design the individual elements in Figma myself and upload them to Claude:

After many more iterations i got something halfway decent.

The code for the UI can be found in the Final Files section below.

System Architecture

The system has two main parts that communicate over WiFi:
• Browser UI — sends commands (PLAY, PAUSE, NEXT, PREV, TRACK:n, DIAL:angle)
• ESP32-S3 — receives commands via WebSocket, streams audio from SD card through I2S to speaker
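To make the command protocol above concrete, here is a minimal sketch of how the ESP32-S3 side could turn one WebSocket text message into a typed command. It assumes each message arrives as a plain string exactly like the ones listed, and that TRACK and DIAL carry whole-number arguments. The names `CmdType`, `Command`, and `parseCommand` are hypothetical and not taken from the project's firmware.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical command set mirroring the messages sent by the browser UI.
enum class CmdType { Play, Pause, Next, Prev, Track, Dial, Unknown };

struct Command {
  CmdType type = CmdType::Unknown;
  long value = 0;  // track number for TRACK:n, degrees for DIAL:angle
};

// Parse one WebSocket text frame such as "PLAY", "TRACK:3" or "DIAL:-90".
Command parseCommand(const std::string& msg) {
  Command cmd;
  if (msg == "PLAY")  { cmd.type = CmdType::Play;  return cmd; }
  if (msg == "PAUSE") { cmd.type = CmdType::Pause; return cmd; }
  if (msg == "NEXT")  { cmd.type = CmdType::Next;  return cmd; }
  if (msg == "PREV")  { cmd.type = CmdType::Prev;  return cmd; }

  const std::string::size_type colon = msg.find(':');
  if (colon == std::string::npos) return cmd;  // not a NAME:value message

  const std::string name = msg.substr(0, colon);
  const std::string arg = msg.substr(colon + 1);
  if (arg.empty()) return cmd;  // "DIAL:" with no number

  char* end = nullptr;
  const long v = std::strtol(arg.c_str(), &end, 10);
  if (*end != '\0') return cmd;  // reject trailing junk such as "TRACK:3x"

  if (name == "TRACK") {
    cmd.type = CmdType::Track;
  } else if (name == "DIAL") {
    cmd.type = CmdType::Dial;
  } else {
    return cmd;  // unknown command name stays Unknown
  }
  cmd.value = v;
  return cmd;
}
```

Anything that does not match is reported as `Unknown` rather than guessed at, which makes malformed messages easy to log while debugging the WebSocket link.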

Arduino IDE Board Setup

• Board: XIAO_ESP32S3
• Core version: esp32 by Espressif v3.3.7 or newer (required for I2S headers)
• PSRAM: OPI PSRAM — must be enabled or audio buffer allocation fails
• Upload speed: 921600

SD Card Setup

• Format: FAT32
• Folder structure: /audio/001.wav, /audio/002.wav etc.
• Audio format: 16-bit PCM, 44100 Hz, mono or stereo WAV
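Given the /audio/001.wav naming scheme above, the firmware has to turn a track number into a zero-padded path. The sketch below assumes track numbers are 1-based and never exceed 999; `trackPath` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build the SD card path for a 1-based track number, e.g. 1 -> "/audio/001.wav".
// Assumes at most 999 tracks so three digits of zero padding are enough.
std::string trackPath(int trackNumber) {
  assert(trackNumber >= 1 && trackNumber <= 999);
  char buf[20];  // "/audio/999.wav" needs 15 bytes including the terminator
  std::snprintf(buf, sizeof(buf), "/audio/%03d.wav", trackNumber);
  return std::string(buf);
}
```

Zero padding keeps the files sorted correctly on the card, so listing the folder returns 002.wav before 010.wav.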

Initially I had decided to build the firmware in PlatformIO, but toolchain downloads failed repeatedly. I was using the ESP32Synth library, which needs ESP32 board core version 3.0.0 or newer, a requirement I initially missed. The fix required forcing PlatformIO to use arduino-esp32 v3.x by adding a platform_packages override, which meant a fresh toolchain download of several gigabytes in total. PlatformIO kept failing to download the toolchain, and even after multiple attempts it did not succeed.

After spending a lot of time trying to fix the issue, I decided to switch to the Arduino IDE, which already had ESP32 core 3.3.7 installed.

After switching to Arduino, another problem surfaced, this time with ESP32Synth.

ESP32Synth is a polyphonic synthesizer library for ESP32. Its primary purpose is real-time audio synthesis — generating waveforms (sine, sawtooth, pulse, triangle, noise), applying envelopes, wavetables, arpeggios, and effects entirely in software. It also has a streaming feature that can play WAV files from an SD card. For the PixelJam project, only the streaming feature was needed. This turned out to be where the library's limitations were most apparent.

The streaming system in ESP32Synth uses a ring buffer: a fixed-size memory area where the SD reader writes audio data and the audio task reads from it. The size of this buffer is set by a compile-time constant in ESP32Synth.h, and the default preset was too large for this setup.

The fix was straightforward: switch to the low-RAM preset by commenting out the default block and uncommenting the low-RAM block. However, this required editing the library source file directly, which is fragile, because any library update would overwrite the change.
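The ring buffer described above can be sketched as a single-producer/single-consumer byte queue: the SD reader task calls `write`, the audio task calls `read`, and the two indices chase each other around a fixed array. This is a generic illustration, not ESP32Synth's code; here the capacity is a constructor argument for simplicity, whereas ESP32Synth fixes it with a compile-time constant. The class name `RingBuffer` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-producer/single-consumer byte ring buffer. One slot is always left
// empty so that "head == tail" unambiguously means "empty".
class RingBuffer {
 public:
  explicit RingBuffer(std::size_t capacity) : buf_(capacity + 1) {
    assert(capacity > 0);
  }

  // Producer side (SD reader): copies up to len bytes, returns how many fit.
  std::size_t write(const uint8_t* data, std::size_t len) {
    std::size_t h = head_.load(std::memory_order_relaxed);
    const std::size_t tail = tail_.load(std::memory_order_acquire);
    std::size_t n = 0;
    while (n < len) {
      const std::size_t next = (h + 1) % buf_.size();
      if (next == tail) break;  // buffer full
      buf_[h] = data[n++];
      h = next;
    }
    head_.store(h, std::memory_order_release);
    return n;
  }

  // Consumer side (audio task): copies up to len bytes out, returns how many.
  std::size_t read(uint8_t* out, std::size_t len) {
    std::size_t t = tail_.load(std::memory_order_relaxed);
    const std::size_t head = head_.load(std::memory_order_acquire);
    std::size_t n = 0;
    while (n < len && t != head) {
      out[n++] = buf_[t];
      t = (t + 1) % buf_.size();
    }
    tail_.store(t, std::memory_order_release);
    return n;
  }

  // Bytes currently waiting to be read.
  std::size_t available() const {
    const std::size_t head = head_.load(std::memory_order_acquire);
    const std::size_t tail = tail_.load(std::memory_order_acquire);
    return (head + buf_.size() - tail) % buf_.size();
  }

 private:
  std::vector<uint8_t> buf_;
  std::atomic<std::size_t> head_{0};  // next index to write
  std::atomic<std::size_t> tail_{0};  // next index to read
};
```

The whole array is reserved up front, which is why the chosen buffer size directly decides whether the allocation fits in the available RAM.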

So I switched to ESP32-audioI2S, a library I had already used in my Output Devices week. It is purpose-built for streaming audio files from an SD card or the internet to an I2S DAC.

Challenges.docx has a detailed list of the challenges faced during development and how they were resolved. It was put together by Claude (prompt: "can you in detail explain the challenges faced while operating in platform IDE using the library ESP32Synth and then why we switched to Arduino and to another library I2SAudio").
After switching libraries, playback began running more efficiently.

Also remember to enable PSRAM in the board settings, otherwise the audio buffer allocation will fail and the music won't play.

The next problem was that the ESP32-S3 would not connect to my web application over Wi-Fi. Initially I thought the Wi-Fi signal was weak, so I switched to another network, but it still would not connect.

I asked Claude why this was happening, and it turned out the website had to be served from a local HTTP server, because Chrome blocks WebSocket connections from pages opened as file:// URLs for security reasons.

Once the website was served locally, the connection succeeded and the website ran smoothly!

Seek and Scrub Effect

What I was trying to build was a vinyl turntable scrubbing effect: rotate the disk forward and the song fast-forwards, rotate it backward and it rewinds, release it and playback resumes from exactly that point. Simple in concept, surprisingly tricky in practice.

I implemented the seek-and-scrub effect by sending DIAL:angle commands from the UI to the ESP32-S3 whenever the user interacts with the dial. The ESP32-S3 calculates the corresponding position in the audio track from the angle and seeks to that position in the audio file. This allows real-time scrubbing through the track as the user rotates the dial, giving a vinyl turntable-like experience.

But it turned out to be harder than I thought. The first challenge was that the canvas dial tracks rotation as an angle between 0° and 360°. Every time it crossed the 360°/0° boundary the value reset, so the ESP32 would receive DIAL:350 and then suddenly DIAL:5.

To correct this, I tracked the accumulated total rotation instead of the raw angle. Every frame, the change from the previous frame is calculated and added to a running total (360 + 360 + ...), so spinning two full turns gives totalDeg = 720 instead of DIAL:360 resetting to DIAL:0.
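The accumulation step above only works if each per-frame change is wrapped into the range -180° to 180°; otherwise the jump from 350° to 5° would be counted as -345° instead of +15°. The sketch below shows that unwrapping in C++ for consistency with the firmware, although in this project the logic lives in the browser's JavaScript. It assumes the dial never turns more than 180° between two frames, and the class name `RotationAccumulator` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Turns a raw 0-360 degree dial angle into an unbounded running total,
// so crossing the 360/0 boundary no longer looks like a jump.
class RotationAccumulator {
 public:
  explicit RotationAccumulator(double startAngleDeg) : prev_(startAngleDeg) {}

  // Feed this frame's raw angle; returns the accumulated total in degrees.
  double update(double rawAngleDeg) {
    double delta = rawAngleDeg - prev_;
    // Wrap the change into [-180, 180): 350 -> 5 becomes +15, not -345.
    delta = std::fmod(delta + 180.0, 360.0);
    if (delta < 0) delta += 360.0;
    delta -= 180.0;
    totalDeg_ += delta;
    prev_ = rawAngleDeg;
    return totalDeg_;
  }

  double total() const { return totalDeg_; }

 private:
  double prev_;            // raw angle seen in the previous frame
  double totalDeg_ = 0.0;  // signed total rotation since construction
};
```

Turning backward makes the total go negative relative to the starting point, which is exactly the signal a rewind needs.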

To find each song's length, before playing a track we opened the file on the SD card, read its file size, and calculated the total duration mathematically. For a 16-bit mono 44100 Hz WAV file, every second of audio is exactly 44100 samples × 2 bytes = 88200 bytes. So duration = (fileSize - 44) / 88200, the 44 being the WAV header size. No callback or header parsing is needed, as long as every file is 16-bit mono 44100 Hz with the standard 44-byte header; a stereo file at the same rate uses 176400 bytes per second, so the divisor would have to double.
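The duration formula above generalises to bytes per second = sample rate × channels × bytes per sample. The sketch below keeps the 44-byte header assumption from the text (a canonical WAV file with no extra chunks before the audio data) and defaults to the mono 44100 Hz 16-bit format; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumes a canonical WAV file: a 44-byte header followed directly by PCM data.
constexpr uint32_t kWavHeaderBytes = 44;

// Bytes of PCM data per second: sampleRate * channels * bytesPerSample.
constexpr uint32_t bytesPerSecond(uint32_t sampleRate, uint16_t channels,
                                  uint16_t bitsPerSample) {
  return sampleRate * channels * (bitsPerSample / 8);
}

// Duration in whole seconds, computed from the file size alone.
uint32_t wavDurationSeconds(uint32_t fileSize, uint32_t sampleRate = 44100,
                            uint16_t channels = 1, uint16_t bitsPerSample = 16) {
  const uint32_t bps = bytesPerSecond(sampleRate, channels, bitsPerSample);
  assert(bps > 0);
  if (fileSize <= kWavHeaderBytes) return 0;  // header only, no audio
  return (fileSize - kWavHeaderBytes) / bps;
}
```

As a worked example, a three-minute mono track is 44 + 88200 × 180 = 15,876,044 bytes, and the same byte count read as stereo would come out as only 90 seconds, which is why all files on the card need to share one format.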

When it came to the actual scrub-and-seek implementation, every time it was triggered the music simply continued from where it had been paused, ignoring the commands without printing any error.
So I asked Claude to simplify it to a direct 1:1 mapping where 1 degree = 1 second of audio.
(prompt - The song pauses and all but its not rewinding when i scrub, i.e, the song is not moving according to the disk angle and does not give out outputs from where the disk is being rotated to whether it be forward or backward. let each rotating angle be 1s of the song and when you scrub back the esp shouldbe able to detect which timing of the song is to be played and then should start from there)
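One way to read the 1 degree = 1 second mapping is sketched below: the accumulated rotation since the scrub started is added to the paused position, clamped to the length of the song, and converted to a byte offset in the WAV file. Measuring the rotation relative to the paused position is an assumption, as is the 16-bit mono 44100 Hz format with a 44-byte header; `SeekTarget` and `scrubTarget` are hypothetical names, and the seconds value is what a library-level seek call would take.

```cpp
#include <cassert>
#include <cstdint>

struct SeekTarget {
  uint32_t seconds;     // position to resume playback from
  uint32_t byteOffset;  // matching offset in the WAV file
};

// 1:1 scrub mapping: each degree of rotation since the scrub began is one
// second of audio, added to the position where playback was paused.
SeekTarget scrubTarget(uint32_t startSeconds, long deltaDeg,
                       uint32_t durationSeconds) {
  long target = static_cast<long>(startSeconds) + deltaDeg;
  if (target < 0) target = 0;  // cannot rewind past the start
  if (target > static_cast<long>(durationSeconds)) {
    target = static_cast<long>(durationSeconds);  // or past the end
  }
  const uint32_t sec = static_cast<uint32_t>(target);
  // 88200 bytes per second after the 44-byte header keeps the offset
  // on a sample boundary for 16-bit mono audio.
  return {sec, 44u + sec * 88200u};
}
```

Clamping matters because a fast backward spin can easily produce more negative degrees than there are seconds left to rewind.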

Hero Shot

Simple, intuitive, and easy to reason about.

Conclusion

My first realisation was that the most useful skill in this kind of project is knowing when to stop fighting something and switch to a different approach.

PlatformIO and ESP32Synth were both the right tools on paper but the wrong tools for this specific situation at this specific moment. Letting go of them earlier would have saved hours.

Debugging embedded systems is less about fixing code and more about isolating which layer the problem lives in. Is it the hardware? The firmware? The library? The browser? The network? Once you know which layer, the fix is usually simple. Getting there is the hard part.

The scrub-and-seek feature, rotating the disk to move through a song like a real vinyl record, was the most satisfying thing to get working. Not because it's technically complex, but because it required understanding the entire chain: from finger movement on screen, to angle calculation in JavaScript, to WebSocket message, to file position calculation. Every single layer had to be right at the same time.

What I have built is not perfect. The background texture isn't quite right, the scrub could be smoother, and the track names are still hardcoded. But it plays music from an SD card through a speaker I soldered onto a board I designed, controlled by a website I built, communicating over Wi-Fi in real time. For week 14, that feels like enough.

Final Files

pixeljam-arduino.zip
pixeljam-ui.zip

WEEK 14: Interface and Application Programming