
Week 11: Network and Communications

Image Capture with Camera over Wi-Fi

Hero image: Image capture and communication

I used my XIAO ESP32S3 Sense board and developed a system to capture images with the OV2640 camera sensor and serve them over a local Wi-Fi network. The final version streams live video to a browser and lets you freeze a frame by pressing a hardware button.


Assignments

Group Assignment

  • Send a message between two projects.

Individual Assignment

  • Design, build, and connect wired or wireless node(s) with network or bus addresses and local input and/or output device(s).

1. Group Assignment

FabLab Kannai group work page

We used ESP-NOW — Espressif’s lightweight, connectionless wireless protocol — to send messages between a custom XIAO ESP32S3 board and a Barduino. Pressing a switch on one board triggered an LED on the other, with no router needed; devices communicate directly by MAC address.

It was surprising to find that two boards could talk to each other with no router at all — just the radio and a MAC address. I would definitely consider ESP-NOW for future projects where small boards need to communicate directly with each other.
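Because ESP-NOW peers are identified by a 6-byte MAC address, the address printed on a serial monitor has to be turned into bytes before it can be used as a peer address. The following is a minimal, host-runnable sketch of that conversion only. It is not the code from the group assignment; `parseMac` and `hexValue` are hypothetical helper names, and the address in the usage note is made up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Convert one hex digit to its value, or -1 if the character is not a hex digit.
static int hexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Parse "AA:BB:CC:DD:EE:FF" into six bytes. Returns false on malformed input.
bool parseMac(const std::string& text, std::array<uint8_t, 6>& out) {
  if (text.size() != 17) return false;  // 6 pairs of hex digits + 5 colons
  for (std::size_t i = 0; i < 6; ++i) {
    const std::size_t pos = i * 3;
    if (i < 5 && text[pos + 2] != ':') return false;
    const int hi = hexValue(text[pos]);
    const int lo = hexValue(text[pos + 1]);
    if (hi < 0 || lo < 0) return false;
    out[i] = static_cast<uint8_t>(hi * 16 + lo);
  }
  return true;
}
```

For example, `parseMac("24:6F:28:AA:BB:CC", mac)` would fill `mac` with the six bytes 0x24, 0x6F, 0x28, 0xAA, 0xBB, 0xCC, which is the form a peer address is stored in.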


2. Individual Assignment

I wanted the XIAO ESP32S3’s camera to capture images that I could view on my PC. The board and the PC needed some form of communication, so I ran an HTTP server on the XIAO — a browser on my PC connects to it over Wi-Fi and fetches the image.

2-0. Network Protocols and Addressing

Protocol stack

The XIAO ESP32S3 runs a Wi-Fi server built on this protocol stack:

Layer | Protocol | Role
Physical / Link | Wi-Fi (IEEE 802.11n) | Sends and receives data wirelessly between the board and the router
Network | IPv4 | Gives the board an address (IP) so other devices can find it
Transport | TCP | Makes sure data arrives completely and in the right order
Application | HTTP/1.1 | The language the browser and board use to ask for and send files
Application (streaming only) | MJPEG over HTTP (multipart/x-mixed-replace) | Sends a continuous series of JPEG images to create a live video feed

Addresses used

The ESP32S3 joins the lab Wi-Fi as a station (WIFI_STA) and gets an IP from the router via DHCP:

IP address : 192.168.1.xxx   (assigned by DHCP from the lab router)
Port       : 80              (standard HTTP)

The streaming version’s three endpoints:

http://192.168.1.xxx/        ← main HTML page
http://192.168.1.xxx/stream  ← MJPEG live stream
http://192.168.1.xxx/shot    ← last saved JPEG frame

Unlike the group assignment (ESP-NOW, which bypasses TCP/IP entirely), this is a standard Wi-Fi setup through a router. Any device on the same local network can reach these URLs — no port-forwarding needed.
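"On the same local network" can be made concrete: two IPv4 addresses belong to the same subnet when they agree on every bit covered by the subnet mask. The sketch below illustrates that check on a PC. It assumes a /24 mask (255.255.255.0), which the lab router's configuration may or may not use, and the addresses in the usage note are hypothetical stand-ins for the 192.168.1.xxx address above. `parseIPv4` and `sameSubnet` are illustrative names, not part of the firmware.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Parse dotted-quad text such as "192.168.1.50" into a 32-bit value.
// Returns false if the text is not exactly four numbers in the range 0..255.
bool parseIPv4(const std::string& text, uint32_t& out) {
  unsigned a = 0, b = 0, c = 0, d = 0;
  char extra = 0;
  // The trailing %c only matches if there is leftover text, which we reject.
  if (std::sscanf(text.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4) {
    return false;
  }
  if (a > 255 || b > 255 || c > 255 || d > 255) return false;
  out = (static_cast<uint32_t>(a) << 24) | (static_cast<uint32_t>(b) << 16) |
        (static_cast<uint32_t>(c) << 8) | static_cast<uint32_t>(d);
  return true;
}

// Two hosts are on the same subnet when their masked addresses are equal.
bool sameSubnet(uint32_t ipA, uint32_t ipB, uint32_t mask) {
  return (ipA & mask) == (ipB & mask);
}
```

With the assumed mask 0xFFFFFF00, a board at 192.168.1.50 and a PC at 192.168.1.23 compare as the same subnet, while a device at 192.168.2.23 does not and would need routing to reach the board.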


2-1. Step 1 — Still Image over HTTP

The first sketch gets the camera up and running with minimal code: the ESP32S3 connects to Wi-Fi, starts a tiny HTTP server, and immediately serves a JPEG snapshot at /jpg. Visiting the IP address in a browser shows the image.

The first capture was a little unfortunate: the camera was pointing at the ceiling, because the image is taken the moment the page loads and I could not control the timing of the capture.

Auto capture of the ceiling

Build and flash commands:

~/.platformio/penv/bin/pio run -t clean
~/.platformio/penv/bin/pio run -e xiao_esp32s3 -t upload
~/.platformio/penv/bin/pio device monitor -e xiao_esp32s3

If the board freezes, press the reset button and it will reconnect automatically.

Initial prompt I gave to Claude Code

I want to use XIAO ESP32S3 Sense with the OV2640 camera. Connect to Wi-Fi, start an HTTP server, and serve a JPEG snapshot.

Claude Code produced the sketch below. I flashed it with PlatformIO, opened the IP address in a browser, and the camera captured whatever was in front of it.

Source code — still_camera.cpp
#include <Arduino.h>
#include <WiFi.h>
#include "esp_camera.h"
#include <WebServer.h>

// ===== Wi-Fi settings =====
const char* WIFI_SSID = "xxxxx";
const char* WIFI_PASS = "xxxxx";

// ===== XIAO ESP32S3 Sense (OV2640) pin definitions =====
#define PWDN_GPIO_NUM    -1
#define RESET_GPIO_NUM   -1
#define XCLK_GPIO_NUM    10
#define SIOD_GPIO_NUM    40
#define SIOC_GPIO_NUM    39

#define Y9_GPIO_NUM      48
#define Y8_GPIO_NUM      11
#define Y7_GPIO_NUM      12
#define Y6_GPIO_NUM      14
#define Y5_GPIO_NUM      16
#define Y4_GPIO_NUM      18
#define Y3_GPIO_NUM      17
#define Y2_GPIO_NUM      15
#define VSYNC_GPIO_NUM   38
#define HREF_GPIO_NUM    47
#define PCLK_GPIO_NUM    13

WebServer server(80);

void handleRoot() {
  String html =
    "<!doctype html><html><head><meta charset='utf-8'>"
    "<title>XIAO Camera</title></head><body>"
    "<h1>XIAO ESP32S3 Sense Camera</h1>"
    "<p><a href='/jpg'>Show still image</a></p>"
    "<img src='/jpg' style='max-width:100%;height:auto;'>"
    "</body></html>";
  server.send(200, "text/html; charset=utf-8", html);
}

void handleJpg() {
  WiFiClient client = server.client();
  camera_fb_t *fb = esp_camera_fb_get();
  if (!fb) {
    client.println("HTTP/1.1 500 Internal Server Error");
    client.println("Content-Type: text/plain");
    client.println("Connection: close\r\n");
    client.println("Camera capture failed");
    return;
  }
  client.println("HTTP/1.1 200 OK");
  client.println("Content-Type: image/jpeg");
  client.println("Connection: close\r\n");
  client.write(fb->buf, fb->len);
  esp_camera_fb_return(fb);
}

bool initCamera() {
  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer   = LEDC_TIMER_0;
  config.pin_d0 = Y2_GPIO_NUM; config.pin_d1 = Y3_GPIO_NUM;
  config.pin_d2 = Y4_GPIO_NUM; config.pin_d3 = Y5_GPIO_NUM;
  config.pin_d4 = Y6_GPIO_NUM; config.pin_d5 = Y7_GPIO_NUM;
  config.pin_d6 = Y8_GPIO_NUM; config.pin_d7 = Y9_GPIO_NUM;
  config.pin_xclk     = XCLK_GPIO_NUM;
  config.pin_pclk     = PCLK_GPIO_NUM;
  config.pin_vsync    = VSYNC_GPIO_NUM;
  config.pin_href     = HREF_GPIO_NUM;
  config.pin_sccb_sda = SIOD_GPIO_NUM;
  config.pin_sccb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn     = PWDN_GPIO_NUM;
  config.pin_reset    = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;
  config.frame_size   = FRAMESIZE_VGA;
  config.jpeg_quality = 12;
  config.fb_count     = 2;
  config.grab_mode    = CAMERA_GRAB_LATEST;

  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) { Serial.printf("Camera init failed: 0x%x\n", err); return false; }

  sensor_t* s = esp_camera_sensor_get();
  if (s) { s->set_vflip(s, 1); s->set_brightness(s, 1); s->set_saturation(s, 0); }
  return true;
}

void connectWiFi() {
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  Serial.print("Connecting to Wi-Fi");
  while (WiFi.status() != WL_CONNECTED) { delay(500); Serial.print("."); }
  Serial.println("\nConnected. IP: " + WiFi.localIP().toString());
}

void setup() {
  Serial.begin(115200);
  delay(1000);
  if (!initCamera()) { while (true) delay(1000); }
  connectWiFi();
  server.on("/", handleRoot);
  server.on("/jpg", HTTP_GET, handleJpg);
  server.begin();
  Serial.println("Open: http://" + WiFi.localIP().toString() + "/");
}

void loop() { server.handleClient(); }

Network view — still image

Browser                          ESP32S3
  |                                 |
  |--- GET /jpg HTTP/1.1 ---------> |   (1 TCP connection opened)
  |                                 |   (board captures one JPEG)
  |<-- HTTP/1.1 200 OK ------------ |
  |    Content-Type: image/jpeg     |
  |    Connection: close            |
  |                                 |
  |<-- [JPEG bytes] --------------- |   (data sent, then connection closed)
  |                                 |

One request, one response, connection closed. To refresh, the browser must request again — the image is always a snapshot from page-load time.
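To make the bytes on the wire explicit, here is a small host-runnable sketch that assembles the response header for one JPEG. It mirrors the headers in the diagram and additionally includes a Content-Length header, which the still_camera.cpp sketch above does not send; that addition is my assumption of a reasonable refinement, and `buildJpegResponseHeader` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build the status line and headers that precede the JPEG bytes.
// Every header line ends with CRLF, and an empty line ends the header block.
std::string buildJpegResponseHeader(std::size_t jpegLen) {
  std::string header;
  header += "HTTP/1.1 200 OK\r\n";
  header += "Content-Type: image/jpeg\r\n";
  header += "Content-Length: " + std::to_string(jpegLen) + "\r\n";
  header += "Connection: close\r\n";
  header += "\r\n";  // blank line: the JPEG body starts right after this
  return header;
}
```

The server would send this string and then write the raw JPEG buffer. With Connection: close the browser can also detect the end of the image from the closed connection, which is what the original sketch relies on.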


2-2. Step 2 — Live Stream with Button-triggered Shot

The still image approach worked, but I wanted to be able to choose when to capture — not just get whatever the camera sees at the moment the page loads. So I iterated toward a streaming version with three endpoints:

URL | Function
http://[IP]/ | Main page with embedded stream
http://[IP]/stream | MJPEG live stream
http://[IP]/shot | Last saved frame as a static JPEG

The workflow is: open /stream to frame the shot, then press the hardware button (D4) to freeze that frame into /shot.

Capture success shown in browser
Live stream with button capture

Prompt I gave to Claude Code

I want streaming. I want to use a switch to take a shot, let me know your implementation idea.

  • http://[IP]/ — main page
  • http://[IP]/stream — live video
  • http://[IP]/shot — last captured frame

Claude Code first produced a plan and, after my approval, created the first streaming version. After testing on the board, we found the button did not respond while the stream was active, because handleStream() blocks the main loop. We debugged together: I described the symptom, and Claude Code identified the cause and added a call to handleCaptureRequest() inside the stream loop.
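The blocking problem can be reproduced on a PC without any hardware. The sketch below is a simplified simulation, not the firmware itself: a `std::atomic<bool>` stands in for the flag set by the button interrupt, and `simulateStream` stands in for the long-running stream loop. All names here are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Stands in for the flag that the button interrupt sets on the board.
std::atomic<bool> captureRequested{false};

// Consume the flag at most once per press; returns true if a capture ran.
bool consumeCaptureRequest(int& shotsTaken) {
  if (captureRequested.exchange(false)) {
    ++shotsTaken;  // on the board this is where the frame would be saved
    return true;
  }
  return false;
}

// Simulated stream loop that sends frameCount frames.
// The "button" is pressed while frame number pressAtFrame is being sent.
// Returns how many shots were taken while the stream was still running.
int simulateStream(int frameCount, int pressAtFrame, bool pollInsideLoop) {
  int shotsTaken = 0;
  for (int frame = 0; frame < frameCount; ++frame) {
    if (frame == pressAtFrame) captureRequested = true;  // interrupt fires
    // ... one JPEG part would be sent to the browser here ...
    if (pollInsideLoop) consumeCaptureRequest(shotsTaken);
  }
  return shotsTaken;
}
```

With `pollInsideLoop` set to false the press stays pending until the loop ends, which matches the symptom on the board. With it set to true, the press is handled on the same iteration in which it arrives.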

A further issue was that button bouncing caused multiple captures. Claude Code added a hardware interrupt (IRAM_ATTR buttonISR) with a 200 ms debounce, replacing the earlier polling approach. The DIRECT_CAPTURE_VERSION tag in setup() is a debug marker from that session to confirm the correct firmware was running on the board.

Source code — stream_with_shot.cpp
#include <Arduino.h>
#include <WiFi.h>
#include <WebServer.h>
#include "esp_camera.h"

// ===== Wi-Fi settings =====
const char* WIFI_SSID = "xxxxx";
const char* WIFI_PASS = "xxxxx";

// ===== Button input =====
static const int SW_PIN = D4;

// ===== XIAO ESP32S3 Sense + OV2640 =====
#define PWDN_GPIO_NUM    -1
#define RESET_GPIO_NUM   -1
#define XCLK_GPIO_NUM    10
#define SIOD_GPIO_NUM    40
#define SIOC_GPIO_NUM    39

#define Y9_GPIO_NUM      48
#define Y8_GPIO_NUM      11
#define Y7_GPIO_NUM      12
#define Y6_GPIO_NUM      14
#define Y5_GPIO_NUM      16
#define Y4_GPIO_NUM      18
#define Y3_GPIO_NUM      17
#define Y2_GPIO_NUM      15
#define VSYNC_GPIO_NUM   38
#define HREF_GPIO_NUM    47
#define PCLK_GPIO_NUM    13

WebServer server(80);

void handleCaptureRequest();  // forward declaration

// Last saved frame
uint8_t* lastShotBuf = nullptr;
size_t lastShotLen = 0;

// Button interrupt flag
volatile bool captureRequested = false;
volatile unsigned long lastInterruptTime = 0;

void IRAM_ATTR buttonISR() {
  unsigned long now = millis();
  if (now - lastInterruptTime > 200) {  // 200ms debounce
    captureRequested = true;
    lastInterruptTime = now;
  }
}

void freeLastShot() {
  if (lastShotBuf != nullptr) {
    free(lastShotBuf);
    lastShotBuf = nullptr;
    lastShotLen = 0;
  }
}

bool storeFrameAsShot(const uint8_t* buf, size_t len) {
  if (buf == nullptr || len == 0) return false;

  uint8_t* newBuf = (uint8_t*)malloc(len);
  if (!newBuf) {
    Serial.println("malloc failed");
    return false;
  }

  memcpy(newBuf, buf, len);

  freeLastShot();
  lastShotBuf = newBuf;
  lastShotLen = len;

  Serial.print("Shot updated: ");
  Serial.print((unsigned int)lastShotLen);
  Serial.println(" bytes");

  return true;
}

bool captureAndStoreShot() {
  camera_fb_t* fb = esp_camera_fb_get();
  if (!fb) {
    Serial.println("Capture failed");
    return false;
  }

  if (fb->format != PIXFORMAT_JPEG) {
    Serial.println("Frame is not JPEG");
    esp_camera_fb_return(fb);
    return false;
  }

  bool ok = storeFrameAsShot(fb->buf, fb->len);
  esp_camera_fb_return(fb);
  return ok;
}

void handleRoot() {
  String html =
    "<!doctype html><html><head><meta charset='utf-8'>"
    "<title>XIAO Camera</title></head><body>"
    "<h1>XIAO ESP32S3 Sense Camera</h1>"
    "<p><a href='/stream' target='_blank'>Open live stream</a></p>"
    "<p><a href='/shot?t=' onclick=\"this.href='/shot?t='+Date.now();\" target='_blank'>View last shot</a></p>"
    "<p><a href='/capture' target='_blank'>Capture now</a></p>"
    "<p>Press the hardware switch to save the current frame.</p>"
    "<img src='/stream' style='max-width:100%;height:auto;'>"
    "</body></html>";

  server.send(200, "text/html; charset=utf-8", html);
}

void handleShot() {
  if (lastShotBuf == nullptr || lastShotLen == 0) {
    server.send(404, "text/plain; charset=utf-8", "No shot captured yet");
    return;
  }

  server.sendHeader("Cache-Control", "no-store, no-cache, must-revalidate, max-age=0");
  server.sendHeader("Pragma", "no-cache");
  server.setContentLength(lastShotLen);
  server.send(200, "image/jpeg", "");
  server.sendContent((const char*)lastShotBuf, lastShotLen);
}

void handleCapture() {
  bool ok = captureAndStoreShot();
  if (ok) {
    server.send(200, "text/plain; charset=utf-8", "Captured");
  } else {
    server.send(500, "text/plain; charset=utf-8", "Capture failed");
  }
}

void handleStream() {
  WiFiClient client = server.client();

  client.print(
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: multipart/x-mixed-replace; boundary=frame\r\n"
    "Cache-Control: no-store, no-cache, must-revalidate, max-age=0\r\n"
    "Pragma: no-cache\r\n"
    "Connection: close\r\n\r\n"
  );

  while (client.connected()) {
    camera_fb_t* fb = esp_camera_fb_get();
    if (!fb) {
      Serial.println("Stream capture failed");
      delay(30);
      continue;
    }

    if (fb->format != PIXFORMAT_JPEG) {
      esp_camera_fb_return(fb);
      delay(30);
      continue;
    }

    client.print("--frame\r\n");
    client.print("Content-Type: image/jpeg\r\n");
    client.print("Content-Length: ");
    client.print(fb->len);
    client.print("\r\n\r\n");
    client.write(fb->buf, fb->len);
    client.print("\r\n");

    esp_camera_fb_return(fb);

    handleCaptureRequest();  // also handle button during stream

    if (!client.connected()) break;

    delay(80);
  }
}

bool initCamera() {
  camera_config_t config;
  config.ledc_channel = LEDC_CHANNEL_0;
  config.ledc_timer   = LEDC_TIMER_0;
  config.pin_d0       = Y2_GPIO_NUM;
  config.pin_d1       = Y3_GPIO_NUM;
  config.pin_d2       = Y4_GPIO_NUM;
  config.pin_d3       = Y5_GPIO_NUM;
  config.pin_d4       = Y6_GPIO_NUM;
  config.pin_d5       = Y7_GPIO_NUM;
  config.pin_d6       = Y8_GPIO_NUM;
  config.pin_d7       = Y9_GPIO_NUM;
  config.pin_xclk     = XCLK_GPIO_NUM;
  config.pin_pclk     = PCLK_GPIO_NUM;
  config.pin_vsync    = VSYNC_GPIO_NUM;
  config.pin_href     = HREF_GPIO_NUM;
  config.pin_sccb_sda = SIOD_GPIO_NUM;
  config.pin_sccb_scl = SIOC_GPIO_NUM;
  config.pin_pwdn     = PWDN_GPIO_NUM;
  config.pin_reset    = RESET_GPIO_NUM;
  config.xclk_freq_hz = 20000000;
  config.pixel_format = PIXFORMAT_JPEG;

  config.frame_size   = FRAMESIZE_QVGA;
  config.jpeg_quality = 15;
  config.fb_count     = 1;
  config.grab_mode    = CAMERA_GRAB_LATEST;

  esp_err_t err = esp_camera_init(&config);
  if (err != ESP_OK) {
    Serial.printf("esp_camera_init failed: 0x%x\n", err);
    return false;
  }

  sensor_t* s = esp_camera_sensor_get();
  if (s) {
    s->set_vflip(s, 1);
    s->set_brightness(s, 1);
    s->set_saturation(s, 0);
  }

  return true;
}

void connectWiFi() {
  WiFi.mode(WIFI_STA);
  WiFi.begin(WIFI_SSID, WIFI_PASS);

  Serial.print("Connecting to Wi-Fi");
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println();

  Serial.print("Wi-Fi connected. IP: ");
  Serial.println(WiFi.localIP());
}

void handleCaptureRequest() {
  if (captureRequested) {
    captureRequested = false;
    Serial.println("Button pressed -> capture");
    bool ok = captureAndStoreShot();
    Serial.println(ok ? "Capture success" : "Capture failed");
  }
}

void setup() {
  Serial.begin(115200);
  delay(2000);
  Serial.println("DIRECT_CAPTURE_VERSION");
  Serial.println("setup start");

  pinMode(SW_PIN, INPUT);  // assumes external pull-up
  attachInterrupt(digitalPinToInterrupt(SW_PIN), buttonISR, FALLING);

  if (!initCamera()) {
    while (true) {
      delay(1000);
    }
  }

  connectWiFi();

  // Capture one frame on startup
  captureAndStoreShot();

  server.on("/", HTTP_GET, handleRoot);
  server.on("/stream", HTTP_GET, handleStream);
  server.on("/shot", HTTP_GET, handleShot);
  server.on("/capture", HTTP_GET, handleCapture);
  server.begin();

  Serial.println("Camera server started");
  Serial.print("Root   : http://");
  Serial.println(WiFi.localIP());
  Serial.print("Stream : http://");
  Serial.print(WiFi.localIP());
  Serial.println("/stream");
  Serial.print("Shot   : http://");
  Serial.print(WiFi.localIP());
  Serial.println("/shot");
}

void loop() {
  handleCaptureRequest();
  server.handleClient();
  delay(5);
}

Network view — live stream

Browser                          ESP32S3
  |                                 |
  |--- GET /stream HTTP/1.1 ------> |   (1 TCP connection opened, stays open)
  |                                 |
  |<-- HTTP/1.1 200 OK ------------ |
  |    Content-Type: multipart/x-mixed-replace; boundary=frame
  |    Connection: close            |
  |                                 |
  |<-- --frame ------------------- |   \
  |    Content-Type: image/jpeg     |    |
  |    Content-Length: 8192         |    |  repeated continuously
  |    [JPEG bytes]                 |    |  at most ~12 frames/sec (80 ms delay + send time)
  |<-- --frame ------------------- |   /
  |    ...                          |
  |                                 |   (connection closes when client disconnects)

One HTTP GET opens a TCP connection that stays open. The server pushes JPEG frames continuously, each separated by a --frame boundary with its own Content-Type/Content-Length headers. The browser replaces the displayed image with each new part — this is standard Motion JPEG (MJPEG).
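The frame rate follows from simple arithmetic: each loop iteration costs the fixed 80 ms delay plus however long it takes to capture and send one frame, so 1000 / 80 = 12.5 frames per second is an upper bound rather than a measured value. A minimal sketch of that calculation, where `maxFramesPerSecond` is a hypothetical helper and any per-frame work time passed in is an assumption, not a measurement from my board:

```cpp
#include <cassert>

// Upper bound on frames per second for a loop that spends delayMs waiting
// and workMs capturing and sending each frame.
double maxFramesPerSecond(double delayMs, double workMs) {
  assert(delayMs + workMs > 0.0);  // avoid dividing by zero
  return 1000.0 / (delayMs + workMs);
}
```

For example, `maxFramesPerSecond(80.0, 0.0)` gives 12.5, and if capture plus transmission took a hypothetical 20 ms per frame, `maxFramesPerSecond(80.0, 20.0)` gives 10.0.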

/shot works like /jpg — one request, one response — but returns the frame frozen by the button rather than a live capture.


2-3. How the Code Works

Overall architecture:

ESP32S3 (with camera)
  ├── OV2640 camera  → capture frames
  ├── Wi-Fi          → serve HTTP endpoints
  ├── Button (D4)    → interrupt triggers capture
  └── Heap memory    → holds one saved JPEG frame

1. Button interrupt with debounce (ISR)

void IRAM_ATTR buttonISR() {
  unsigned long now = millis();
  if (now - lastInterruptTime > 200) {
    captureRequested = true;
    lastInterruptTime = now;
  }
}

When the button is pressed, the board immediately runs this small function. Rather than doing the actual capture here, it just sets a flag (captureRequested = true) to say “someone pressed the button.” The real capture happens elsewhere. This is intentional — the button-press function needs to be as short as possible so it doesn’t interrupt other work for too long.

The 200 ms check is a debounce: a physical button press often causes several rapid on/off signals in quick succession. Ignoring anything within 200 ms of the last press means only one capture is triggered per press.
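The debounce rule can be tested on its own, away from the hardware. The sketch below repeats the same comparison as buttonISR() in a small struct so that timestamps can be fed in by hand; `Debouncer` is a hypothetical name, and the timestamps in the usage note are invented. Because the subtraction is done on unsigned values, the check should also keep working when the millisecond counter wraps around.

```cpp
#include <cassert>
#include <cstdint>

struct Debouncer {
  uint32_t windowMs;        // edges closer together than this are ignored
  uint32_t lastAcceptedMs;  // time of the last edge that counted as a press

  // Returns true if the edge seen at nowMs should count as a new press.
  bool accept(uint32_t nowMs) {
    // Unsigned subtraction gives the elapsed time even across a wraparound.
    if (static_cast<uint32_t>(nowMs - lastAcceptedMs) > windowMs) {
      lastAcceptedMs = nowMs;
      return true;
    }
    return false;
  }
};
```

With `Debouncer d{200, 0}`, an edge at 1000 ms is accepted, bounces at 1005 ms and 1200 ms are rejected (the comparison is strictly greater than 200), and an edge at 1201 ms counts as the next press.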

2. Saving a frame

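During streaming, the loop grabs a new image about every 80 ms (the delay in handleStream()) plus the time needed to capture and send it. Each image lives in a frame buffer owned by the camera driver, and once the code hands that buffer back with esp_camera_fb_return(), the driver reuses it for a later frame. To freeze a specific frame, the code copies it into a separate block of heap memory before handing the buffer back. When a new shot is saved, the new copy is allocated first and the old copy is freed afterwards, so only one saved frame stays in memory and a failed allocation leaves the previous shot intact.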
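The order of operations in storeFrameAsShot() (allocate, copy, then free the old buffer) can be shown in isolation. This is a host-runnable sketch of the same pattern under simplified assumptions: `SavedShot` and `storeShot` are hypothetical names, and the byte arrays in the usage note are tiny stand-ins for real JPEG data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

struct SavedShot {
  uint8_t* buf;     // heap copy of the last saved frame, or nullptr
  std::size_t len;  // size of that copy in bytes
};

// Copy a frame into newly allocated memory, then release the previous copy.
// If the allocation fails, the previous shot is left untouched.
bool storeShot(SavedShot& shot, const uint8_t* frame, std::size_t len) {
  if (frame == nullptr || len == 0) return false;

  uint8_t* copy = static_cast<uint8_t*>(std::malloc(len));
  if (copy == nullptr) return false;

  std::memcpy(copy, frame, len);
  std::free(shot.buf);  // freeing nullptr is a no-op, so the first call is safe
  shot.buf = copy;
  shot.len = len;
  return true;
}
```

Starting from `SavedShot shot{nullptr, 0}`, each successful call replaces the stored frame, and the caller frees `shot.buf` once at the end. The trade-off is that the old and new copies briefly exist at the same time, so peak memory use is two frames rather than one.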

3. MJPEG streaming

Content-Type: multipart/x-mixed-replace; boundary=frame

--frame
Content-Type: image/jpeg
Content-Length: 12345

[JPEG data]
--frame
...

The live stream is not a video file — it is just a series of JPEG images sent one after another over a single connection, each labelled with --frame as a separator. The browser replaces the displayed image each time a new one arrives, which creates the appearance of live video.
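The framing of a single part can be written as a pure string-building function, which makes it easy to check on a PC. The sketch below follows the same layout that handleStream() sends (boundary line, part headers, blank line, JPEG bytes, trailing CRLF); `buildMjpegPart` is a hypothetical helper, and the 4-byte payload in the usage note is a placeholder rather than a valid image.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build one part of a multipart/x-mixed-replace stream.
// jpeg holds the raw image bytes; boundary is the name from the Content-Type header.
std::string buildMjpegPart(const std::string& jpeg, const std::string& boundary) {
  std::string part;
  part += "--" + boundary + "\r\n";  // boundary line starts each part
  part += "Content-Type: image/jpeg\r\n";
  part += "Content-Length: " + std::to_string(jpeg.size()) + "\r\n";
  part += "\r\n";                    // blank line separates headers from data
  part += jpeg;
  part += "\r\n";                    // CRLF before the next boundary line
  return part;
}
```

Calling `buildMjpegPart(std::string("\xFF\xD8\xFF\xD9", 4), "frame")` produces a part that begins with `--frame` and announces `Content-Length: 4`. Sending such parts back to back over one connection is all the stream consists of.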

4. HTTP endpoints

Endpoint | What it does
/ | Main HTML page with the stream embedded
/stream | Sends the continuous stream of JPEG frames
/shot | Returns the last frame saved by the button
/capture | Saves a frame immediately (without the button)
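Conceptually, the server.on() calls build a lookup table from path to handler function. The toy model below shows that idea on a PC. It is not how the WebServer library is implemented, and `Router`, `Response`, and the handler bodies are hypothetical; it also strips the query string so that a cache-busting URL such as /shot?t=123 resolves to the /shot handler.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

struct Response {
  int status;
  std::string body;
};

class Router {
 public:
  // Register a handler for an exact path such as "/shot".
  void on(const std::string& path, std::function<Response()> handler) {
    routes_[path] = std::move(handler);
  }

  // Look up the handler for a request target; unknown paths get a 404.
  Response dispatch(const std::string& target) const {
    // Drop anything after '?', so "/shot?t=123" is treated as "/shot".
    const std::string path = target.substr(0, target.find('?'));
    const auto it = routes_.find(path);
    if (it == routes_.end()) return Response{404, "Not found"};
    return it->second();
  }

 private:
  std::map<std::string, std::function<Response()>> routes_;
};
```

Registering `/` and `/shot` and then dispatching `/`, `/shot?t=123`, and `/missing` would return the main page, the shot handler's response, and a 404 respectively.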

AI Usage

I used Claude Code as a coding partner for code generation (prompts are documented inline above each sketch) and to refine this report.

Reflections

What I learned about networking and communication protocols

It was interesting to learn the difference between ESP-NOW and standard Wi-Fi. Both use the same hardware on the ESP32S3, the 2.4 GHz radio transceiver. What differs is the address type (an IP address for standard Wi-Fi, a MAC address for ESP-NOW) and who assigns it (the router's DHCP server for the IP address, the factory for the MAC address). ESP-NOW does not need a router, whereas the standard Wi-Fi setup I used requires one, which here is the FabLab's Wi-Fi router.

Code and hardware lessons

I had used Claude Code as a coding assistant before, but not for hardware-debugging workflows on real embedded systems. I expected the AI-assisted approach to compress the timeline end-to-end; in practice, code generation was fast but understanding why each design decision was made — why an ISR rather than polling, why the flag is consumed inside the stream loop — took considerably more time than the generation itself.

What worked well: describing a hardware symptom (“button does not respond during streaming”) and having the assistant identify the cause and propose a fix without me needing to understand the root cause first. That accelerated the debugging loop significantly.

What I would do differently: the debounce and ISR patterns are both worth internalising — I would have learned them more efficiently by engaging with the code actively upfront.

Source Code


Checklist

  • [x] Linked to the group assignment page
  • [x] Documented your project and what you have learned from implementing networking and/or communication protocols.
  • [x] Explained the programming process(es) you used.
  • [x] Ensured and documented that your addressing for boards works
  • [x] Outlined problems and how you fixed them.
  • [x] Included design files (or linked to where they are located if you are using a board you have designed and fabricated earlier) and original source code.
  • [x] Included a ‘hero shot’ of your network and/or communications setup

Copyright 2026 Fumiko Toyoda. Creative Commons Attribution Non Commercial. Source code hosted at gitlab.fabcloud.org