Networking and Communications

This week focuses on networking — designing, building, and connecting nodes with network addresses and local input/output devices. For my final project, I designed the full networking stack that lets a Hive Monitor Pi talk to the cloud — from first boot through steady-state telemetry. This covers how the device bootstraps, binds to a hive, and maintains persistent MQTT communication with AWS IoT Core.

Assignment Requirements

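Group Assignment:

  • Send a message between two projects.

Individual Assignment:

  • Design, build, and connect wired or wireless node(s) with network or bus addresses and local input and/or output device(s).

Learning Outcomes:

  • Demonstrate workflows used in network design.
  • Implement and interpret networking protocols and/or communication protocols.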

Project Documentation

Overview

This section covers the device-side networking of a Hive Monitor Pi, from first boot through steady-state telemetry. Every flow is outbound and initiated by the Pi: the backend never connects in, and the device opens no inbound port for it.

For this week I connected all my sensors (3× SHT45 temperature/humidity sensors via an I²C multiplexer), both cameras (Pi Camera Module 3 Wide on the CSI ports), and my custom sensor extension board (designed in Week 9) to the Pi, which relays their data to the cloud backend over MQTT; HTTPS handles bootstrap and binding.

Results

All three SHT45 sensors read 23°C, and both cameras are connected and streaming over the network. Sensor data flows through the I²C multiplexer to the Pi and publishes correctly. What remains is integrating video hosting into the web dashboard, which I'll finish in Systems Integration week.

I'm using the same custom sensor extension board from Input Devices week — the pass-through PCB with STEMMA QT connectors that extends the I²C cable reach from the multiplexer to the SHT45 sensors in each hive box.

End-to-End Latency — Button Click to LED

From the Pi, the network path is: Pi → MQTT → AWS IoT → AWS Cloud. Here's the full round-trip when a user clicks a button on the website and an LED lights up on the hive:

Hop | Latency
--- | ---
Browser → API (HTTPS) | ~50–150 ms
API → AWS IoT publish (HTTPS) | ~20–80 ms
IoT → Pi (MQTT push) | ~30–100 ms
Pi GPIO ramp (intentional slow movement) | 200 ms
Pi → IoT state echo | ~30–100 ms
IoT → backend → WebSocket → UI | ~50–200 ms
Total round-trip | ~380–830 ms
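The total row is the sum of the per-hop bounds, which assumes the hops run strictly one after another with nothing overlapping. A quick Python check of that arithmetic, with the hop values copied from the table:

```python
# Per-hop latency bounds in milliseconds, (name, best, worst), from the table.
HOPS = [
    ("Browser -> API (HTTPS)", 50, 150),
    ("API -> AWS IoT publish (HTTPS)", 20, 80),
    ("IoT -> Pi (MQTT push)", 30, 100),
    ("Pi GPIO ramp (intentional)", 200, 200),
    ("Pi -> IoT state echo", 30, 100),
    ("IoT -> backend -> WebSocket -> UI", 50, 200),
]


def round_trip_bounds(hops):
    """Sum best and worst cases, assuming the hops run strictly in sequence."""
    best = sum(lo for _, lo, _ in hops)
    worst = sum(hi for _, _, hi in hops)
    return best, worst


if __name__ == "__main__":
    best, worst = round_trip_bounds(HOPS)
    print(f"round trip: {best}-{worst} ms")  # round trip: 380-830 ms
```

The fixed 200 ms ramp is more than half of the 380 ms best case, so network tuning alone can't push click-to-LED time much below 200 ms.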

The 200 ms GPIO ramp is intentional: the servo and LEDs ramp slowly to avoid current spikes that could brown out the Pi. Under normal conditions, a user clicks a button and sees the state update in under a second.
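How the ramp is generated isn't documented here. One minimal sketch, assuming a linear ramp of PWM duty cycle in fixed 10 ms steps (both the linear shape and the step size are assumptions, and ramp_schedule is a hypothetical name), computes the schedule as plain data so it can be tested off the Pi:

```python
def ramp_schedule(start, target, duration_ms=200, step_ms=10):
    """Linear ramp between two duty cycles given as fractions in [0, 1].

    Returns (time_ms, duty) pairs. The last pair reaches target exactly,
    at steps * step_ms (equal to duration_ms when step_ms divides it).
    """
    if step_ms <= 0 or duration_ms < step_ms:
        raise ValueError("need step_ms > 0 and duration_ms >= step_ms")
    for value in (start, target):
        if not 0.0 <= value <= 1.0:
            raise ValueError("duty cycle must be within [0, 1]")
    steps = duration_ms // step_ms
    schedule = []
    for i in range(1, steps + 1):
        f = i / steps  # fraction of the ramp completed
        # Weighted form lands exactly on target when f == 1.0.
        schedule.append((i * step_ms, start * (1 - f) + target * f))
    return schedule
```

On the Pi, each duty value would be written to the PWM output (gpiozero's PWMLED.value accepts 0.0 to 1.0, for example) with a step_ms sleep between writes. Spreading the change over 20 small steps is what keeps the inrush current low.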

Hero Shot — Network Working

Custom sensor extension board plugged in and connected to the Pi via the I²C multiplexer

Successful test — all three SHT45 sensors reading 23°C over the network

System running — sensors and cameras connected over PoE network

30-Second Picture

Pi (admin LAN, NAT'd)                      hive-monitor.com (AWS, us-east-1)
─────────────────────                      ─────────────────────────────────
hive-monitor-bootstrap.service ──HTTPS──▶  api.{dev.,}hive-monitor.com
(one-shot, until bound)               GET  /devices/{serial}/bind-packet
                                       POST /hives/bind        (manual path)

hive-monitor-agent.service ─────MQTT───▶  <account>-ats.iot.<region>.amazonaws.com
(steady-state)                        mTLS on 8883
                                      pub/sub on hive-monitor/hives/{hiveId}/*
                                      and $aws/things/hive-monitor-{hiveId}/shadow/*

hive-monitor-ota.timer ─────────HTTPS──▶  ota.{dev.,}hive-monitor.com   (planned)
(every 6h, hourly during retry)       GET signed release artifacts

hive-monitor-camera.service ────MQTT───▶  (same IoT endpoint, distinct MQTT session)
(only while a viewing session is active)

Channels at a Glance

Flow | When | Protocol | Port | Auth
--- | --- | --- | --- | ---
Fleet bootstrap poll | Boot, until /etc/hive-monitor/bound exists | HTTPS | 443 | None (backend gates on operator claim)
Manual bind | One-time, running bind.sh | HTTPS | 443 | One-time Binding_Token
Telemetry / health / events | Steady-state, every 60 s or on change | MQTT 5 over TLS | 8883 | mTLS with per-device X.509 cert
Commands / shadow | On demand | MQTT 5 over TLS | 8883 | Same cert
Camera feed | Active viewing session only | MQTT 5 over TLS | 8883 | Same cert (QoS 0)
OTA fetch | Every 6 h (planned) | HTTPS | 443 | Ed25519-signed payload
Time | Continuous | NTP | 123 | None
DNS | Continuous | DNS | 53 | None

That's the full outbound allow-list; ufw blocks everything else.

Domains and DNS

The device picks its environment from the image, not at runtime:

Image channel | Built from | Bootstrap URL
--- | --- | ---
dev | develop branch + feature-branch CI builds | https://api.dev.hive-monitor.com
prod | main + v* tags | https://api.hive-monitor.com

pi/image/stage-hive-monitor/03-image-version/00-run.sh writes this at image-build time from IMAGE_API_BASE, which the CI workflow derives from the git ref. Operators don't pick the environment — they pick the image. Self-hosted backends override by editing /etc/hive-monitor/bootstrap.conf post-flash.
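The channel table reduces to a small pure function of the git ref. A minimal sketch, assuming the CI exposes refs in the full refs/heads/... and refs/tags/... form (the form GitHub Actions uses in GITHUB_REF); image_api_base is a hypothetical name, and the real derivation lives in the CI workflow:

```python
DEV_API_BASE = "https://api.dev.hive-monitor.com"
PROD_API_BASE = "https://api.hive-monitor.com"


def image_api_base(git_ref: str) -> str:
    """Return the IMAGE_API_BASE to bake into an image built from git_ref.

    Mirrors the channel table: main and v* tags build prod images;
    develop and every other branch build dev images.
    """
    if git_ref == "refs/heads/main" or git_ref.startswith("refs/tags/v"):
        return PROD_API_BASE
    if git_ref.startswith("refs/heads/"):
        return DEV_API_BASE
    # Any other ref (for example a tag not starting with v) has no channel.
    raise ValueError(f"no image channel for ref {git_ref!r}")
```

Keeping this decision in CI rather than on the device is what lets operators choose an environment purely by which image they flash.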

The IoT endpoint hostname (<account>-ats.iot.<region>.amazonaws.com) is not baked in — it's returned by the bind packet alongside the device cert, so the device only learns where to MQTT-connect after binding.

Phase 1 — Bootstrap (Unbound)

hive-monitor-bootstrap.service runs at every boot while /etc/hive-monitor/bound is missing (systemd ConditionPathExists=!/etc/hive-monitor/bound). It does two things:

  1. Renames the host to hive-monitor-<last-6-hex-of-cpu-serial>. Reads /proc/cpuinfo, runs hostnamectl set-hostname, rewrites /etc/hosts, restarts avahi-daemon. Idempotent — no-op if hostname already matches. A fleet of 100 Pis flashed from the same image each advertise themselves at a unique hive-monitor-<id>.local instead of all colliding on hive-monitor.local.
  2. Polls GET https://api.{dev.,}hive-monitor.com/devices/<cpu_serial>/bind-packet every 5 seconds:
    • 404 — "device not claimed yet, keep polling" — quiet, no error, resets the failure streak.
    • 200 — bind packet (cert, key, root CA, hive metadata). Install atomically, exit.
    • Anything else (DNS fail, TLS error, 5xx) — exponential backoff 10 → 20 → 40 → 80 → 160 → 300 s (capped).
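Step 1 is deterministic, so it can be sketched as two pure functions. This assumes the Raspberry Pi /proc/cpuinfo format, which carries a "Serial : <hex>" line; the function names are hypothetical, and the /etc/hosts rewrite is left out:

```python
import re


def hostname_from_cpuinfo(cpuinfo: str) -> str:
    """Build hive-monitor-<last 6 hex digits of the CPU serial>."""
    match = re.search(r"^Serial\s*:\s*([0-9a-fA-F]+)\s*$", cpuinfo, re.MULTILINE)
    if match is None:
        raise ValueError("no Serial line found in cpuinfo")
    return "hive-monitor-" + match.group(1)[-6:].lower()


def rename_commands(current: str, desired: str) -> list[list[str]]:
    """Commands for the rename; empty when already renamed (idempotent).

    /etc/hosts also needs its hostname entry rewritten; that file edit is
    omitted from this sketch.
    """
    if current == desired:
        return []
    return [
        ["hostnamectl", "set-hostname", desired],
        ["systemctl", "restart", "avahi-daemon"],
    ]
```

On the device this would run roughly as: read /proc/cpuinfo, compute the desired name, compare it with socket.gethostname(), and pass each returned command to subprocess.run(cmd, check=True).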

The endpoint is unauthenticated: anyone can ask for any serial's bind packet. The security gate is on the backend — it returns 404 until an operator explicitly claims that serial in the dashboard, and the resulting cert is bound to that specific hive.

Once a bind packet is installed, the service creates /etc/hive-monitor/bound and exits 0. On next boot the ConditionPathExists skips it entirely.
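The status handling from step 2, plus the install-then-exit behavior, fits in a small loop. A minimal sketch with the HTTP fetch, the installer, and sleep injected as callables so it can be tested without a network (next_poll and run_bootstrap are hypothetical names; a status of None stands for DNS, TLS, or connection errors):

```python
POLL_INTERVAL_S = 5
BACKOFF_S = [10, 20, 40, 80, 160, 300]  # last value is the cap


def next_poll(status, failure_streak):
    """Return (install_now, delay_s, new_failure_streak) for one poll result."""
    if status == 200:
        return True, 0, 0
    if status == 404:
        # Not claimed yet: quiet, fixed interval, failure streak reset.
        return False, POLL_INTERVAL_S, 0
    # DNS/TLS failure, 5xx, or any other status: exponential backoff, capped.
    delay = BACKOFF_S[min(failure_streak, len(BACKOFF_S) - 1)]
    return False, delay, failure_streak + 1


def run_bootstrap(fetch, install, sleep):
    """Poll until a bind packet arrives, then install it and return.

    fetch() -> (status_or_None, body). install(body) is expected to stage
    the files and create /etc/hive-monitor/bound last.
    """
    streak = 0
    while True:
        status, body = fetch()
        install_now, delay, streak = next_poll(status, streak)
        if install_now:
            install(body)
            return
        sleep(delay)
```

A 404 deliberately resets the streak: it proves the network and backend are healthy, so there is nothing to back off from.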

Phase 2 — Binding (The Moment of Transition)

Two ways to bind a device:

  • Fleet path (default): the bootstrap loop above. Operator claims the device in the dashboard, next poll returns 200 with the bind packet, bootstrap installs everything atomically and exits. Zero operator action at the Pi.
  • Manual path: operator SSHes in and runs pi/installer/bind.sh --token <token> --region <region>. This POSTs the token to ${API_BASE}/hives/bind, gets the same bind packet schema in response, installs it the same way.

Either way, the file layout after bind is identical:

/etc/hive-monitor/
├── bound                 # marker — its existence gates everything else
├── bootstrap.conf        # baked at image build
├── image-version         # version, commit_sha, build_ref, env
├── config.json           # Hive_Metadata: hive_name, gps, timezone, hive_id, iot_endpoint
├── cert.pem              # device X.509 cert (mode 0600, owner hive-monitor)
├── private.key           # device private key (mode 0600, owner hive-monitor)
├── root-ca.pem           # AWS IoT root CA
├── peripherals.json      # Peripheral_Manifest
├── sensors.json          # Sensor_Config
└── sampling_config.json  # Sampling_Config (cadences)

bind.sh writes every one of these through a staging dir (mktemp → install) so a failure mid-install leaves the device unbound — never half-bound.
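The staging pattern translates to Python roughly as follows (bind.sh itself is a shell script, so this is an illustration, not its code). Assumptions: the bind packet has already been split into file names and bytes, install_bind_packet is a hypothetical name, and ownership by the hive-monitor user is left to the caller since chown needs root. The property that matters is that the bound marker is written only after every file is in place:

```python
import json
import os
import tempfile
from pathlib import Path

SECRET_FILES = {"cert.pem", "private.key"}


def install_bind_packet(files: dict[str, bytes], target: Path) -> None:
    """Stage all files, validate, move them into target, then write 'bound'.

    Any failure before the final touch leaves no 'bound' marker, so the
    device stays unbound and the bootstrap service runs again next boot.
    """
    target.mkdir(parents=True, exist_ok=True)
    # Staging dir on the same filesystem so os.replace() is an atomic rename.
    staging = Path(tempfile.mkdtemp(prefix=".staging-", dir=target))
    try:
        for name, data in files.items():
            path = staging / name
            path.write_bytes(data)
            path.chmod(0o600 if name in SECRET_FILES else 0o644)
        json.loads(files["config.json"])  # validate before touching live files
        for name in files:
            os.replace(staging / name, target / name)
        (target / "bound").touch()
    finally:
        # Clean up whatever is left in staging, on success or failure.
        for leftover in staging.iterdir():
            leftover.unlink()
        staging.rmdir()
```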

Phase 3 — Steady-State (Bound)

hive-monitor-agent.service starts once /etc/hive-monitor/bound exists. On startup:

  1. Reads every file in /etc/hive-monitor/. Refuses to start if anything is missing or invalid.
  2. Connects to <iot_endpoint> (from config.json) on TCP 8883, mTLS using cert.pem + private.key + root-ca.pem.
  3. Client ID = hive-monitor-<hive_id>. AWS IoT policy restricts pub/sub to topics matching its own {hiveId} — one cert can't see another device's traffic.
  4. Subscribes to its downlink topics and publishes a provisioned event on first connect. A Last Will and Testament (LWT) registered on the events topic makes IoT Core publish a disconnected event if the session drops unexpectedly.
  5. Starts the driver scheduler, telemetry publisher, health reporter, autonomous controllers (door schedule, fan, LEDs), and shadow client.
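Step 1's refuse-to-start check might look like the sketch below. The required-file list is the bind-installed set from the layout above (leaving out bootstrap.conf and image-version is an assumption), and treating hive_id and iot_endpoint as the mandatory config.json keys is also an assumption, chosen because the connection needs them. Exiting with 78, sysexits' EX_CONFIG, matches the exit 78 symptom in Troubleshooting:

```python
import json
import sys
from pathlib import Path

EX_CONFIG = 78  # sysexits.h: configuration error
REQUIRED_FILES = [
    "bound", "config.json", "cert.pem", "private.key", "root-ca.pem",
    "peripherals.json", "sensors.json", "sampling_config.json",
]
REQUIRED_CONFIG_KEYS = ["hive_id", "iot_endpoint"]  # assumption


def config_problems(config_dir: Path) -> list[str]:
    """List everything wrong with config_dir; an empty list means start."""
    problems = []
    for name in REQUIRED_FILES:
        path = config_dir / name
        if not path.is_file():
            problems.append(f"missing {name}")
            continue
        if not name.endswith(".json"):
            continue
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"invalid {name}: {exc.msg}")
            continue
        if name == "config.json":
            if not isinstance(data, dict):
                problems.append("invalid config.json: not an object")
                continue
            problems += [f"config.json missing {key}"
                         for key in REQUIRED_CONFIG_KEYS if key not in data]
    return problems


def main(config_dir: Path = Path("/etc/hive-monitor")) -> None:
    problems = config_problems(config_dir)
    if problems:
        for problem in problems:
            print(problem, file=sys.stderr)
        sys.exit(EX_CONFIG)
```

Reporting every problem at once, rather than stopping at the first, saves a journalctl round trip per missing file.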

Connection retry is 1 s → 60 s exponential backoff with ±20% jitter, forever. Publishes that fail land in the Offline_Buffer (SQLite at /var/lib/hive-monitor/buffer.db, capped at 7 days / 500 MB FIFO) and replay on reconnect.
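The reconnect schedule can be sketched as a pure function. Two details the text leaves open are assumptions here: the delay doubles per attempt, and jitter is applied after the 60 s cap, so a capped delay lands between 48 and 72 s. reconnect_delay is a hypothetical name:

```python
import random


def reconnect_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0,
                    jitter: float = 0.2, rng=random.random) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    Doubles from base_s up to cap_s, then scales by a uniform factor in
    [1 - jitter, 1 + jitter) so a fleet reconnecting after an outage
    doesn't retry in lockstep.
    """
    # Clamp the exponent: retries continue forever, and 2 ** huge would
    # overflow the float conversion long after the cap is reached.
    nominal = min(cap_s, base_s * 2 ** min(attempt, 32))
    return nominal * (1 + jitter * (2 * rng() - 1))
```

The jitter matters most after a backend or broker outage: without it, every Pi that lost its connection at the same moment would retry at the same moments too.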

MQTT Topic Schema

All topics are scoped to one hive via {hiveId} (the cert's IoT policy enforces this).

Uplink (Pi → Backend)
Topic | QoS | Cadence | Payload
--- | --- | --- | ---
.../telemetry | 1 | 60 s | Sensor map + weight, door/fan/led state, gps, timestamp, seq
.../health | 1 | 60 s | CPU temp, throttling, RAM, disk, load
.../events | 1 | On change | provisioned, disconnected (LWT), power_warning, etc.
.../door/state | 1 | On change | {state, timestamp, request_id}
.../camera/{channel}/feed | 0 | Per frame | H.264 NALU bytes (transient, no retry)

Downlink (Backend → Pi)

Topic | QoS | Purpose
--- | --- | ---
.../door/command | 1 | Open/close with request_id for idempotent dedup
.../door/schedule | 1 | Replace Door_Schedule
.../led/command | 1 | LED override
.../camera/{channel}/control | 1 | Start/stop camera streaming
.../ota/notification | 1 | New release announcement
.../recover/limp | 1 | Admin-signed Limp_Mode recovery

Device Shadow

Topic | Purpose
--- | ---
$aws/things/hive-monitor-{hiveId}/shadow/update | Pi writes reported state
.../shadow/update/delta | Pi receives desired deltas
.../shadow/get / .../get/accepted | Initial sync on boot
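Because every topic hangs off one hive ID, the device can derive its whole subscription list from config.json. A sketch using the hive-monitor/hives/{hiveId} prefix from the 30-Second Picture, assuming QoS 1 for the shadow topics (the tables don't give one); the helper names are hypothetical, and the list of (topic, qos) pairs is the shape paho-mqtt's subscribe() accepts:

```python
DOWNLINK_SUFFIXES = [
    "door/command", "door/schedule", "led/command",
    "camera/+/control",  # '+' matches any single camera channel
    "ota/notification", "recover/limp",
]


def hive_topic(hive_id: str, suffix: str) -> str:
    return f"hive-monitor/hives/{hive_id}/{suffix}"


def shadow_topic(hive_id: str, suffix: str) -> str:
    return f"$aws/things/hive-monitor-{hive_id}/shadow/{suffix}"


def subscriptions(hive_id: str) -> list[tuple[str, int]]:
    """(topic filter, QoS) pairs the agent subscribes to after connecting."""
    subs = [(hive_topic(hive_id, s), 1) for s in DOWNLINK_SUFFIXES]
    subs += [(shadow_topic(hive_id, "update/delta"), 1),
             (shadow_topic(hive_id, "get/accepted"), 1)]
    return subs


def downlink_suffix(hive_id: str, topic: str):
    """Strip this hive's prefix from an incoming topic; None if it isn't ours."""
    prefix = hive_topic(hive_id, "")  # ends with '/', so h1 never matches h10
    return topic[len(prefix):] if topic.startswith(prefix) else None
```

Dispatching on the stripped suffix keeps message handlers independent of the hive ID.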

Auth Model

Three trust boundaries:

  1. Bootstrap endpoint — unauthenticated. The backend decides whether to return a bind packet for a given CPU serial based on operator action in the dashboard. Effectively trust-on-claim.
  2. Bind endpoint (manual path) — token-authenticated. The Binding_Token is 32 random bytes minted by the backend at hive creation, one-shot, scoped to one hive.
  3. MQTT (steady-state) — mTLS with per-device X.509 cert issued by AWS IoT during bind. The cert is bound to one Thing (hive-monitor-<hive_id>) and one IoT policy that scopes publish/subscribe to that hive's topic prefix. Compromising one device's cert exposes only that device's data.
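The per-hive scoping in item 3 would be expressed as an AWS IoT policy. The real policy isn't shown on this page, so the following is a sketch of one that enforces the same boundaries, built in Python so the hive ID and ARN parts are filled in per device. The region, the account ID, and the name hive_policy are placeholders:

```python
import json


def hive_policy(region: str, account_id: str, hive_id: str) -> dict:
    """AWS IoT policy limiting one cert to its own client ID and topics."""
    arn = f"arn:aws:iot:{region}:{account_id}"
    thing = f"hive-monitor-{hive_id}"
    prefixes = [f"hive-monitor/hives/{hive_id}/", f"$aws/things/{thing}/shadow/"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Only this hive's client ID may connect with the cert.
                "Effect": "Allow",
                "Action": "iot:Connect",
                "Resource": f"{arn}:client/{thing}",
            },
            {   # Publish and receive only on this hive's topics.
                "Effect": "Allow",
                "Action": ["iot:Publish", "iot:Receive"],
                "Resource": [f"{arn}:topic/{p}*" for p in prefixes],
            },
            {   # Subscribe requests are checked against topicfilter resources.
                "Effect": "Allow",
                "Action": "iot:Subscribe",
                "Resource": [f"{arn}:topicfilter/{p}*" for p in prefixes],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(hive_policy("us-east-1", "123456789012", "h1"), indent=2))
```

As written, only client ID hive-monitor-<hive_id> may connect, so the camera service's separate MQTT session would need its own client ID added to the Connect statement; that ID isn't given on this page.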

Firewall

ufw rules baked into the image:

  • Inbound: default-deny. SSH is enabled only when the operator passes --enable-ssh at bind time; otherwise port 22 is closed.
  • Outbound: allow 443 (HTTPS — bootstrap, bind, OTA), 8883 (MQTT TLS — IoT Core), 123 (NTP), 53 (DNS). Everything else denied.
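Rules matching this description can be expressed as ufw commands. A sketch that generates them as argument lists, assuming default-deny in both directions plus per-port allows (ufw_commands is a hypothetical name; the image may bake its rules in differently):

```python
OUTBOUND_ALLOW = [
    "443/tcp",   # HTTPS: bootstrap, bind, OTA
    "8883/tcp",  # MQTT over TLS to AWS IoT Core
    "123/udp",   # NTP
    "53",        # DNS over UDP and TCP
]


def ufw_commands(enable_ssh: bool) -> list[list[str]]:
    """ufw invocations for the device firewall, in the order to run them."""
    cmds = [
        ["ufw", "default", "deny", "incoming"],
        ["ufw", "default", "deny", "outgoing"],
    ]
    cmds += [["ufw", "allow", "out", rule] for rule in OUTBOUND_ALLOW]
    if enable_ssh:
        cmds.append(["ufw", "allow", "in", "22/tcp"])
    # --force skips the interactive confirmation prompt.
    cmds.append(["ufw", "--force", "enable"])
    return cmds
```

Each list can be passed to subprocess.run(cmd, check=True) during image build or bind.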

There is no inbound path from the backend to the device. Commands ride the downlink MQTT topics: the Pi keeps its mTLS session to AWS IoT open, and IoT Core pushes over that session whenever it has something to deliver.
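Command topics use QoS 1, which is at-least-once delivery, so the same door command can arrive twice (for example, redelivered after a reconnect). The request_id on each command is what makes that harmless. A minimal in-memory deduper, assuming a fixed-size window of recent IDs is enough; CommandDeduper and handle_door_command are hypothetical names, and IDs are not persisted across restarts here:

```python
from collections import OrderedDict


class CommandDeduper:
    """Remembers the most recent request_ids in a bounded LRU window."""

    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def first_time(self, request_id: str) -> bool:
        """True if request_id is new (act on it); False for a redelivery."""
        if request_id in self._seen:
            self._seen.move_to_end(request_id)
            return False
        self._seen[request_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


def handle_door_command(command: dict, deduper: CommandDeduper, actuate) -> bool:
    """Call actuate(command) once per request_id; return whether it ran."""
    if not deduper.first_time(command["request_id"]):
        return False
    actuate(command)
    return True
```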

Offline Behavior

The Pi is designed to keep operating while disconnected:

  • Sensor reads, autonomous controllers, camera capture — all continue normally. Fan control uses Pi CPU temperature, so it works regardless of cloud reachability.
  • Outbound publishes — telemetry, health, events, state changes are enqueued in /var/lib/hive-monitor/buffer.db (SQLite, FIFO, 7 day / 500 MB caps). Camera feeds are NOT buffered — live video is transient.
  • On reconnect — buffered payloads replay to their original topics at ≤50 publishes/s with original timestamps preserved.
  • OTA: hive-monitor-ota.timer polls every 6 h. Misses are caught up on the next successful run.
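The buffer's enqueue, trim, and replay rules can be sketched with Python's built-in sqlite3. Assumptions: one table in FIFO insertion order, the 500 MB cap measured as stored payload bytes (decimal megabytes) rather than database file size, and a publish callable that returns False while still offline. OfflineBuffer and the outbox table are hypothetical names; camera frames would simply never be enqueued:

```python
import sqlite3
import time

SEVEN_DAYS_S = 7 * 24 * 3600
CAP_BYTES = 500 * 1000 * 1000  # "500 MB"; decimal megabytes assumed


class OfflineBuffer:
    def __init__(self, path=":memory:", max_age_s=SEVEN_DAYS_S, max_bytes=CAP_BYTES):
        self.db = sqlite3.connect(path)
        self.max_age_s, self.max_bytes = max_age_s, max_bytes
        with self.db:
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS outbox ("
                " id INTEGER PRIMARY KEY AUTOINCREMENT,"
                " topic TEXT NOT NULL, payload BLOB NOT NULL, ts REAL NOT NULL)"
            )

    def enqueue(self, topic: str, payload: bytes, ts=None) -> None:
        """Store a failed publish with its original timestamp, then trim."""
        ts = time.time() if ts is None else ts
        with self.db:
            self.db.execute(
                "INSERT INTO outbox (topic, payload, ts) VALUES (?, ?, ?)",
                (topic, payload, ts))
            self._trim(now=ts)

    def _trim(self, now: float) -> None:
        # Age cap: drop anything older than max_age_s.
        self.db.execute("DELETE FROM outbox WHERE ts < ?", (now - self.max_age_s,))
        # Size cap: drop the oldest rows (FIFO) until under max_bytes.
        total = self.db.execute(
            "SELECT COALESCE(SUM(LENGTH(payload)), 0) FROM outbox").fetchone()[0]
        while total > self.max_bytes:
            row_id, size = self.db.execute(
                "SELECT id, LENGTH(payload) FROM outbox ORDER BY id LIMIT 1").fetchone()
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            total -= size

    def replay(self, publish, max_rate=50.0, sleep=time.sleep) -> int:
        """Republish in original order at <= max_rate/s; stop if publish fails."""
        rows = self.db.execute(
            "SELECT id, topic, payload, ts FROM outbox ORDER BY id").fetchall()
        sent = 0
        for row_id, topic, payload, ts in rows:
            if not publish(topic, payload, ts):
                break  # offline again: keep the rest for the next reconnect
            with self.db:
                self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            sent += 1
            sleep(1.0 / max_rate)
        return sent

    def __len__(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Deleting each row only after its publish succeeds means a second outage mid-replay loses nothing; the remaining rows wait for the next reconnect.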

The Pi's MQTT client retries connect forever with bounded exponential backoff. No outage scenario causes the device to "give up" and require operator intervention.

Troubleshooting

Symptom | First Check
--- | ---
Name or service not known | DNS not resolving. Check the Route53 record.
HTTP Error 503 | No healthy targets behind the ALB. Wait for API CI to complete.
HTTP Error 4xx (non-404) | Backend rejected the request. Check the CPU serial format.
Polling indefinitely with 404 | Expected: the device hasn't been claimed in the dashboard yet.
mqtt connect attempt N failed | Wrong cert/key/CA, cert not ACTIVE in the IoT console, or policy not attached.
Bound but no telemetry | Check journalctl -u hive-monitor-agent for peripheral_init_failed.
Service flapping with exit 78 | /etc/hive-monitor/bound missing or config.json invalid.

How This Meets the Assignment Requirements

Requirement | How It's Met
--- | ---
Wireless node with network address | Pi 5 on WiFi with DHCP IP + mDNS hostname hive-monitor-<id>.local
Local input/output devices | SHT45 sensors, cameras (input); door servo, fan, LEDs (output)
Networking protocol | MQTT 5 over mTLS (8883) + HTTPS (443) for bootstrap/bind/OTA
Communication between nodes | Pi ↔ AWS IoT Core, bidirectional via MQTT pub/sub + device shadow
Network design workflow | Three-phase lifecycle (bootstrap → bind → steady-state) with firewall, offline buffer, exponential backoff

For the week 11 group assignment, our team sent a message between two projects using Morse code. Full group documentation: Charlotte Lab — Week 11 Group Assignment.

What We Did — Morse Code Communication

Our team built a Morse code transmitter and receiver — one board blinks an LED in Morse code patterns, and the other board reads the light signal with a photoresistor and decodes it back into text. This demonstrates network communication between two separate devices using light as the physical medium.

My Contributions

I helped build the Morse code system — assembling the breadboard, wiring up the components, and getting the two boards talking to each other. The transmitter encodes each character into dots and dashes (short and long LED pulses), and the receiver times the light/dark intervals to decode the message back into letters.

What I Learned

  • Timing is everything: Morse code relies on precise timing. A dot is one unit and a dash is three; the gap between dots and dashes within a letter is one unit, the gap between letters is three units, and the gap between words is seven units.
  • Physical layer matters: Even with a simple LED-to-photoresistor link, ambient light and distance affected reliability — same problem real networking faces.
  • Parallels to my project: The Morse code system is a simplified version of what my hive monitor does — encoding data into a protocol, transmitting over a medium, and decoding on the other end. Same concept, different scale.
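The timing rules above are enough to write both ends of the link. A sketch in Python, not the firmware our group ran: the transmitter turns text into (LED on?, duration in units) pulses, and the receiver, after dividing each measured light or dark interval by the unit length, classifies it with midpoint thresholds so some timing jitter is tolerated. Only letters and digits are handled:

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..", "0": "-----", "1": ".----", "2": "..---",
    "3": "...--", "4": "....-", "5": ".....", "6": "-....", "7": "--...",
    "8": "---..", "9": "----.",
}
DECODE = {code: char for char, code in MORSE.items()}


def encode(text: str) -> list[tuple[bool, int]]:
    """(led_on, units) pulses for text: dot 1, dash 3, gaps of 1, 3, and 7."""
    pulses = []
    for w, word in enumerate(text.upper().split()):
        if w:
            pulses.append((False, 7))      # gap between words
        for c, char in enumerate(word):
            if c:
                pulses.append((False, 3))  # gap between letters
            for s, symbol in enumerate(MORSE[char]):
                if s:
                    pulses.append((False, 1))  # gap within a letter
                pulses.append((True, 1 if symbol == "." else 3))
    return pulses


def decode(pulses) -> str:
    """Turn (led_on, units) pulses back into text; unknown codes become '?'."""
    words, letters, symbols = [], [], []

    def end_letter():
        if symbols:
            letters.append(DECODE.get("".join(symbols), "?"))
            symbols.clear()

    def end_word():
        end_letter()
        if letters:
            words.append("".join(letters))
            letters.clear()

    for led_on, units in pulses:
        if led_on:
            symbols.append("." if units < 2 else "-")  # 1 = dot, 3 = dash
        elif units >= 5:   # 7-unit gap between words
            end_word()
        elif units >= 2:   # 3-unit gap between letters
            end_letter()
        # gaps under 2 units separate dots and dashes within one letter
    end_word()
    return " ".join(words)
```

At an example unit of 100 ms, "SOS" takes 27 units, or 2.7 s, which shows how slow a light link at human-visible timing is compared with the MQTT path above.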

Useful Links