Final Project Action Plan

This is the schedule I made in Google Sheets for my final project, since the deadline is drawing near.

schedule

You can access the schedule in Google Sheets here.

Here is a basic summary of every task I've completed and every task I still have to complete:

Final Project Checklist


Electronics

Programming

Mechanical

Documentation

Everything about my final project:

The desk bot (ELI) will be an offline, penguin-shaped bot that helps you focus and corrects your posture. It runs entirely on a XIAO ESP32-C3 and is powered by a 5 V USB adapter plugged into the wall.

2. Bill of Materials (BOM)

Component | Qty | Dimensions | Purpose
XIAO ESP32-C3 | 1 | 21 × 17.5 mm | Main brain
DS3231 RTC Module | 1 | 38 × 22 mm | Real-time clock for focus logs
KY-040 Rotary Encoder | 1 | 31 × 18 mm | Menu navigation input
1.3″ OLED Display | 2 | 35 × 33 mm | Face expressions + belly stats
Arducam Mini 2MP (B0067) | 1 | 34 × 24 mm | Posture detection camera
DFPlayer Mini MP3 | 1 | 20 × 20 mm | Audio playback controller
8 Ω 3 W Mini Speaker | 1 | 31 × 28 mm | Voice cues and alerts
Custom PCB | 1 | | Holds XIAO + all connectors
5 V USB-C Adapter | 1 | | Power supply (wall plug)
MicroSD Card | 1 | | Stores audio .mp3 files
3D-printed PLA shells | | | Penguin enclosure (see §4)

Refined rough sketch:

schedule

3. Where and How to Write the Arduino Code

  • Code is written in the Arduino IDE on a laptop and uploaded once over USB-C; the laptop is then disconnected
  • After upload, the XIAO runs the code standalone, powered only by the 5 V wall adapter
  • Libraries needed (installed in Arduino IDE → Library Manager):
    U8g2 — for both OLEDs
    RTClib — for DS3231
    DFRobotDFPlayerMini — for audio
    ESP32Encoder — for rotary encoder
  • Workflow: write sketch on laptop → plug XIAO in via USB-C → click Upload → unplug laptop → plug in wall adapter → ELI works independently

4. Mechanical Design: 3D Printed Parts

4.1 Penguin Dimensions

  • Total height: 160 mm
  • Head: 90 mm tall × 105 mm wide
  • Body: 70 mm tall × 85 mm wide (incl. flippers)
  • Feet: 20 mm tall
  • Wall thickness throughout: 2 mm PLA

4.2 All Printed Parts and Their Colors

Part | Color | Notes
Head back shell | Black | Hollow dome, slots on inner wall
Head front plate (face) | White | OLED cutout, camera bracket
Goggle ring | Black or white | Clips over face OLED
Body back shell | Black | All internals mount here
Body front shell | White | Belly OLED + encoder cutouts
Beak | Orange | Separate piece
Feet (×2) | Orange | Separate pieces
Flippers (×2) | Orange | Separate, rectangular peg mount

4.3 How Each External Part Mounts

  • Beak: rectangular bent tabs on its top edge are secured with screws or 3D-printed pillars inside the face plate.
  • Feet: each foot has two ø8 mm round pegs on top that press into matching holes in the base of the body back shell and are then glued permanently with superglue.
  • Flippers: two rectangular pegs on the inner edge of each flipper press into holes in the side walls of the body back shell. Rectangular (rather than round) pegs stop the flippers from rotating or drooping.
  • Goggle ring (face OLED frame): a separate printed ring that sits over the face plate. Four cantilever snap tabs (2 × 4 mm arms, 1 mm hooks) on the back of the ring click into four matching slots cut into the face plate edge, with 0.2 mm clearance; squeeze the tabs to remove it. The ring hides the rectangular OLED corners and creates the goggle look.

4.4 How the Shells Join Together

  • Head (white face plate into black dome): the black back shell is a hollow dome, open at the front. The white face plate has four hooked snap tabs on its edges; it slides into the dome from the front until the four hooks catch in four slots on the dome's inner wall. The 2 mm black dome wall wraps visibly around the edge of the white face and forms the penguin's natural black face border, so no painting is needed.
  • Body (front shell onto back shell): the split line runs vertically along the sides of the body. The back shell is deeper (~50 mm) and holds all the electronics; the front shell is shallower (~30 mm) and holds the belly OLED and encoder. Three snap tabs on the back shell edge click into three matching slots on the front shell edge, using the same tab geometry as the head join (2 × 4 mm arms, 1 mm hooks).

4.5 External Component Placement (front view)

Component | Location | Mount method
Arducam lens | Forehead, centred in the camera cutout, 10° downward tilt | Module secured with screws from the inside
Face OLED | Top centre of the face plate, above the beak | Rectangular cutout + 4 printed ledge/round tabs
Goggle ring | Over the face OLED, as a frame | 4 snap tabs into the face plate
Beak | Below the face OLED, ~15 mm gap | Round snap fits
Belly OLED | Upper belly, on the white belly shell/plate | Rectangular cutout + 4 printed ledge/round tabs
Rotary encoder | Belly-button position, ~18 mm below the belly OLED | 6 mm hole in the front shell, 2 printed ledge/round tabs
USB-C port | Right side of the body back shell, mid-height |
Speaker grille | Lower back shell, left side | Printed slot pattern, speaker inside

5. Internal Component Placement

All components mount to the interior walls of the body back shell.

5.1 Mounting Method: Printed Standoffs/Pillars

  • Cylindrical standoffs (5 mm tall) are printed directly on the back shell's interior wall
  • The PCB (carrying the XIAO) rests on top of the standoffs
  • A cap closes each standoff and clamps the PCB in place

5.2 Component Positions (top to bottom inside back shell)

  • Upper zone (just below the head): the XIAO ESP32-C3 PCB is mounted flat on the back wall, with the DS3231 RTC mounted below it and a 3 mm gap between them.
  • Mid zone: the belly OLED and encoder are mounted on the FRONT shell, centred on the white belly.
  • Lower zone: the DFPlayer Mini is mounted on the back wall. The speaker is mounted on the lower back wall, right side, behind the grille slots on the exterior.

5.3 Wire Routing

  • C-shaped printed clips on back wall (spaced every 20–25 mm) hold wire bundles

    7. Posture Detection: Method (Arduino C++)

    7.1 What the camera does

    • The Arducam captures a JPEG frame, which the XIAO decodes to raw pixels
    • The frame is downsampled to 80 × 60 pixels before processing, which keeps processing fast enough on the XIAO
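    The downsampling step can be a simple box filter: every output pixel is the average of a block of input pixels. A minimal sketch, assuming the decoded frame is 320 × 240 interleaved RGB888 (the decoded size is not stated above, so that and the `downsample` name are assumptions). Note that a full 320 × 240 RGB888 buffer is about 230 KB, a large share of the ESP32-C3's RAM, so on the device it may be better to average strip by strip while decoding.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: interleaved RGB888, row-major, 3 bytes per pixel.
constexpr int SRC_W = 320, SRC_H = 240;  // assumed decoded frame size
constexpr int DST_W = 80,  DST_H = 60;   // size used by the posture check
constexpr int SCALE = SRC_W / DST_W;     // 4
static_assert(SRC_H / DST_H == SCALE, "same scale factor on both axes");

// Box-filter downsample: each output pixel is the mean of a SCALE x SCALE block.
void downsample(const uint8_t* src, uint8_t* dst) {
  for (int y = 0; y < DST_H; y++) {
    for (int x = 0; x < DST_W; x++) {
      for (int c = 0; c < 3; c++) {
        int sum = 0;
        for (int dy = 0; dy < SCALE; dy++)
          for (int dx = 0; dx < SCALE; dx++)
            sum += src[((y * SCALE + dy) * SRC_W + (x * SCALE + dx)) * 3 + c];
        dst[(y * DST_W + x) * 3 + c] = static_cast<uint8_t>(sum / (SCALE * SCALE));
      }
    }
  }
}
```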

    7.2 Five detection factors

    • Shoulder width: when slouching, shoulders roll forward and appear narrower in frame. Measured as pixel distance between left and right shoulder landmarks estimated from skin tone region edges.
    • Forward head offset: ear midpoint should be vertically above shoulder midpoint. When head juts forward, ear rises above shoulder line. Measured as vertical gap in pixels.
    • Shoulder tilt: one shoulder dropping sideways more than the other indicates slouching/leaning sideways.
    • Head drop: nose position compared to shoulder midpoint. If nose drops significantly below baseline, head is drooping.
    • Face size ratio: when leaning forward, face appears larger relative to shoulder width. Ratio increase confirms forward lean vs mere crookedness.
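    Once landmark positions exist, the five factors are plain arithmetic. A minimal sketch, assuming a hypothetical `Landmarks` struct in downsampled-frame pixels (x to the right, y downward); how the points are estimated from skin-region edges is not shown, and each resulting value would be compared against its own calibrated baseline.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical landmark positions in frame pixels (x right, y down).
struct Point { float x, y; };
struct Landmarks {
  Point leftShoulder, rightShoulder;
  Point leftEar, rightEar;
  Point nose;
  float faceWidth;  // width of the face region in pixels
};

// One value per detection factor.
struct PostureMetrics {
  float shoulderWidth;  // shrinks when shoulders roll forward
  float headOffset;     // ear midpoint height above shoulder midpoint
  float shoulderTilt;   // height difference between shoulders (sideways lean)
  float noseHeight;     // nose height above shoulder midpoint; shrinks as head drops
  float faceRatio;      // face width / shoulder width; grows when leaning in
};

PostureMetrics computeMetrics(const Landmarks& lm) {
  float shoulderMidY = (lm.leftShoulder.y + lm.rightShoulder.y) / 2.0f;
  float earMidY = (lm.leftEar.y + lm.rightEar.y) / 2.0f;
  PostureMetrics m;
  m.shoulderWidth = std::hypot(lm.rightShoulder.x - lm.leftShoulder.x,
                               lm.rightShoulder.y - lm.leftShoulder.y);
  m.headOffset = shoulderMidY - earMidY;  // y grows downward, so "above" = smaller y
  m.shoulderTilt = std::fabs(lm.leftShoulder.y - lm.rightShoulder.y);
  m.noseHeight = shoulderMidY - lm.nose.y;
  m.faceRatio = m.shoulderWidth > 0 ? lm.faceWidth / m.shoulderWidth : 0.0f;
  return m;
}
```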

    7.3 Skin-tone centroid (primary on-device method)

    This code works on the frame after it has been downsampled to 80 × 60 pixels, which keeps processing fast. It looks for skin-colored pixels and calculates their average vertical position (the centroid), which roughly tells where your head and upper body sit in the image. If fewer than 20 skin pixels are found, the system assumes nobody is in front of the camera and pauses the timer. If the centroid moves too far down the frame (past SLOUCH_THRESHOLD), it assumes you are slouching and triggers an alert.

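    // Runs on the downsampled 80×60 frame.
    // Assumed layout: frame is an RGB888 buffer, 3 bytes per pixel, row-major.
    void checkPosture() {
      // Find the vertical centre of mass of skin-toned pixels (row 0 = top)
      long rowSum = 0;
      int skinCount = 0;
      for (int row = 0; row < 60; row++) {
        for (int col = 0; col < 80; col++) {
          const uint8_t* p = &frame[(row * 80 + col) * 3];
          if (isSkin(p[0], p[1], p[2])) { rowSum += row; skinCount++; }
        }
      }
      if (skinCount < 20) { pauseTimer(); return; }   // too few skin pixels: person absent
      int centroid = rowSum / skinCount;              // larger value = lower in the frame
      if (centroid > SLOUCH_THRESHOLD) alertSlouch(); // see 7.6 for debouncing
    }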

    7.4 Skin tone detection (RGB ranges)

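    // Rule-based RGB skin test (r > 100 already covers the original r > 95 check)
    bool isSkin(uint8_t r, uint8_t g, uint8_t b) {
      return (r > 100 && g > 40 && b > 20 &&  // minimum level per channel
              r > g && r > b &&                // red is the dominant channel
              abs(r - g) > 15);                // red clearly separated from green
    }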
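    To test the centroid logic away from the board, the skin test and the centroid loop can be combined into portable C++. This is a sketch assuming the 80 × 60 frame is a flat, interleaved RGB888 buffer; `skinCentroidRow` and `MIN_SKIN_PIXELS` are hypothetical names, and returning -1 stands in for the `pauseTimer()` branch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr int W = 80, H = 60;        // downsampled frame size
constexpr int MIN_SKIN_PIXELS = 20;  // fewer than this: treat the seat as empty

// Same thresholds as the isSkin() rule in 7.4.
bool isSkin(uint8_t r, uint8_t g, uint8_t b) {
  return r > 100 && g > 40 && b > 20 && r > g && r > b && std::abs(r - g) > 15;
}

// Mean row (0 = top) of skin pixels in a flat RGB888 frame, or -1 if nobody is there.
int skinCentroidRow(const uint8_t* frame) {
  long rowSum = 0;
  int skinCount = 0;
  for (int row = 0; row < H; row++) {
    for (int col = 0; col < W; col++) {
      const uint8_t* p = frame + (row * W + col) * 3;
      if (isSkin(p[0], p[1], p[2])) { rowSum += row; skinCount++; }
    }
  }
  if (skinCount < MIN_SKIN_PIXELS) return -1;
  return static_cast<int>(rowSum / skinCount);
}
```

    Comparing the returned row against SLOUCH_THRESHOLD then gives the slouch decision from 7.3.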

    7.5 Calibration (one-time, on first use)

    • The user presses the encoder while sitting upright
    • The bot samples 60 frames over 3 to 5 seconds
    • It calculates the average centroid and saves it as BASELINE in flash memory (survives power-off)
    • SLOUCH_THRESHOLD = BASELINE + (0.20 × 60) = BASELINE + 12 rows, i.e. the centroid must sit 20% of the frame height (12 of 60 rows) lower than in the upright position
    • Recalibrate any time from Settings → Calibrate Posture
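    The calibration arithmetic can be sketched as follows, assuming each sampled frame has already been reduced to a centroid row (with -1 meaning nobody was detected). `Calibration` and `calibrate` are hypothetical names. For persisting BASELINE, the ESP32 Arduino core's `Preferences` library (`putInt` / `getInt`) is one possible choice; that is an assumption, not the project's confirmed method.

```cpp
#include <cassert>
#include <vector>

constexpr int FRAME_ROWS = 60;
constexpr int SLOUCH_MARGIN_ROWS = FRAME_ROWS * 20 / 100;  // 0.20 x 60 = 12 rows

struct Calibration { int baseline; int slouchThreshold; bool ok; };

// Averages the centroid rows captured while the user sits upright.
// Samples of -1 (nobody detected) are skipped.
Calibration calibrate(const std::vector<int>& centroidRows) {
  long sum = 0;
  int n = 0;
  for (int r : centroidRows) {
    if (r >= 0) { sum += r; n++; }
  }
  if (n == 0) return {0, 0, false};  // nobody in view: keep the old calibration
  int baseline = static_cast<int>(sum / n);
  return {baseline, baseline + SLOUCH_MARGIN_ROWS, true};
}
```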

    7.6 Debouncing (why it matters for accuracy while working)

    • A slouch counter must reach FRAMES_NEEDED = 30 slouched frames before the alert fires
    • At a processing rate of 10 fps, 30 frames ≈ 3 seconds of sustained slouching
    • The counter decreases gradually for every good-posture frame instead of resetting instantly, so one upright frame does not cancel a real slouch
    • Alert cooldown: at least 30 seconds between alerts, so the bot does not nag constantly
    • If the face disappears from the frame (person left the desk), the timer pauses and no alert fires
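    The rules above amount to a leaky counter plus a cooldown timer. A minimal sketch, assuming one call per processed frame with a timestamp such as Arduino's `millis()`; the struct name, the `Verdict` enum, the "-1 = nobody detected" convention and the step of 1 per good frame are illustrative choices, not confirmed details of the project.

```cpp
#include <cassert>
#include <cstdint>

constexpr int FRAMES_NEEDED = 30;        // from section 7.6
constexpr uint32_t COOLDOWN_MS = 30000;  // 30 s minimum between alerts

enum class Verdict { Ok, Absent, Alert };

struct SlouchDebouncer {
  int counter = 0;
  bool hasAlerted = false;
  uint32_t lastAlertMs = 0;

  // centroid < 0 is used here to mean "no person detected".
  Verdict onFrame(int centroid, int threshold, uint32_t nowMs) {
    if (centroid < 0) {
      counter = 0;             // design choice: start fresh when the person returns
      return Verdict::Absent;  // caller pauses the timer; never alerts
    }
    if (centroid > threshold) {
      if (counter < FRAMES_NEEDED) counter++;
    } else if (counter > 0) {
      counter--;               // leak slowly instead of resetting to zero
    }
    // Unsigned subtraction keeps working across millis() wrap-around.
    bool cooledDown = !hasAlerted || nowMs - lastAlertMs >= COOLDOWN_MS;
    if (counter >= FRAMES_NEEDED && cooledDown) {
      hasAlerted = true;
      lastAlertMs = nowMs;
      return Verdict::Alert;
    }
    return Verdict::Ok;
  }
};
```

    Capping the counter at FRAMES_NEEDED means a long slouch re-alerts as soon as the cooldown expires, while a few good frames are enough to pull it back below the trigger.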

    8. Menu System Logic

    8.1 How the rotary encoder works

    • Turn clockwise: scroll down / increment value
    • Turn counter-clockwise: scroll up / decrement value
    • Press: select / confirm
    • Long press (hold 1.5 s): go back to the previous screen / cancel
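    The KY-040 outputs two quadrature signals (CLK and DT), and the button needs short/long classification. The sketch below is a library-free alternative to the encoder library, written as portable C++ so it can be tested off the board; on the XIAO the pin levels would come from `digitalRead`. Which sign means clockwise depends on wiring, and "4 quarter-steps per detent" and the idle state of both lines high are assumptions to check on the actual module. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Quadrature step table: index = (previous AB << 2) | current AB,
// value = +1 / -1 quarter-step, 0 for no change or an invalid (bounced) jump.
// Sequence treated as "+1": 00 -> 01 -> 11 -> 10 -> 00.
const int8_t kQuadTable[16] = {
   0, +1, -1,  0,
  -1,  0,  0, +1,
  +1,  0,  0, -1,
   0, -1, +1,  0};

struct EncoderDecoder {
  uint8_t prev = 0b11;  // assumed idle state: both lines high via pull-ups
  int sub = 0;          // accumulated quarter-steps

  // Feed the current A (CLK) and B (DT) levels; returns +1 / -1 once per
  // detent (assumed to be 4 quarter-steps), otherwise 0.
  int update(bool a, bool b) {
    uint8_t cur = static_cast<uint8_t>((a << 1) | b);
    sub += kQuadTable[(prev << 2) | cur];
    prev = cur;
    if (sub >= 4) { sub -= 4; return +1; }
    if (sub <= -4) { sub += 4; return -1; }
    return 0;
  }
};

enum class Press { Short, Long };

// Classify a button release; unsigned subtraction survives millis() wrap-around.
Press classifyRelease(uint32_t pressedAtMs, uint32_t releasedAtMs) {
  return releasedAtMs - pressedAtMs >= 1500 ? Press::Long : Press::Short;
}
```

    The lookup table rejects impossible jumps (both lines changing at once), which filters a good part of contact bounce without extra delays.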

    8.2 Full menu tree

    HOME SCREEN (idle, shows time + ELI face)
    │
    ├── [press] → MAIN MENU
    │     ├── ① Start Focus
    │     │     ├── Pomodoro (25 min work / 5 min break)
    │     │     └── Custom Timer (set minutes with encoder)
    │     ├── ② Focus Blueprint
    │     │     └── Shows: sessions today, best time of day, total hours
    │     └── ③ Settings
    │           ├── Calibrate Posture
    │           └── Set Sound Volume
    │
    └── [during active timer]
          ├── Posture check runs every ~5 seconds (background)
          ├── Stretch reminder at 90-minute mark
          └── [press] → Pause / Resume timer
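    The menu tree maps naturally onto a small state machine driven by turn, press and long-press events. A minimal sketch, assuming turns arrive as +1 / -1 steps; the `Screen` and `Action` enums are hypothetical, actions are handed back to the caller, and the "during active timer" branch is left out.

```cpp
#include <cassert>

enum class Screen { Home, MainMenu, StartFocus, Blueprint, Settings };
// Hypothetical actions handed to the timer / calibration / audio code.
enum class Action { None, StartPomodoro, StartCustom, Calibrate, SetVolume };

struct Menu {
  Screen screen = Screen::Home;
  int cursor = 0;

  int itemCount() const {
    switch (screen) {
      case Screen::MainMenu:   return 3;  // Start Focus, Focus Blueprint, Settings
      case Screen::StartFocus: return 2;  // Pomodoro, Custom Timer
      case Screen::Settings:   return 2;  // Calibrate Posture, Set Sound Volume
      default:                 return 0;  // Home and Blueprint have no list
    }
  }

  // +1 = clockwise (scroll down), -1 = counter-clockwise; wraps around.
  void turn(int step) {
    int n = itemCount();
    if (n > 0) cursor = ((cursor + step) % n + n) % n;
  }

  Action press() {
    switch (screen) {
      case Screen::Home:
        go(Screen::MainMenu);
        return Action::None;
      case Screen::MainMenu: {
        const Screen next[] = {Screen::StartFocus, Screen::Blueprint, Screen::Settings};
        go(next[cursor]);
        return Action::None;
      }
      case Screen::StartFocus: return cursor == 0 ? Action::StartPomodoro : Action::StartCustom;
      case Screen::Settings:   return cursor == 0 ? Action::Calibrate : Action::SetVolume;
      default:                 return Action::None;
    }
  }

  // Long press (>= 1.5 s): back one level.
  void back() {
    if (screen == Screen::MainMenu) go(Screen::Home);
    else if (screen != Screen::Home) go(Screen::MainMenu);
  }

 private:
  void go(Screen s) { screen = s; cursor = 0; }
};
```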

    8.3 Timer logic (Pomodoro)

    • Work phase: 25 min countdown on belly OLED, face OLED shows focused eyes
    • At 0:00 → DFPlayer plays "Session done! Take a break." → 5 min break countdown
    • At break 0:00 → DFPlayer plays "Ready for next round?" → back to work phase
    • Stretch reminder: after 90 cumulative work minutes → "Stand and stretch, good job!"
    • Custom timer: encoder scrolls minutes 1–120, press to confirm, same countdown logic
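    The Pomodoro rules can be sketched as a per-second state machine that returns which audio cue to play. This assumes `tick()` is called once per elapsed second (for example whenever the DS3231 seconds value changes) and that pausing simply means not calling it; `FocusTimer`, `clampMinutes` and the cue constants are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Cue numbers follow the audio list in section 9; 0 means "play nothing".
constexpr int CUE_NONE = 0, CUE_SESSION_DONE = 3, CUE_STRETCH = 4, CUE_NEXT_ROUND = 5;
constexpr uint32_t STRETCH_EVERY_SEC = 90 * 60;

// Custom timer: keep the encoder-selected minutes within 1-120.
int clampMinutes(int m) { return m < 1 ? 1 : (m > 120 ? 120 : m); }

struct FocusTimer {
  uint32_t workSec, breakSec, remaining;
  bool onBreak = false;
  uint32_t workedSec = 0;                      // cumulative work time
  uint32_t nextStretchSec = STRETCH_EVERY_SEC;

  FocusTimer(int workMin = 25, int breakMin = 5)
      : workSec(workMin * 60), breakSec(breakMin * 60), remaining(workMin * 60) {}

  // Call once per elapsed second; returns the cue to play (or CUE_NONE).
  int tick() {
    int cue = CUE_NONE;
    if (!onBreak && ++workedSec >= nextStretchSec) {
      nextStretchSec += STRETCH_EVERY_SEC;
      cue = CUE_STRETCH;
    }
    if (remaining > 0) remaining--;
    if (remaining == 0) {                      // phase over: swap work <-> break
      onBreak = !onBreak;
      remaining = onBreak ? breakSec : workSec;
      cue = onBreak ? CUE_SESSION_DONE : CUE_NEXT_ROUND;  // phase change wins a tie
    }
    return cue;
  }
};
```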

    8.4 Focus Blueprint storage

    • DS3231 RTC provides accurate timestamps
    • Each completed session is recorded as integer counters in the ESP32's flash memory (not on the SD card)
    • Stores: total sessions completed,
    • Display shows: Sessions: 12 | Best: 6 AM | Today: 2
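    The list of stored values is cut off above, so the fields here are a guess based on what the display shows: a total, a per-hour histogram for "Best" and a day-scoped count for "Today". A minimal sketch; `Blueprint`, `recordSession`, `hourLabel` and the date-key format are hypothetical, and the hour would come from the DS3231 when a session ends.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical counters kept in flash.
struct Blueprint {
  uint32_t total = 0;
  uint16_t perHour[24] = {};  // completed sessions per hour of day
  uint16_t today = 0;
  uint32_t todayKey = 0;      // e.g. year * 10000 + month * 100 + day

  void recordSession(uint32_t dateKey, int hour) {
    if (hour < 0 || hour > 23) return;
    if (dateKey != todayKey) { todayKey = dateKey; today = 0; }  // new day: reset "Today"
    total++;
    today++;
    perHour[hour]++;
  }

  // Hour with the most completed sessions (earliest hour wins a tie).
  int bestHour() const {
    int best = 0;
    for (int h = 1; h < 24; h++)
      if (perHour[h] > perHour[best]) best = h;
    return best;
  }
};

// 0-23 -> "12 AM", "6 AM", "12 PM", "6 PM" ... for the belly OLED.
std::string hourLabel(int h) {
  int h12 = (h % 12 == 0) ? 12 : h % 12;
  return std::to_string(h12) + (h < 12 ? " AM" : " PM");
}
```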

    9. Audio System (DFPlayer + Speaker)

    • MicroSD card contains numbered .mp3 files (e.g., 001.mp3 = startup sound)
    • All audio files are pre-recorded and loaded onto the SD card before assembly
    • Audio cues list:
      • 001 — startup jingle
      • 002 — "Sit tall! Good posture is good for your brain."
      • 003 — "Session done! Take a break."
      • 004 — "Stand and stretch, good job!"
      • 005 — "Ready for next round?"
      • 006 — encoder confirm beep (short tone)
      • 007 — "No one home." (person left desk)
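    Giving the cue numbers names keeps magic numbers out of the sketch. A minimal sketch, assuming the track number is passed to the DFRobotDFPlayerMini `play()` call; the `Cue` enum and helper names are hypothetical. On many DFPlayer modules `play(n)` is reported to follow the order in which files were copied to the card rather than their names, so copying them in numeric order is a common precaution.

```cpp
#include <cassert>
#include <cstdio>

// Track numbers mirror the file names on the microSD card (001.mp3 ... 007.mp3).
enum class Cue : unsigned {
  Startup = 1, SitTall = 2, SessionDone = 3, Stretch = 4,
  NextRound = 5, ConfirmBeep = 6, NoOneHome = 7
};

// Number to hand to the player, e.g. player.play(trackNumber(Cue::SitTall)).
constexpr int trackNumber(Cue c) { return static_cast<int>(c); }

// File name the cue is expected to have on the card, e.g. "002.mp3".
void fileNameFor(Cue c, char (&out)[8]) {
  std::snprintf(out, sizeof out, "%03u.mp3", static_cast<unsigned>(c));
}
```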

    10. OLED Expressions (Face OLED)

    State | Expression
    Idle / good posture | Happy half-moon eyes, slow blink
    Focused / timer running | Determined narrow eyes, steady
    Slouch detected | Alarmed wide eyes
    Break time | Sleepy drooping eyes
    Person absent | Sleeping "zzz" eyes
    Startup | Blinking open animation

    Both OLEDs use the U8g2 library at 128 × 64 resolution. On the face OLED, the left 64 × 64 half is the left eye and the right 64 × 64 half is the right eye. Expression animations are stored as byte arrays and cycled in the main loop.
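    Cycling animation frames "in the main loop" works best when the frame is chosen from elapsed time rather than with `delay()`, so encoder and camera work are not blocked. A minimal sketch with a hypothetical `Animation` record; on the device each chosen frame could be drawn per eye with U8g2's `drawXBM(x, y, w, h, bitmap)` between `clearBuffer()` and `sendBuffer()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical animation record; the frames themselves would be 64x64 bitmaps per eye.
struct Animation {
  uint8_t frameCount;
  uint16_t frameMs;  // how long each frame stays on screen (must be > 0)
  bool loop;         // blink loops; the startup opening plays once
};

// Non-blocking: picks the frame to show from elapsed time instead of delay().
int frameAt(const Animation& a, uint32_t startMs, uint32_t nowMs) {
  uint32_t step = (nowMs - startMs) / a.frameMs;
  if (a.loop) return static_cast<int>(step % a.frameCount);
  return step < a.frameCount ? static_cast<int>(step) : a.frameCount - 1;  // hold last frame
}

// Left eye occupies x = 0..63, right eye x = 64..127 on the 128x64 face OLED.
constexpr int eyeOriginX(bool rightEye) { return rightEye ? 64 : 0; }
```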

    11. How All Systems Work Together (System Logic)

    Every 100 ms (main loop tick):
    │
    ├── Read encoder → update menu state if changed
    ├── Read RTC → update timer countdown
    ├── Every ~5 s: capture camera frame → run posture check
    │     ├── Good posture → no action
    │     ├── Sustained slouch (30+ frames) → play slouch cue (002) once, show alarm face
    │     └── Face absent → pause timer, show sleeping face
    ├── Update belly OLED (timer / blueprint / menu)
    ├── Update face OLED (current expression)
    └── Check idle timeout (10 min no encoder input → sleep mode)
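    The loop above can stay non-blocking by comparing timestamps instead of calling `delay()`. A minimal sketch, assuming `nowMs` comes from Arduino's `millis()`; `Every`, `Scheduler` and `Jobs` are hypothetical names, and the caller does the actual reading and drawing for each flag that is set.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t TICK_MS = 100;                   // main loop tick
constexpr uint32_t POSTURE_MS = 5000;               // posture check interval
constexpr uint32_t IDLE_MS = 10UL * 60UL * 1000UL;  // 10 min without input -> sleep

// Fires at most once per interval; unsigned subtraction survives millis() wrap-around.
struct Every {
  uint32_t intervalMs;
  uint32_t lastMs = 0;
  bool due(uint32_t nowMs) {
    if (nowMs - lastMs < intervalMs) return false;
    lastMs = nowMs;
    return true;
  }
};

struct Jobs { bool tick, postureCheck, goToSleep; };

struct Scheduler {
  Every tick{TICK_MS};
  Every posture{POSTURE_MS};
  uint32_t lastInputMs = 0;
  bool timerActive = false;
  bool asleep = false;

  void onEncoderInput(uint32_t nowMs) { lastInputMs = nowMs; asleep = false; }

  // One pass of loop(): reports which jobs should run now.
  Jobs run(uint32_t nowMs) {
    Jobs j{false, false, false};
    if (!tick.due(nowMs)) return j;
    j.tick = true;  // read encoder and RTC, redraw both OLEDs
    if (!asleep && posture.due(nowMs)) j.postureCheck = true;  // camera idle while asleep
    if (!asleep && !timerActive && nowMs - lastInputMs >= IDLE_MS) {
      asleep = true;
      j.goToSleep = true;
    }
    return j;
  }
};
```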

    11.1 Power flow

    5V USB-C → PCB → 3.3V regulator on XIAO → all logic components.

    11.2 Startup sequence

    • Power on → XIAO initialises all peripherals
    • Face OLED plays eye opening animation
    • DFPlayer plays startup sound
    • Belly OLED shows "Hello! I'm E.L.I.", then switches to the home screen after 2 seconds
    • RTC synced, camera warmed up, menu ready

    11.3 Idle / sleep mode

    After 10 minutes with no encoder input and no active timer, the face OLED shows sleeping eyes, the belly OLED dims and the camera stops capturing to save power. Any encoder interaction wakes ELI instantly.