Output Devices

Tasks

Group assignment:

Measure the power consumption of an output device.
Document your work on the group work page and reflect on your individual page what you learned.

Individual assignment:

Add an output device to a microcontroller board you’ve designed and program it to do something.

Group assignment

My contribution was that I did the group assignment.

Follow the link to find it: group assignment week 09

Objectives for the week:

Making a breakout board with 16 red LEDs
- With multiplexing
Control a tiny Pololu motor
Control an LED display to output speed numbers (tachometer!)

LED multiplexing

My final project, a bicycle wheel display, will involve controlling quite a few LEDs and switching them as quickly as possible. The easy option, both electronics- and coding-wise, would be something like NeoPixels, but the NeoPixel refresh rate is too low.

An alternative is row-column multiplexing, which lets you control rows x columns LEDs with rows + columns pins. 16 LEDs can therefore be controlled with 8 pins. As you scale up, the number of LEDs grows quadratically while the number of pins grows linearly: 100 LEDs can be controlled with only 20 pins, and for 24 RGB LEDs (72 channels) I would need about sqrt(72) * 2 = 16.97 pins, so 17 pins in an 8 x 9 grid or 18 pins in a square 9 x 9 grid.
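To double-check the pin arithmetic, here is a minimal Python sketch that searches for the grid shape needing the fewest pins for a given number of LEDs. The helper name `min_pins` is made up for this illustration and is not part of any library.

```python
import math


def min_pins(n_leds):
    """Return (rows, cols) with rows * cols >= n_leds and rows + cols as small as possible."""
    best = (1, n_leds)
    for rows in range(1, math.isqrt(n_leds) + 1):
        cols = math.ceil(n_leds / rows)  # columns needed to fit all LEDs
        if rows + cols < sum(best):
            best = (rows, cols)
    return best


for n in (16, 72, 100):
    rows, cols = min_pins(n)
    print(f"{n} LEDs -> {rows} x {cols} grid, {rows + cols} pins")
```

For 72 channels the search lands on a non-square 8 x 9 grid (17 pins), one pin fewer than a square 9 x 9 grid.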

Breakout board design

I decided to implement this as a breakout board. This first iteration will only have 16 LEDs, controlled by four row pins and four column pins.

Initial doubts:

Do I need transistors to drive this?
Quote: "Q1, 3 and 5 should be PNP not NPN Transistors, especially if V+ is greater than the V+ of your Arduino."
Do we not have transistors that are not MOSFETs?
If I do need transistors, should I not put them in both the Vin and GND paths?
What is a shunt exactly?
How do I use Spice in KiCAD?
How do you make a double sided board?

Results

Amazingly, this worked the first time! I played around a bit with 4 LEDs on a breadboard and then did a lot of reading, which led me to settle on a 4-row, 4-column design. In this video, the LEDs look static to the eye, but the camera's shutter interacts with the multiplexing so that in the recording they seem to be moving:

The schematic is the following:

led_multiplexer_schematic

I used N-channel MOSFETs, so they need to be placed downstream of the load. A column pin is set HIGH so that it supplies current towards the MOSFETs, whose gates are connected to the row pins. When a MOSFET conducts, current flows through the LED, the MOSFET, and then to ground.

This is kind of an intermediate design. The MOSFETs allow me to turn on a whole row at a time without exceeding the maximum sink current of the MCU, but the column pins, which supply the LED current directly, get no such protection. This works because of time multiplexing: I light a single row at a time, so there is never more than one LED on per column at any given moment.
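The time-multiplexing argument can be sketched in plain Python (no hardware involved). The function names are invented for this illustration, and the pattern is the same "primes among 1 to 16" test pattern used in the code further down, stored row by row.

```python
N_ROWS, N_COLS = 4, 4

# Test pattern: True where the number 1..16 is prime, stored row by row.
PRIMES = [n in (2, 3, 5, 7, 11, 13) for n in range(1, 17)]


def scan_steps(pattern):
    """Yield (row_levels, col_levels) for each of the N_ROWS time slots of one frame."""
    for row in range(N_ROWS):
        row_levels = [r == row for r in range(N_ROWS)]          # exactly one row enabled
        col_levels = pattern[row * N_COLS:(row + 1) * N_COLS]   # columns lit in that row
        yield row_levels, col_levels


def max_leds_per_column(pattern):
    """Largest number of LEDs a single column pin feeds at any instant."""
    worst = 0
    for row_levels, col_levels in scan_steps(pattern):
        active_rows = sum(row_levels)
        for col_on in col_levels:
            worst = max(worst, active_rows if col_on else 0)
    return worst


print(max_leds_per_column(PRIMES))  # 1
```

Because only one row is enabled in each time slot, a column pin never has to feed more than one LED at once, whatever the pattern.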

I used a two-layer board design to avoid needing many 0 ohm resistors for jumping.

Front:

Front

Back:

Back

The fabrication was surprisingly easy! I used two 0.85mm-wide, centrally placed holes for alignment. After milling the front, I pushed sewing pins through these holes, cut the pin heads off, then flipped the board using the pins as guides. After milling the back copper, holes, and outline, the resulting alignment was great! For vias I just used copper wire:

Handmade vias

Stuffing

It can even serve as a peineta in a pinch!

led_multiplexer_photo_andrea

It connects to my dev board like this:

Board + Multiplexer

If you want to play around with it, you can find the KiCAD project here.

Lessons learned about 2-sided boards

Careful when mirroring the back side! Mirror it with the outline, or it will become misaligned.
When placing it on the mill, be mindful of the way it will fall when you turn it: otherwise it might fall out of the sacrificial board and collide with the walls of the Roland.
To be tried in the future: drilling the alignment holes with the 1/64" endmill would save 2 tool changes.
- Update 2024-06-12: tried it, didn't work: the 1/64" endmill isn't long enough so its shank collides with the hole edges.

Fast switching

I've calculated the timing precision I'll need to turn LEDs on and off accurately enough in my final project. It turns out that, at 20km/h, a relaxed cycling cruise speed, a given spot near the edge of the wheel takes roughly 1ms to traverse 5mm (20km/h is about 5.6mm/ms). This means that if I want pixels around 5mm in size I'll need switching times way below 1ms. I might also need very powerful LEDs so that they can be on for as short a time as possible.
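The arithmetic behind that estimate, as a small Python sketch. It assumes the spot sits at the outer radius of a wheel rolling without slipping, so relative to the hub it moves at the riding speed; the function names are made up for this example.

```python
def mm_per_ms(speed_kmh):
    """Convert km/h to mm/ms (1 km/h = 1e6 mm per 3.6e6 ms)."""
    return speed_kmh / 3.6


def ms_to_traverse(distance_mm, speed_kmh):
    """Time in ms for a spot at the rim to travel distance_mm along its path."""
    return distance_mm / mm_per_ms(speed_kmh)


speed = 20.0  # km/h
print(f"{mm_per_ms(speed):.2f} mm/ms")               # 5.56 mm/ms
print(f"{ms_to_traverse(5.0, speed):.2f} ms per 5 mm")  # 0.90 ms per 5 mm
```

So a 5mm pixel passes by in about 0.9ms, which is where the "way below 1ms" switching requirement comes from.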

First attempt with timer interrupts

Of course, blocking IO is not an option. My first explorations used MicroPython for speed of development. I used timers to turn each row on and off in turn. The result worked great up to around 20fps, but above that it behaved weirdly.

First working attempt in MicroPython. Click to view code

from machine import Pin, PWM, Timer, UART
import time

import micropython
micropython.alloc_emergency_exception_buf(100)


## Barduino
# Barduino pin 14 is connected to the buzzer
r1 = Pin(11, Pin.OUT)
r2 = Pin(12, Pin.OUT)
r3 = Pin(13, Pin.OUT)  
r4 = Pin(15, Pin.OUT)

c1 = Pin(16, Pin.OUT)
c2 = Pin(17, Pin.OUT)
c3 = Pin(18, Pin.OUT)
c4 = Pin(21, Pin.OUT)

rows = (r1, r2, r3, r4)
cols = (c1, c2, c3, c4)
all_pins = rows + cols


def all_off():
    for pin in all_pins:
        pin.off()


def cols_on():
    for col in cols:
        col.on()


def rows_on():
    for row in rows:
        row.on()

def cols_off():
    for col in cols:
        col.off()


def rows_off():
    for row in rows:
        row.off()


def row_pattern(rows, pattern, duration):

    Timer(1).deinit()
    all_off()

    this_row = rows[0]
    other_rows = rows[1:]
    these_lights = pattern[:4]
    other_lights = pattern[4:]

    this_row.on()
    for on, col in zip(these_lights, cols):
        if on:
            col.on()


    def next_step(t):

        all_off()
        print(other_lights)
        if len(other_rows) > 0:
            row_pattern(other_rows, other_lights, duration)

    timer = Timer(1).init(period=duration, mode=Timer.ONE_SHOT, callback=lambda t: next_step(t))        


def show(pattern, duration, refresh_rate):

    millis = 1000 // refresh_rate

    Timer(0).init(freq=refresh_rate, mode=Timer.PERIODIC, callback = lambda t: row_pattern(rows, pattern, millis // 4)) # shortcut; might lead to inaccuracy

    Timer(2).init(period=duration, mode=Timer.ONE_SHOT, callback = lambda t: Timer(0).deinit())




def pos_to_row_col(n):
    # Helper (not used by show() yet): map a position 0..15 to its row and column pins
    row = rows[n % 4]
    col = cols[n // 4]
    return row, col


# Primes among 1 to 16 (1 is not prime)
primes = (False, True, True, False, True, False, True, False,
          False, False, True, False, True, False, False, False)


show(primes, 2000, refresh_rate = 10)

Detour: Researching direct port manipulation

I had read that digitalWrite is very slow and that the fastest way to go is to do Direct Port Manipulation. This is something that I've been interested in because I come from a high-level programming background and it feels that this kind of thing lies at the essence of MCU programming wizardry.

I read a loooooot of references and finally found one that made it click for me: What is the fastest way to read/write GPIOs on SAMD21 boards?. From there:

For a custom SAMD21 board with consecutively number bits on PORTA, you can do the fastest read with something like:

static inline boolean fastRead(int bitnum) {
  return !! (PORT_IOBUS->Group[0].IN.reg & (1<<bitnum));
}

and write with:

static inline void fastWrite(int bitnum, int val) {
  if (val)
    PORT_IOBUS->Group[0].OUTSET.reg = (1<<bitnum);
  else
    PORT_IOBUS->Group[0].OUTCLR.reg = (1<<bitnum);
}

I used ChatGPT to understand the magic and I think I got it: a port is a group of pins, 8 in ATtinys (that's where the PAxx, PBxx, PCxx names come from, with a max of 8 pins each) and 32 in SAMD21s. In my particular ATSAMD21E18A, all GPIO pins are in a single port, port A.

Registers are 32-bit numbers (in these processors) which you write in order to set properties of the pins. For example, if I wanted to set pins 2, 3, and 4 of an 8-bit port to INPUT, I would write a mask with those bits set to the corresponding INPUT register (whose actual name I'd have to find in the datasheet): INPUT = 0b00011100 (counting bits from 0), or something like that.
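To make the bit arithmetic concrete, here is a plain Python sketch that builds such a mask. It assumes pins are numbered from 0, so pin n corresponds to bit n; `mask_for` is a name invented for this example, and no real register is touched.

```python
def mask_for(pins):
    """Build a bitmask with one bit set per pin number (pin n -> bit n)."""
    mask = 0
    for pin in pins:
        mask |= 1 << pin
    return mask


mask = mask_for([2, 3, 4])
print(f"0b{mask:08b}")  # 0b00011100

# Checking whether a given pin is part of the mask is a single AND:
print(bool(mask & (1 << 3)))  # True
print(bool(mask & (1 << 0)))  # False
```

With that numbering, pins 2, 3 and 4 give 0b00011100. The same `1 << bitnum` expression is what the `fastWrite` snippet quoted above writes to the set and clear registers.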

For my processor, the list of GPIO registers can be found on page 371 of the SAM D21 Family Data Sheet (pdf link)

It's not quite writing assembly, but it feels like it's just one step above.

Backtrack: do I really need timer interrupts and Direct Port Manipulation?

At this point, I was all eager to start writing arcane incantations, but I remembered a piece of advice that I often tell my students in Machine Learning: always do the stupidly obvious thing first, if only to have a baseline to measure against later when you build the complicated "smart" version. In Machine Learning, the gains often are not worth it.

So I took a step back and wrote a first version of the code that uses digitalWrite() to display a pattern in a single row of my LED multiplexer and micros() to measure how long it takes.

First attempt with switching time measure. Click to view code.

const int r0 = 0;
const int r1 = 1;
const int r2 = 2;
const int r3 = 3;

const int c0 = 4;
const int c1 = 5;
const int c2 = 6;
const int c3 = 7;

const int nRows = 4;
const int nCols = 4;

int rows[nRows] = { r0, r1, r2, r3 };
int cols[nCols] = { c0, c1, c2, c3 };
int all_pins[nRows + nCols] = { r0, r1, r2, r3, c0, c1, c2, c3 };
bool pattern[nRows * nCols];

bool primes[nRows * nCols] = { false, true, true, false, true, false, true, false, false, false, true, false, true, false, false, false };

long iteration = 0;
long start, end;
int nCycles = 1000;

void setup() {
  for (int i = 0; i < 8; i++) {
    pinMode(all_pins[i], OUTPUT);
  }

  Serial.begin(115200);

  // Direct port access is not used yet. For later, see p. 378 of the datasheet and
  // https://forum.arduino.cc/t/what-is-the-fastest-way-to-read-write-gpios-on-samd21-boards/907133/9
  start = micros();
  Serial.println("Let us play");
}

void loop() {

  rowShow(0, primes);

  if (iteration % nCycles == 0) {
    end = micros();

    long averageTime = (end - start) / nCycles;
    Serial.println(averageTime);
    start = micros();
  }
  iteration += 1;
}

void rowShow(int rowNumber, bool pattern[]) {
  allOff();
  digitalWrite(rows[rowNumber], HIGH);

  for (int i = 0; i < nCols; i++) {
    int position = rowNumber * nCols + i;  // row-major index; each row holds nCols entries
    if (pattern[position]) {
      digitalWrite(cols[i], HIGH);
    }
  }

}


void allOff() {
  for (int i = 0; i < 8; i++) {
    digitalWrite(all_pins[i], LOW);
  }
}

void allOn() {
  for (int i = 0; i < 8; i++) {
    digitalWrite(all_pins[i], HIGH);
  }
}

Turns out, it only takes 34us to switch a whole row with this approach! I could split my 1ms cycle time into 4x34 = 136us for switching and leave the pins on for the remaining 864us. That would give an on-time per row of 864/4 = 216us, so a duty cycle of 21.6% and a smear of 0.216ms * 5mm/ms ~= 1mm.
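The same time budget as a short Python calculation. The constants are the figures quoted above (1ms frame, 34us per row switch, 4 rows, and the rounded 5mm/ms rim speed); the variable names are only for illustration.

```python
FRAME_US = 1000         # one full frame: 1 ms
SWITCH_US = 34          # measured time to switch one row
N_ROWS = 4
SPEED_MM_PER_MS = 5.0   # rounded rim speed at about 20 km/h

switching_total_us = N_ROWS * SWITCH_US        # time spent switching per frame
on_total_us = FRAME_US - switching_total_us    # time left with LEDs lit
on_per_row_us = on_total_us / N_ROWS           # lit time for each row
duty_cycle = on_per_row_us / FRAME_US          # fraction of the frame a row is lit
smear_mm = on_per_row_us / 1000 * SPEED_MM_PER_MS  # distance moved while lit

print(switching_total_us, on_total_us, on_per_row_us)
print(f"duty cycle {duty_cycle:.1%}, smear {smear_mm:.2f} mm")
```

Everything stays in microseconds until the last step, where the per-row on-time is converted to milliseconds to multiply by the speed in mm/ms.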

It's not perfect, and I think it will be noticeable, but it's a starting point. This is enough for the POC, and possibly for the MVP. I'll have to restrain myself and save the wizardry for later.

When I come back to it, I think this Accessing SAM MCU Registers in C guide will also be super useful.

Second approach: state machine

All code examples are saved as commits in my repo for this experimentation.

The problem now becomes switching from one row to the next on schedule. I can use a state machine approach to begin with. I found this State Machine and Timers, Medium level tutorial useful, even if I don't quite do it the way they do.

During performance profiling of my solution I found a super funny phenomenon: printing a single double caused a noticeable flicker of the LEDs. I narrowed it down to this:

Version without flicker. Click to show code.

    currentRow = 0;
    busyFraction = double(busyMicros) / double(elapsed);
    Serial.print("busyMicros: ");
    Serial.print(busyMicros);
    Serial.print(" elapsed:");
    Serial.println(elapsed);
    busyMicros = 0;
    start = micros();

Version with flicker. Click to show code.

    currentRow = 0;
    busyFraction = double(busyMicros) / double(elapsed);
    Serial.print("busyFraction: ");
    Serial.print(busyFraction);
    Serial.print(" busyMicros: ");
    Serial.print(busyMicros);
    Serial.print(" elapsed:");
    Serial.println(elapsed);
    busyMicros = 0;
    start = micros();

So, weirdly, it was only the printing and not the calculation that took a long time! Since I was already measuring time, it was easy to see the exact penalty of that single print: cycle time increased by 6ms, which is huge!

Anyway, so the version (commit) that worked was:

First version with full frame. Click to show code.

const int r0 = 0;
const int r1 = 1;
const int r2 = 2;
const int r3 = 3;

const int c0 = 4;
const int c1 = 5;
const int c2 = 6;
const int c3 = 7;

const int nRows = 4;
const int nCols = 4;

int rows[nRows] = { r0, r1, r2, r3 };
int cols[nCols] = { c0, c1, c2, c3 };
int all_pins[nRows + nCols] = { r0, r1, r2, r3, c0, c1, c2, c3 };
const unsigned int frameRate = 1000;
unsigned long microsecondsPerFrame = 1000000 / frameRate;

// For performance profiling
long busyMicros = 0;
float busyFraction = 0.0;

bool pattern[nRows * nCols];
bool primes[nRows * nCols] = { false, true, true, false, true, false, true, false, false, false, true, false, true, false, false, false };

long iteration = 0;
long start, end, now;
int nCyclesRefresh = 10000;

// State machine
byte prevRow = 0;
byte currentRow = 0;

bool debug = true;

void setup() {

  for (int i = 0; i < 8; i++) {
    pinMode(all_pins[i], OUTPUT);
  }

  Serial.begin(115200);
  delay(100);
  Serial.println("Let us play");

  start = micros();
}

void loop() {
  now = micros();
  prevRow = currentRow;
  updateState(now);

  if (currentRow != prevRow) {
    rowShow(currentRow, primes);
  }

  iteration += 1;
}


void updateState(long now) {
  long elapsed = now - start;
  int segment = elapsed / (microsecondsPerFrame / nRows);

  if (segment >= nRows) {  // past the last row: the frame is over, restart the clock
    currentRow = 0;
    busyFraction = double(busyMicros) / double(elapsed);

    // Serial.print("microsecondsPerFrame: ");
    // Serial.println(microsecondsPerFrame);
    Serial.print(" busyMicros: ");
    Serial.print(busyMicros);
    Serial.print(" elapsed:");
    Serial.println(elapsed);


    busyMicros = 0;
    start = micros();
  } else {
    currentRow = segment;
  }
}

void rowShow(int rowNumber, bool pattern[]) {
  long thisStart = micros();

  allOff();
  digitalWrite(rows[rowNumber], HIGH);

  for (int i = 0; i < nCols; i++) {
    int position = rowNumber * nCols + i;  // row-major index; each row holds nCols entries
    if (pattern[position]) {
      digitalWrite(cols[i], HIGH);
    }
  }

  busyMicros += micros() - thisStart;
}


void allOff() {
  for (int i = 0; i < 8; i++) {
    digitalWrite(all_pins[i], LOW);
  }
}

void allOn() {
  for (int i = 0; i < 8; i++) {
    digitalWrite(all_pins[i], HIGH);
  }
}

State is described by currentRow; on every iteration of the loop we check the microseconds elapsed and update it accordingly, giving each row an equal share of the frame time. It worked great up to at least 1000fps. Going above that caused uneven illumination of the LEDs, which I assume happens because some updates are skipped. I didn't bother to diagnose it fully because 1000fps is right at my target for my POC.
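The row-selection logic, and the "skipped updates" guess, can be played with in a small Python simulation. This is only a model of the polling loop, not a diagnosis of the real board: `rows_shown` and its parameters are invented for this sketch, and it assumes every pass through the loop takes a fixed `loop_period_us`.

```python
def rows_shown(loop_period_us, us_per_frame, n_rows=4, n_polls=1000):
    """Count how often each row gets (re)displayed when the loop polls at a fixed period."""
    slot = us_per_frame // n_rows      # time share of each row
    shown = [0] * n_rows
    start = 0
    current = 0
    for k in range(n_polls):
        now = k * loop_period_us
        prev = current
        segment = (now - start) // slot
        if segment >= n_rows:          # frame over: restart from row 0
            current = 0
            start = now
        else:
            current = segment
        if current != prev:            # only redraw when the row changes
            shown[current] += 1
    return shown


print(rows_shown(10, 1000))    # fast loop: every row is visited about equally often
print(rows_shown(600, 1000))   # slow loop: rows 1 and 3 are never shown
```

When one pass through the loop takes longer than a row's time slot, whole rows fall between two polls and are never drawn, which would look like uneven illumination.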

In a later modification, just out of curiosity, I took the fastWrite function from What is the fastest way to read/write GPIOs on SAMD21 boards? and replaced all instances of digitalWrite in my code with it. It sped things up from ~120us to 20us spent in the switching function. Now I can drive the multiplexer up to 10000fps with no visible artifacts! I'm going to target 5000fps, which gives me an uncertainty of ~1mm at 20km/h near the edge of the wheel.

`fastWrite`. Click to show code.

static inline void fastWrite(int bitnum, int val) {
  if (val)
    PORT_IOBUS->Group[0].OUTSET.reg = (1<<bitnum);
  else
    PORT_IOBUS->Group[0].OUTCLR.reg = (1<<bitnum);
}
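To see what the two register writes do, here is a toy Python model of a port with separate set and clear registers. `PortModel` and `fast_write` are invented names and nothing here touches hardware. The model only captures the rule that writing a 1 to a bit of OUTSET sets that output bit, writing a 1 to a bit of OUTCLR clears it, and 0 bits leave the output alone.

```python
class PortModel:
    """Toy model of a 32-bit GPIO port with an output value and set/clear registers."""

    def __init__(self):
        self.out = 0

    def outset(self, mask):
        # Bits that are 1 in mask become 1; all other bits are untouched.
        self.out |= mask & 0xFFFFFFFF

    def outclr(self, mask):
        # Bits that are 1 in mask become 0; all other bits are untouched.
        self.out &= ~mask & 0xFFFFFFFF


def fast_write(port, bitnum, val):
    """Python counterpart of the fastWrite function above, acting on the model."""
    if val:
        port.outset(1 << bitnum)
    else:
        port.outclr(1 << bitnum)


port = PortModel()
fast_write(port, 4, 1)
fast_write(port, 7, 1)
print(f"0b{port.out:08b}")  # 0b10010000
fast_write(port, 4, 0)
print(f"0b{port.out:08b}")  # 0b10000000
```

Since each write only affects the bits set in the mask, there is no need to read the current output value, change one bit, and write it back.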

References

https://www.jameco.com/Jameco/workshop/learning-center/electronic-fundamentals-working-with-led-dot-matrix-displays.html

http://amigojapan.github.io/Arduino-LED-Matrix-Display/

ESP32 S3 pinouts
SAM D21 Family Data Sheet (pdf link)

MOSFETs

Timers, clocks, and multitasking

Timers in MicroPython
Timer with microsecond resolution: Apparently going under 1ms in the ESP32 is not viable. Lucky I went with SAMD21.
Weightless threads: Very interesting pattern for pseudo-concurrency in MicroPython.
Timers on the ESP32: Very useful for non-blocking activation of pins.
Arduino micros() function with 0.5us precision - using my Timer2_Counter Library
Help flashing LEDS for specific amount of time using sensor
What is the fastest way to read/write GPIOs on SAMD21 boards?
SAMD21 Arduino Timer Example

Direct Port Manipulation

The Case for Direct Port Manipulation
Arduino and port manipulation
A SAMD21 ARM Issue (versus AVR architecture)
Accessing SAM MCU Registers in C: official guide from Microchip for bare metal C programming in the SAMD21.
How to access pins on SAMD21 E18A with Arduino Framework on custom board?: using Platform IO (VS Code) and able to use Arduino libraries.

State Machines

State Machine and Timers, Medium level tutorial
State Machines for Event-Driven Systems: Very interesting example of the Finite State Machines pattern to handle events in context. Uses pointer-to-functions.
Introduction to Hierarchical State Machines: An extension of the previous reference. Basically implements a class hierarchy from scratch.

Platform IO

Programm SAMD21 directly: explains how to modify a platformio.ini.
Custom Embedded Boards: official docs.
SAM platforms available in platformIO.
Samd21 custom board and Arduino framework