Fletch's Fab Academy 2014 Blog

12. Output Devices

This week's task was to "add an output device to a microcontroller board you've designed and program it to do something".  I liked the video output board that Neil demonstrated and I liked the idea of a challenge, so I decided to design a Fab Video Game board based on this principle.  Ideally I'd like the board to be able to run old 8-bit style games from the 1980s, e.g. Tetris or Asteroids.

Based on my experience with the spinning top I decided that I needed more flash than is available in the ATTiny44a, so I upgraded to the ATMega328 as we have those in stock here in the Manchester lab.  I also added 4 buttons for control.  On top of this, the board would need an audio line output to feed the same monitor as the composite video out.

I didn't make it all the way to implementing Tetris, but the video below shows a short demo running on my test board and a Pico PK320 projector.  The image loses sync slightly because my line drawing currently takes slightly too long and corrupts a sync pulse.

FabLab video board demo from James Fletcher on Vimeo.

Picture of the final test board.  I didn't have much time to route it so I kept it simple.  Ideally the board would have had the RCA connectors along the top edge and the buttons along the bottom.
Final board

12.01 Video Output

After looking at Neil's design I decided that the limiting factor was the rate at which the state of the video output could be changed by toggling the port pins in software.  I decided that this could be improved by using the SPI or UART pins to shift a bit pattern out under hardware control.  After looking at the available byte framing on different hardware modules I decided to use the USART in SPI mode for the following reasons:
I used a PK320 Pico projector from Optoma.  As this supports both PAL and NTSC, I decided to stick with NTSC so that I could start with Neil's code to test the board.  For my own code I used NTSC timing info from http://www.astro.umontreal.ca/fantomm/Modedemploi/camera_guide_gene.pdf
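
As a rough illustration of how NTSC timing figures turn into timer counts, the sketch below converts a duration in nanoseconds into CPU cycles for the two clock speeds used in this project.  The timing constants are nominal textbook NTSC values that I am assuming here, not numbers taken from my source code, and the function and macro names are made up for the example.

```c
#include <stdint.h>

/* Nominal NTSC horizontal timings in nanoseconds (assumed, approximate). */
#define NTSC_LINE_NS         63556u  /* one full scan line, about 63.556us */
#define NTSC_HSYNC_NS         4700u  /* horizontal sync pulse */
#define NTSC_FRONT_PORCH_NS   1500u  /* blanking just before the sync pulse */

/* Convert a duration in ns to CPU cycles, rounded to the nearest cycle.
 * f_cpu_mhz is the CPU clock in MHz (16 for the Arduino, 20 for the test board). */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t f_cpu_mhz)
{
    return (ns * f_cpu_mhz + 500u) / 1000u;
}
```

With these assumed values a scan line is 1271 cycles at 20MHz and 1017 cycles at 16MHz, and the sync pulse is 94 and 75 cycles respectively, which gives a feel for how little CPU time there is per line.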

Early test pattern and sample video signal on scope.
Early test pattern
Scope video signal

12.02 Audio Output

The audio output is a simple line level out.  This is roughly 1V peak to peak into a high impedance (approx 10K) input.  I use a simple voltage divider from a single port pin to generate square waves between approx 0V and 0.5V.  I then use a small DC blocking capacitor to remove the DC component of the signal and smooth it slightly.  Although this square wave is far from a perfect audio signal, it is equivalent to a sine wave at the same fundamental frequency plus lots of higher order harmonics.
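
The divider arithmetic can be sketched as below.  The resistor values (9K over 1K) and the 5V supply are assumptions picked to land on roughly 0.5V; they are not the values from my schematic.  The second helper shows how a 10K line input in parallel with the bottom resistor pulls the level down a little.

```c
#include <stdint.h>

/* Resistance of two resistors in parallel, in ohms (integer, rounds down). */
static uint32_t parallel_ohms(uint32_t a, uint32_t b)
{
    return (a * b) / (a + b);
}

/* Output of a resistive divider in millivolts.
 * r_top sits between the port pin and the output, r_bottom between output and ground. */
static uint32_t divider_mv(uint32_t vin_mv, uint32_t r_top, uint32_t r_bottom)
{
    return (vin_mv * r_bottom) / (r_top + r_bottom);
}
```

With the assumed values, `divider_mv(5000, 9000, 1000)` gives 500mV unloaded, and replacing the bottom resistor with `parallel_ohms(1000, 10000)` to model the 10K input gives about 458mV.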

I then use Timer0 in CTC mode to generate a square wave on OC0A and use this to drive the audio.
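
For reference, in CTC mode with OC0A toggling on compare match the output frequency is f_cpu / (2 * prescaler * (1 + OCR0A)).  The sketch below picks the nearest OCR0A for a note; the prescaler of 256 used in the example is an assumption, and the function names are hypothetical rather than taken from my source.

```c
#include <stdint.h>

/* Nearest OCR0A value for a note, from f_out = f_cpu / (2 * N * (1 + OCR0A)).
 * The result is clamped to the 8 bit range of Timer0. */
static uint8_t ocr_for_note(uint32_t f_cpu_hz, uint32_t prescaler, uint32_t note_hz)
{
    uint32_t step = 2u * prescaler * note_hz;
    uint32_t divisor = (f_cpu_hz + step / 2u) / step;   /* rounded (1 + OCR0A) */
    if (divisor < 1u)   divisor = 1u;
    if (divisor > 256u) divisor = 256u;
    return (uint8_t)(divisor - 1u);
}

/* Frequency actually produced by a given OCR0A value, in whole Hz (rounds down). */
static uint32_t note_for_ocr(uint32_t f_cpu_hz, uint32_t prescaler, uint8_t ocr)
{
    return f_cpu_hz / (2u * prescaler * (1u + (uint32_t)ocr));
}
```

For A4 (440Hz) this gives OCR0A = 88 on the 20MHz board, which actually plays at about 438Hz, and OCR0A = 70 on a 16MHz Arduino.  The 8 bit timer limits how closely notes can be tuned.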

I transposed the tune from the sheet music on http://www.gamemusicthemes.com/sheetmusic/gameboy/tetris/themea/Tetris_-_Theme_A_by_Gori_Fater.pdf.
Frequencies for notes were based on http://en.wikipedia.org/wiki/Piano_key_frequencies.

12.03 Design Files and Code

The Eagle schematic and board files are here, along with the traces and interior PNGs.
Schematic
Board layout

The source code is here.  It will build and run on either a 16MHz ATMega168 Arduino or my 20MHz ATMega328 test board.

12.04 Issues

12.04.1 The Case of the Failed PCB Milling

I saw a 'fabduino' project online and noticed that it provided Eagle files using the ATMega328 and PNGs to mill with the standard 1/64th bit.  So I mistakenly assumed that the ATMega328 TQFP 32 part in the Eagle fab libraries was millable with the 1/64 bit.  I was wrong.  My first attempt is shown below; note that no tool paths have been generated between the pins of the TQFP package in the middle:
Fail with 1/64th bit

When I had come across this before I had found that a quick solution was to lie slightly about the tool size to fabmodules.  By reducing the tool size slightly from 0.4mm to 0.3mm it will generate tool paths for smaller gaps but all traces and pads will also be slightly thinner as the tool will run closer to the outside boundary of the pad or the trace.  It's not the correct way to fix it but it works sometimes as a quick fix.  However for this board I had to reduce the tool size right down to 0.2mm to get fabmodules to generate all of the tool paths between the pads.  This is far too far from the real tool size and the final traces are too fragile!
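
The trade-off can be put into numbers with a small sketch.  It assumes the tool path is offset from each copper edge by half the declared tool diameter, so a real tool that is bigger than the declared one eats into the copper on both sides.  The gap and trace widths used in the example are illustrative guesses, not measurements from my board.

```c
#include <stdint.h>
#include <stdbool.h>

/* A gap only gets a tool path if the tool, as declared to the software, fits in it.
 * All sizes are in micrometres. */
static bool gap_gets_toolpath(uint32_t gap_um, uint32_t declared_tool_um)
{
    return gap_um >= declared_tool_um;
}

/* Copper width left after milling when the real tool is larger than declared.
 * Each edge loses (real - declared) / 2, so the full width loses (real - declared). */
static int32_t milled_width_um(int32_t drawn_um, int32_t real_tool_um,
                               int32_t declared_tool_um)
{
    return drawn_um - (real_tool_um - declared_tool_um);
}
```

For example, a hypothetical 0.25mm gap gets no path with a declared 0.4mm tool but does with 0.2mm, and a 0.4mm trace milled with the real 0.4mm tool while declaring 0.2mm ends up only about 0.2mm wide.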
Failed board with thin traces

I investigated exporting at higher DPI from Eagle but this didn't fix the issue.  I also looked into using a smaller 10mil tool and doing 3 passes on the Modela (10mil, 1/64inch and 1/32inch), but this would have required more setup time than I had available.  At this point I ran out of time to mill boards before the Easter break and so decided to prototype my code on an Arduino whilst I investigated the problem in more detail.
Arduino test

On closer inspection of the fabduino traces files I noticed that the TQFP package didn't match the one in the Eagle libraries, the pads were much thinner.  Rather than create a new library component and have to re-route my board when I swapped the package around, I decided to manually edit the traces file in GIMP to remove some copper between the pads.
Editing traces in gimp
This finally produced a millable board.
Board milled ok

12.04.2 The Case of the Broken Fuse Bytes

When working with the Arduino I managed to flash some broken fuse bytes at some point.  This left the ATMega328 in a state where it wasn't generating an internal clock and so I couldn't flash it with the FABISP.  To get around this I fed a clock signal into the AVR on the TOSC1 pin whilst trying to flash, and slowed the FABISP bit rate by changing the programmer line in the makefile to the following:

PROGRAMMER = -c usbtiny -B 1024 -e

Luckily my portable scope had a 200kHz square wave output that I could use for the clock, so I just touched the scope probe to the AVR pin whilst flashing.  This only needed to be done once to program sensible values back into the fuse bytes.
Fixing fuses
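
The reason for slowing the programmer down is that the ISP clock has to be much slower than the target clock.  The sketch below assumes the usual AVR datasheet rule of thumb that one SCK period must last at least 4 target clock cycles (for targets clocked below 12MHz), and that the avrdude -B value is a bit clock period in microseconds.  The function name is made up for the example.

```c
#include <stdint.h>

/* Smallest whole-microsecond SCK period that spans at least 4 target clock cycles.
 * Assumes the "SCK no faster than f_target / 4" rule for slow targets. */
static uint32_t min_sck_period_us(uint32_t f_target_hz)
{
    uint32_t numer = 4u * 1000000u;
    return (numer + f_target_hz - 1u) / f_target_hz;   /* ceiling division */
}
```

With the 200kHz scope output as the clock this comes out at 20us, so the -B 1024 setting above leaves a very large margin.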

12.05 Thoughts on IRQ Response Time

Initially I thought I would use irq service routines from Timer1 driving a state machine to generate all of the video timing information, rather than spinning in a polling loop waiting for flags to change saying that a count had expired.  This would allow for a better separation in the code between the NTSC signal generation and other work such as gameplay.  However on investigation I discovered that the irq response time is too poor to meet the 5us timing requirements around the sync pulses.  A basic irq routine written in C and compiled using avr-gcc with -Os comes out at over 40 cpu cycles when you include the time it takes to enter the isr.  This alone is 2.5us at a 16MHz clock.  For this reason I fell back to using a polling method to check the timer and generate the video signal.
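
The cycle to time conversion behind that figure is simple enough to sketch.  The helper name is hypothetical, and the 5us budget is just the sync timing requirement mentioned above.

```c
#include <stdint.h>

/* Time taken by a number of CPU cycles, in nanoseconds (rounds down).
 * f_cpu_mhz is the CPU clock in MHz. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t f_cpu_mhz)
{
    return (cycles * 1000u) / f_cpu_mhz;
}
```

So 40 cycles is 2500ns at 16MHz and 2000ns at 20MHz, which is half or nearly half of a 5us window gone before the handler has done any useful work.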

With effort this could be improved.  Some registers could be reserved so that the compiler didn't use them.  Then dedicated asm routines could get rid of a large part of the function prolog / epilog and avoid stacking registers.  However any pre-compiled LIBC code could still use these registers causing other issues.

Another option would be to use a hybrid system, polling within an isr for the timing around the sync pulses and using a more traditional isr approach during the displayable portion of the line to allow an easier division of work between signal generation and gameplay.

IRQ Response Time Analysis

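// init
    // Counts up in CTC mode
    // No OC1x behaviour
    TCCR1A = 0b00000000;
    TCCR1B = 0b00001001; // CTC (WGM12), prescaler 1 (CS10)
    TCNT1H = 0;
    TCNT1L = 0;
    OCR1AH = (_VIDEO_hsync_front_porch >> 8); // initially set timing to full line width so we avoid any early irq's
    OCR1AL = (_VIDEO_hsync_front_porch & 0xff);
    TIMSK1 |= (1<< OCIE1A); // Enable timer 1 compare A irq

// irq handler
// The wrapper and counter are inferred from the disassembly below
// (__vector_11 is the Timer1 compare A vector); the name irq_count is a placeholder.
ISR(TIMER1_COMPA_vect)
{
    static uint8_t irq_count;

    irq_count++;
    if (irq_count & 1)
        VIDEO_port = 0;             // odd count: clear the video port
    else
        VIDEO_port |= VIDEO_high;   // even count: set the high video bit
}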

    // Above IRQ code generates the following assembly; the numbers at the end of each line are instruction cycle counts.
     90:    1f 92           push    r1                                        2
     92:    0f 92           push    r0                                        2
     94:    0f b6           in      r0, 0x3f    ; 63                          1       get SREG
     96:    0f 92           push    r0                                        2       push SREG
     98:    11 24           eor     r1, r1                                    1       zero into r1
     9a:    8f 93           push    r24                                       2       push temp reg used by code
     9c:    80 91 00 01     lds     r24, 0x0100                               2       load variable
     a0:    8f 5f           subi    r24, 0xFF   ; 255                         1       inc
     a2:    80 93 00 01     sts     0x0100, r24                               2       store incremented variable
     a6:    80 ff           sbrs    r24, 0                                    1/2/3   skip if bit set
     a8:    02 c0           rjmp    .+4         ; 0xae <__vector_11+0x1e>     2       rel jump
     aa:    1b b8           out     0x0b, r1    ; 11                          1       clear video port
     ac:    01 c0           rjmp    .+2         ; 0xb0 <__vector_11+0x20>     2
     ae:    59 9a           sbi     0x0b, 1     ; 11                          2       set bit in video port reg
     b0:    8f 91           pop     r24                                       2
     b2:    0f 90           pop     r0                                        2
     b4:    0f be           out     0x3f, r0    ; 63                          1
     b6:    0f 90           pop     r0                                        2
     b8:    1f 90           pop     r1                                        2
     ba:    18 95           reti                                              4
                                    Total                                     37 cycles

Simple code to toggle video out as we spin waiting for timer to expire.  This was one of my first test patterns.
            VIDEO_port_temp = VIDEO_port & ~(VIDEO_high | VIDEO_low);
            VIDEO_port = VIDEO_port_temp | VIDEO_low; // set black: 0.3v

            VIDEO_port_temp = VIDEO_port & ~(VIDEO_high | VIDEO_low);
            VIDEO_port = VIDEO_port_temp | (VIDEO_high | VIDEO_low); // set white: 1.0v