

Image Recognition System (Raspberry Pi Camera)

Overview

This is the final version of the Image Recognition System, completed within the FabAcademy period 👍

I created the image recognition program part in Week 11.

You can see it at 0:00 - 0:07 in the following video.

Design with Fusion360

I designed the following:

It is made of 2.5 mm MDF.

Electric Circuit

Connect a tact switch to pin 10 of the Raspberry Pi (physical pin numbering, matching GPIO.BOARD in the code below).

Then connect the Raspberry Pi Camera to the Raspberry Pi with a ribbon cable.

Machine Learning

I used Teachable Machine for machine learning

Teachable Machine

Teachable Machine allows you to train machine learning models in the browser and use the exported models in Python or JavaScript.

Select Get Started => Image Project

In this case, select the standard image model (224px x 224px, color images), which can be exported in TensorFlow Lite format.

The following screen appears

Add the images you want recognized to each class.

This time, I recognized the following things.

I imported the images for each class.

What I did this time was to recognize images for the following two tasks: trash separation and color-coding of PET bottle caps

Sorting Trash

Color-coding of plastic bottle caps

Name the classes as follows and train the model.

Export in TensorFlow Lite format

The trained model can identify the following things

Trash Separation

( Trash that can be turned into charcoal / Trash that cannot be turned into charcoal / plastics )

Color-coding of plastic bottle caps

( White / Blue / Red )

Programming

I referred to this video:

# Reference: https://www.youtube.com/watch?v=EY3OVoh-014

import time
import tensorflow as tf
import numpy as np
import cv2
import RPi.GPIO as GPIO
import requests as req
from imutils.video.pivideostream import PiVideoStream

# GPIO setup: tact switch on physical pin 10 (BOARD numbering)
GPIO.setwarnings(False)
GPIO.setmode(GPIO.BOARD)
GPIO.setup(10, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)

# Load the TensorFlow Lite model exported from Teachable Machine
interpreter = tf.lite.Interpreter(model_path="model_unquant.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

target_height = input_details[0]["shape"][1]
target_width = input_details[0]["shape"][2]

# Build the {index: label} dictionary from labels.txt
f = open("labels.txt", "r")
lines = f.readlines()
f.close()
classes = {}
for line in lines:
    pair = line.strip().split(maxsplit=1)
    classes[int(pair[0])] = pair[1].strip()

def detect(frame):
    # Preprocess the frame and run TFLite inference
    resized = cv2.resize(frame, (target_width, target_height))
    input_data = np.expand_dims(resized, axis=0)
    input_data = (np.float32(input_data) - 127.5) / 127.5
    interpreter.set_tensor(input_details[0]["index"], input_data)

    interpreter.invoke()
    detection = interpreter.get_tensor(output_details[0]["index"])
    return detection

def draw_detection(frame, detection):
    # Overlay each class label and its score on the frame
    for i, s in enumerate(detection[0]):
        tag = f"{classes[i]}: {s*100:.2f}%"
        cv2.putText(frame, tag, (10, 20 + 20 * i),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
    return frame

def main():
    # Start the PiCamera stream and give the sensor time to warm up
    camera = PiVideoStream(resolution=(512, 400)).start()
    time.sleep(2)

    while True:
        frame = camera.read()
        detection = detect(frame)
        value = classes[detection.tolist()[0].index(
            max(detection.tolist()[0]))]
        drawn = draw_detection(frame, detection)
        cv2.imshow("frame", drawn)
        # On button press, send the detected class name to the ESP32
        if GPIO.input(10) == GPIO.HIGH:
            request = 'http://' + 'ESP32_IP_Address' + '/' + value
            response = req.get(request)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    camera.stop()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()

Explanation of the Code

First, I imported the following library modules:

import time
import tensorflow as tf
import numpy as np
import cv2
import RPi.GPIO as GPIO
import requests as req
from imutils.video.pivideostream import PiVideoStream

Module for handling time.

import time

Import TensorFlow

Library for machine learning; here it runs the trained model

import tensorflow as tf

Library for fast numerical calculation

import numpy as np

Import OpenCV

Library for processing images and videos

import cv2

Library for controlling the Raspberry Pi's GPIO pins from Python

import RPi.GPIO as GPIO

Python HTTP communication library.

import requests as req

Library providing threaded video streams; PiVideoStream is the class for the PiCamera module

from imutils.video.pivideostream import PiVideoStream

Use GPIO.setwarnings(False) to disable warnings.

GPIO.setwarnings(False)

This selects the GPIO pin numbering scheme; GPIO.BOARD uses the physical pin numbers on the header

GPIO.setmode(GPIO.BOARD)

Set pin 10 to be an input pin with an internal pull-down, so it reads LOW until the button is pressed

GPIO.setup(10, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)
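Note that GPIO.BOARD numbers pins by their physical position on the header, so 10 here means physical pin 10, not BCM GPIO 10. A minimal sketch of the equivalent setup in BCM mode:

import RPi.GPIO as GPIO

# Equivalent setup in BCM (GPIO-number) mode;
# physical pin 10 on the header corresponds to BCM GPIO 15
GPIO.setmode(GPIO.BCM)
GPIO.setup(15, GPIO.IN, pull_up_down=GPIO.PUD_DOWN)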

Load a TFLite model

interpreter = tf.lite.Interpreter(model_path="model_unquant.tflite")

Memory allocation. This is required immediately after model loading.

interpreter.allocate_tensors()

Get the properties of the input and output layers of the training model.

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

Obtain the expected input height and width from the shape of the input tensor

target_height = input_details[0]["shape"][1]
target_width = input_details[0]["shape"][2]
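For this Teachable Machine model, input_details[0]["shape"] should be [1, 224, 224, 3] (batch, height, width, color channels), so target_height and target_width both end up as 224. A quick check (a sketch, reusing input_details from above):

# Print the input tensor shape to confirm the expected input size
print(input_details[0]["shape"])  # expected: [1 224 224 3]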

open() returns a file object, which is used for reading and writing files.

Here, labels.txt is opened in read-only mode and the resulting file object is assigned to the variable f.

f = open("labels.txt", "r")

readlines() reads the entire contents of the file, line by line, into a list

lines = f.readlines()

When a machine learning model is exported from Teachable Machine, labels.txt is exported at the same time.

The content of labels.txt is as follows:

0 charcol
1 notcharcol
2 plastic

As a rule, when you are finished using a file object, you must call the close() method to close it.

If you leave the file open without calling close(), the system treats it as still in use, which can prevent other programs from accessing the same file.

f.close()
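A common alternative that closes the file automatically, even if an error occurs, is the with statement. A sketch doing the same thing:

# Equivalent using a context manager; the file is closed automatically
with open("labels.txt", "r") as f:
    lines = f.readlines()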

Assign an empty dictionary to the variable classes

classes = {}

Each of the three lines in labels.txt is added to classes as a {key: value} pair:

for line in lines:
    pair = line.strip().split(maxsplit=1)
    classes[int(pair[0])] = pair[1].strip()
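Here is how a single line is parsed (an illustrative example):

# Parsing one line of labels.txt
line = "0 charcol\n"
pair = line.strip().split(maxsplit=1)
print(pair)  # ['0', 'charcol'] -> stored as classes[0] = "charcol"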

The resulting dictionary looks like this:

classes = {
    0: "charcol",
    1: "notcharcol",
    2: "plastic"
}

Now let's look inside the main function.

def main():
    camera = PiVideoStream(resolution=(512, 400)).start()
    time.sleep(2)

    while True:
        frame = camera.read()
        detection = detect(frame)
        value = classes[detection.tolist()[0].index(
            max(detection.tolist()[0]))]
        drawn = draw_detection(frame, detection)
        cv2.imshow("frame", drawn)
        if GPIO.input(10) == GPIO.HIGH:
            request = 'http://' + 'ESP32_IP_Address' + '/' + value
            response = req.get(request)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    camera.stop()
    cv2.destroyAllWindows()

Open the camera stream.

PiVideoStream is a threaded video stream class for the PiCamera; time.sleep(2) gives the camera time to warm up.

camera = PiVideoStream(resolution=(512, 400)).start()
time.sleep(2)

The following process is looped inside while True:

while True:
    frame = camera.read()
    detection = detect(frame)
    value = classes[detection.tolist()[0].index(
        max(detection.tolist()[0]))]
    drawn = draw_detection(frame, detection)
    cv2.imshow("frame", drawn)
    if GPIO.input(10) == GPIO.HIGH:
        request = 'http://' + 'ESP32_IP_Address' + '/' + value
        response = req.get(request)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

camera.stop()
cv2.destroyAllWindows()

read() is used to retrieve a single frame (photo) from the camera stream.

frame = camera.read()
detection = detect(frame)

The contents of detect() are as follows

def detect(frame):

    resized = cv2.resize(frame, (target_width, target_height))
    input_data = np.expand_dims(resized, axis=0)
    input_data = (np.float32(input_data) - 127.5) / 127.5
    interpreter.set_tensor(input_details[0]["index"], input_data)

    interpreter.invoke()
    detection = interpreter.get_tensor(output_details[0]["index"])
    return detection

This resizes the image to target_width (width) by target_height (height):

resized = cv2.resize(frame, (target_width, target_height))

np.expand_dims() adds a new dimension of size 1 (the batch dimension)

input_data = np.expand_dims(resized, axis=0)
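For example, a single 224 x 224 RGB frame of shape (224, 224, 3) becomes a batch of one with shape (1, 224, 224, 3), which is what the model expects. A sketch:

import numpy as np

# expand_dims turns one image into a batch of one
img = np.zeros((224, 224, 3), dtype=np.uint8)
print(np.expand_dims(img, axis=0).shape)  # (1, 224, 224, 3)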

Normalize each RGB pixel value from the 0-255 range into the -1 to 1 range

input_data = (np.float32(input_data) - 127.5) / 127.5
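A quick sanity check of this normalization at the extremes of the 0-255 range:

import numpy as np

# 0 maps to -1.0 and 255 maps to 1.0
print((np.float32(0) - 127.5) / 127.5)    # -1.0
print((np.float32(255) - 127.5) / 127.5)  # 1.0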

Copy the preprocessed input data into the input tensor identified by its index

interpreter.set_tensor(input_details[0]["index"], input_data)

Run inference to predict the classification results

interpreter.invoke()

The inference results are read from the output tensor identified by the index stored in output_details

detection = interpreter.get_tensor(output_details[0]["index"])
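For this three-class model, detection is a 1 x 3 array of class scores. An illustrative example (made-up values, not real measurements):

import numpy as np

# Illustrative scores for charcol, notcharcol, plastic
example_detection = np.array([[0.90, 0.05, 0.05]], dtype=np.float32)
print(example_detection.shape)  # (1, 3)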

Returns the value of detection

return detection

Pick the class with the highest score.

value = classes[detection.tolist()[0].index(
        max(detection.tolist()[0]))]
Recall the contents of classes:

classes = {
    0: "charcol",
    1: "notcharcol",
    2: "plastic"
}

For example, if the scores are charcol 90%, notcharcol 5%, and plastic 5%, charcol is selected.
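An equivalent, arguably simpler way to pick the top class is np.argmax (a sketch, reusing np, classes, and detection from above):

# Index of the maximum score maps directly to the class label
value = classes[int(np.argmax(detection[0]))]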

drawn = draw_detection(frame, detection)

The contents of draw_detection() are as follows

def draw_detection(frame, detection):
    for i, s in enumerate(detection[0]):
        tag = f"{classes[i]}: {s*100:.2f}%"
        cv2.putText(frame, tag, (10, 20 + 20 * i),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
    return frame

Retrieve index numbers and scores one by one

for i, s in enumerate(detection[0]):

Format the score as a percentage out of 100, such as charcol: 90.00%

tag = f"{classes[i]}: {s*100:.2f}%"
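A worked example of this formatting with an illustrative score:

# With classes[i] == "charcol" and a score of 0.9034:
s = 0.9034
print(f"charcol: {s*100:.2f}%")  # charcol: 90.34%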

cv2.putText(img, text, org, fontFace, fontScale, color, thickness)

img: OpenCV image

text: text

org: The coordinates of the lower left corner of the text in (x, y)

fontFace: Font. Only a few types can be specified, such as cv2.FONT_HERSHEY_SIMPLEX

fontScale: Font scale factor (0.5 here)

color: Text color as (blue, green, red), i.e. BGR order

thickness: Line thickness. Optional, default value is 1

cv2.putText(frame, tag, (10, 20 + 20 * i),cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)

Returns the value of frame

return frame

Display the image in a window

The first argument is a string window name. The second argument is the image to be displayed. Multiple windows can be displayed as needed, but each window must have a different name.

cv2.imshow("frame", drawn)

When the button is pressed, an HTTP GET request is sent to the ESP32 (listening as a web server) by appending /value (charcol, notcharcol, or plastic) to its address

if GPIO.input(10) == GPIO.HIGH:
    request = 'http://' + 'ESP32_IP_Address' + '/' + value
    response = req.get(request)
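A slightly more defensive version of this request, with a timeout and error handling, could look like this (a sketch; 192.168.0.20 is a hypothetical address):

# Don't block the video loop forever, and survive network errors
try:
    response = req.get("http://192.168.0.20/" + value, timeout=2.0)
    print(response.status_code)
except req.exceptions.RequestException as e:
    print("Request failed:", e)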

Exit the while loop if the q key is pressed

if cv2.waitKey(1) & 0xFF == ord("q"):
    break

The code inside this if block runs only when the module is executed directly (not when it is imported)

if __name__ == "__main__":
    main()

Packaging

The upper section can be bent by kerf bending as shown below.

Insert it into the kerf bending gap in the column supporting the sorter.

Insert it into the kerf bending gap of the table on which the sorter is placed.

The ribbon cable connecting the camera to the Raspberry Pi is threaded through the camera stand.

I attached the button by drilling a hole in the MDF kerf bending as shown below.

I drilled the following holes to give access to the Raspberry Pi connectors


Last update: June 30, 2022