5. AI Vision Projects
5.1 Single Color Recognition
In this section, the camera detects colors. When a red ball is recognized, the buzzer will emit a beep, and the red ball will be highlighted in the transmitted image with “Color: red” displayed.
5.1.1 Program Description
The implementation of color recognition consists of two parts: color detection and execution feedback after recognition.
First, for the color detection part, Gaussian filtering is applied to the image to reduce noise. The Lab color space is then used to convert the color of the object (you can learn more about the Lab color space in the “OpenCV Vision Basic Course” section of the tutorial materials).
Next, the object’s color within the circle is recognized using color thresholding, followed by masking (masking involves using selected images, shapes, or objects to globally or locally obscure the image being processed).
After performing morphological operations such as opening and closing on the object image, the object with the largest contour is circled.
Opening: The image undergoes erosion followed by dilation. This operation removes small objects, smooths shape boundaries, and preserves the area. It can eliminate small noise particles and separate connected objects.
Closing: The image undergoes dilation followed by erosion. This operation fills small holes within objects, connects nearby objects, closes broken contour lines, and smooths boundaries while preserving the area.
After recognition, the servo and buzzer are set up to provide feedback based on the detected color. For example, when red is detected, the buzzer will emit a sound.
For detailed feedback behavior, please refer to section 5.1.3 Program Outcome of this document.
5.1.2 Start and Close the Game
Note
The input command is case-sensitive, and keywords can be auto-completed using the Tab key.
(1) Power on the robot and use VNC Viewer to connect to the remote desktop.
(2) Click the icon
in the top left corner of the system desktop or press the shortcut “Ctrl+Alt+T” to open the LX terminal.
(3) Execute the command to navigate to the directory where the program is located, then press Enter:
cd TonyPi/Functions/
(4) Enter the command and press Enter to start the program:
python3 Color_Warning.py
(5) To close the program, simply press “Ctrl+C” in the LX terminal. If it does not close, press it multiple times.
5.1.3 Program Outcome
After starting the game, the camera will be used to detect colors. When a red ball is recognized, the buzzer will emit a beep sound, and the ball will be circled in the transmitted image, with “Color: red” printed.
Note
During the recognition process, ensure the environment is well-lit to avoid inaccurate recognition due to poor lighting conditions.
Ensure that no objects with similar or matching colors to the target are present in the background within the camera’s visual range, as this may cause misrecognition.
5.1.4 Program Analysis
The source code of this program is saved in: /home/pi/TonyPi/Functions/Color_Warning.py
Import Function Library
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | #!/usr/bin/python3 #coding=utf8 import sys import os import cv2 import math import time import threading import numpy as np import hiwonder.Camera as Camera import hiwonder.Misc as Misc import hiwonder.ros_robot_controller_sdk as rrc import hiwonder.yaml_handle as yaml_handle |
(1) Import Libraries for OpenCV, Time, Math, and Threading to use functions from a library, we can call them with the syntax:
library_name.function_name(parameter1, parameter2, ...)
To use functions from a library, we can call them with the syntax: library_name.function_name(parameter1, parameter2, …)
168 | time.sleep(0.01) |
For example, to call the sleep function from the time library, we use:
In Python, several libraries like time, cv2, and math are built-in and can be directly imported and used. You can also create your own libraries, like the yaml_handle file-reading library mentioned above.
(2) Instantiate a Library
Some library names can be long and hard to remember. To simplify function calls, we often instantiate libraries. For example:
13 | import hiwonder.ros_robot_controller_sdk as rrc |
After instantiating the library, we can call functions from the Board library using the shorter syntax:
Board.function_name(parameter1, parameter2, ...)
This makes it much easier and more convenient to use.
Main Function Analysis
In a Python program, __name__ == '__main__' indicates the main function of the program, where the program starts by reading an image.
135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 | if __name__ == '__main__': from CameraCalibration.CalibrationConfig import * #加载参数(load parameters) param_data = np.load(calibration_param_path + '.npz') #获取参数(get parameters) mtx = param_data['mtx_array'] dist = param_data['dist_array'] newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (640, 480), 0, (640, 480)) mapx, mapy = cv2.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (640, 480), 5) open_once = yaml_handle.get_yaml_data('/boot/camera_setting.yaml')['open_once'] if open_once: my_camera = cv2.VideoCapture('http://127.0.0.1:8080/?action=stream?dummy=param.mjpg') else: my_camera = Camera.Camera() my_camera.camera_open() print("Color_Warning Init") print("Color_Warning Start") while True: ret, img = my_camera.read() if img is not None: frame = img.copy() frame = cv2.remap(frame, mapx, mapy, cv2.INTER_LINEAR) # 畸变矫正(distortion correction) Frame = run(frame) cv2.imshow('Frame', Frame) key = cv2.waitKey(1) if key == 27: break else: time.sleep(0.01) my_camera.camera_close() cv2.destroyAllWindows() |
(1) Image Processing
① Function run() for Image Processing.
77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 | def run(img): global draw_color global color_list global detect_color img_copy = img.copy() img_h, img_w = img.shape[:2] frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3) frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB) # 将图像转换到LAB空间(convert image to the LAB space) max_area = 0 color_area_max = None areaMaxContour_max = 0 for i in lab_data: if i != 'black' and i != 'white': frame_mask = cv2.inRange(frame_lab, (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]), (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2])) #对原图像和掩模进行位运算(operate bitwise operation to original image and mask) |
Resizing the Image. The image size is resized to facilitate processing.
85 | frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) |
The first parameter img_copy is the input image.
The second parameter size specifies the output image size, which can be customized.
The third parameter interpolation=cv2.INTER_NEAREST defines the interpolation method.
INTER_NEAREST: Nearest-neighbor interpolation.
INTER_LINEAR: Bilinear interpolation (default if not specified).
INTER_CUBIC: Bicubic interpolation over a 4x4 pixel neighborhood.
INTER_LANCZOS4: Lanczos interpolation over an 8x8 pixel neighborhood.
Convert the Image to LAB Color Space. The
cv2.cvtColor()function is used for color space conversion.
86 | frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3) |
The first parameter "frame_resize" is the image to be converted.
The second parameter cv2.COLOR_BGR2LAB converts the image from BGR format to LAB format. To convert to RGB, use cv2.COLOR_BGR2RGB.
Convert the Image to a Binary Image
The image is simplified by converting it to a binary image, containing only 0s and 1s, which reduces the data size and makes it easier to process. The cv2.inRange() function is used for thresholding.
95 96 97 98 99 100 101 | frame_mask = cv2.inRange(frame_lab, (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]), (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2])) #对原图像和掩模进行位运算(operate bitwise operation to original image and mask) |
The first parameter "frame_lab" is the input image.
The second parameter (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]) specifies the lower color threshold.
The third parameter (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2]) specifies the upper color threshold.
Apply Morphological Operations (Opening and Closing)
To reduce interference and smooth the image, morphological operations are applied. Opening is erosion followed by dilation, and closing is dilation followed by erosion. The cv2.morphologyEx() function is used.
102 103 | eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #腐蚀(corrosion) dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #膨胀(dilation) |
The first parameter "frame_mask" is the input image.
The second parameter cv2.MORPH_OPEN specifies the morphological operation (options include cv2.MORPH_ERODE, cv2.MORPH_DILATE, cv2.MORPH_OPEN, cv2.MORPH_CLOSE).
The third parameter np.ones((6, 6)) specifies the convolution kernel.
The fourth parameter np.uint8 defines the number of iterations to apply.
Find the Largest Contour
After completing the image processing, the largest contour is found using the cv2.findContours() function.
104 | contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] #找出轮廓(find out contour) |
The first parameter "closed" is the input image.
The second parameter cv2.RETR_EXTERNAL specifies the contour retrieval mode.
The third parameter cv2.CHAIN_APPROX_NONE)[-2] specifies the contour approximation method.
The largest contour is selected, and a minimum area threshold is set to ensure the target contour is valid only if its area exceeds this value.
106 107 108 109 110 | if areaMaxContour is not None: if area_max > max_area:#找最大面积(find out the maximal area) max_area = area_max color_area_max = i areaMaxContour_max = areaMaxContour |
Display the Result
The detected object is circled in the transmitted image, and the detect color is printed.
132 | cv2.putText(img, "Color: " + detect_color, (10, img.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, draw_color, 2) |
Display the Transmitted Image
157 158 159 160 161 162 163 164 165 166 167 168 169 170 | while True: ret, img = my_camera.read() if img is not None: frame = img.copy() frame = cv2.remap(frame, mapx, mapy, cv2.INTER_LINEAR) # 畸变矫正(distortion correction) Frame = run(frame) cv2.imshow('Frame', Frame) key = cv2.waitKey(1) if key == 27: break else: time.sleep(0.01) my_camera.camera_close() cv2.destroyAllWindows() |
The function cv2.imshow() is used to display an image in a window. The first parameter frame is the name of the window, and the second parameter Frame is the content to be displayed.
It is important to include cv2.waitKey() after cv2.imshow(), as the image will not be displayed without it.
The function cv2.waitKey() waits for a key press, and the parameter 1 specifies the delay time in milliseconds.
5.1.5 Function Extension
Adjusting Color Thresholds
If the color recognition performance is poor during the game experience, it may be necessary to adjust the color threshold. This section uses red as an example, and the same method can be applied to adjust other colors. Follow the steps below:
(1) Double-click
, and in the popup interface, click “Execute”.
(2) Once in the interface, click “Connect” to link the camera.
(3) After a successful connection, select “red” from the color options in the lower-right corner of the interface.
Note
If the transmitted image does not appear in the popup window, the camera may not have connected successfully. Check that the camera’s connection cable is properly plugged in.
(4) In the interface shown below, the right side displays the real-time transmitted image, while the left side shows the color to be detected. Point the camera at the red ball, then adjust the six sliders at the bottom so that the red ball area on the left turns entirely white, and the other areas turn black. Afterward, click the “Save” button to save the settings.
5.2 Color Recognition
The robot recognizes colors and provides feedback on the recognition result through “nodding” or “shaking” its head.
5.2.1 Program Description
The following is the overall process:
First, program TonyPi to recognize colors with Lab color space. You can go to “OpenCV Vision Basic Course” for detailed learning the Lab color space.
Second, identify the object color in the circle using color threshold value, then apply a mask to that part of the image. Masking is the process of using selected images, graphics,
After processing the corrosion and inflation of the object image, the largest object contour is circled.
Corrosion: By iterating through each pixel of the image, check its overlap with the surrounding structural element. If all the overlapping pixel values are 1, then keep the original pixel value unchanged; otherwise, set it to 0. Mainly used to eliminate unimportant edge information in the image, reducing the area of the image.
Inflation: Similar to the inverse process of erosion. This process involves convolving the image with a structural element, calculate the maximum pixel value within the covered area, and assign this maximum value to the pixel specified by the reference point. The inflation expands the highlighted areas in an image gradually, typically used to fill holes or gaps in the image.
Next, judge the recognized color. If the sett color is detected the head servo will be turned up and down, otherwise it will be turned left and right.
5.2.2 Start and Close the Game
Note
Pay attention to the text format in the input of instructions.
(1) Power on the robot and use VNC Viewer to connect to the remote desktop.
(2) Double-click “Terminator” icon
in the Raspberry Pi desktop and open command line.
(3) Input and press Enter to locate to the directory where the program is stored.
cd TonyPi/Functions
(4) Input command, then press Enter to start the game.
python3 ColorDetect.py
(5) If you want to exit the game programming, press “Ctrl+C”. If the exit fails, please try it few more times.
5.2.3 Project Outcome
Note
The program defaults to recognizing the color red. To switch to blue or green, refer to “5.2.5 Function Extension->Modify Default Recognition Color”.
Place the red ball in front of the TonyPi. The robot will “nod” upon recognition. Place the blue and green balls in front of the TonyPi. The robot will “shake its head” upon recognition.
5.2.4 Program Analysis
The source code of this program is locate in /home/pi/TonyPi/Functions/ColorDetect.py
Import Parameter Module
| Import module | function |
|---|---|
| import sys | The Python "sys" module has been imported for accessing system-related functions and variables. |
| import os | The Python "os" module has been imported, providing functions and methods for interacting with the operating system. |
| import cv2 | The OpenCV library has been imported for image processing and computer vision-related functionalities. |
| import time | The Python "time" module has been imported for time-related functionalities, such as delay operations. |
| import math | The "math" module provides low-level access to mathematical operations, including many commonly used mathematical functions and constants. |
| import threading | Provides an environment for running multiple threads concurrently. |
| import np | The NumPy library has been imported. It is an open-source numerical computing extension for Python, used for handling array and matrix operations. |
| import sensor.camera as camera | Import camera library |
| from common import misc | The "Misc" module has been imported for handling recognized rectangular data. |
| import common.ros_robot_controller_sdk as rrc | The robot's low-level control library has been imported for controlling servos, motors, RGB lights, and other hardware. |
| import common.yaml_handle | Contains functionalities or tools related to processing YAML format files. |
| from common.controller import Controller | Import action group execution library |
Function Logic
Capture image information through the camera, then process the image, specifically by performing binarization. At the same time, to reduce interference and make the image smoother, perform erosion and dilation operations on the image.
Next, obtain the largest area contour and minimum enclosing circle of the target, determine the color of the color block and provide corresponding feedback.
Program Logic and Related Code Analysis
Based on the above diagram, the program’s logical flow mainly consists of image processing and color tracking. The following document will be written in accordance with the program logic.
(1) Import function library
In this initialization step, the first task is to import the required libraries for subsequent program calls. For details on the imports, refer to 5.2.4 Program Analysis->Import parameter module.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | #!/usr/bin/python3 # coding=utf8 import sys import os import cv2 import math import time import threading import numpy as np import hiwonder.Camera as Camera import hiwonder.Misc as Misc import hiwonder.ros_robot_controller_sdk as rrc from hiwonder.Controller import Controller import hiwonder.ActionGroupControl as AGC import hiwonder.yaml_handle as yaml_handle |
(2) Set initial state
Set initial state, including the initial position of servo, PID, color threshold value, etc.
76 77 78 | def initMove(): ctl.set_pwm_servo_pulse(1, 1500, 500) ctl.set_pwm_servo_pulse(2, servo_data['servo2'], 500) |
(3) Image pre-processing
Resizing and Gaussian blur processing of the image.
194 195 | frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3) |
cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) is an operation to resize the image.
The first parameter img_copy is the image to be resized.
The second parameter size is the target size.
The third parameter interpolation is the interpolation method, which is used to determine the pixel interpolation algorithm used for resizing.
cv2.GaussianBlur(frame_resize, (3, 3), 3) applies Gaussian blur to the image.
The first parameter frame_resize is the image to be blurred.
The second parameter (3, 3) is the size of the Gaussian kernel, indicating that the width and height of the kernel are both 3.
The third parameter 3 is the standard deviation of the Gaussian kernel, used to control the degree of blur.
(4) Color space conversion
Convert the BGR image to LAB image.
196 | frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB) |
(5) Binarization processing
Use inRange() function in cv2 library to process binarization.
203 204 205 206 207 208 209 210 211 | for i in lab_data: if i != 'black' and i != 'white': frame_mask = cv2.inRange(frame_lab, (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]), (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2])) |
The first parameter frame_lab is inputting image.
The second parameter lab_data[i]['min'][0] is the lower limit of the threshold.
The third parameter lab_data[i]['max'][0] is the upper limit of the threshold.
(6) Corrosion and inflation
212 213 | eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) |
eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) is the operation to perform corrosion on the binary image.
The first parameter frame_mask is the binary image on which morphological operations are to be performed.
The second parameter cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)) is the structuring element for the corrosion operation. A rectangular structuring element of size (3, 3) is used here.
The dilation function follows the same principle.
(7) Get the contour with the largest area
After completing the above image processing, it is necessary to obtain the contours of the recognized targets. This involves using the “findContours()” function from the cv2 library.
216 | contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] |
Take code contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] as example:
The first parameter dilated is inputting image.
The second parameter cv2.RETR_EXTERNAL is the contour retrieval mode.
The third parameter cv2.CHAIN_APPROX_NONE)[-2] is the contour approximation method.
Find the contour with the largest area in the obtained contour. In order to avoid interference, you need to set a minimum value. The target contour is considered valid only if its area is greater than this value.
217 218 219 220 221 222 | areaMaxContour, area_max = getAreaMaxContour(contours) if areaMaxContour is not None: if area_max > max_area: max_area = area_max color_area_max = i areaMaxContour_max = areaMaxContour |
(8) Determine the largest color block
Determine the color of the largest area contour and add the result to the color_list.
230 231 232 233 234 235 236 237 238 | if color_area_max == 'red': color = 1 elif color_area_max == 'green': color = 2 elif color_area_max == 'blue': color = 3 else: color = 0 color_list.append(color) |
(9) Multiple judgments
Take the average by multiple judgments, and determine the recognized color.
240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 | if len(color_list) == 3: color = int(round(np.mean(np.array(color_list)))) color_list = [] if color == 1: detect_color = 'red' draw_color = range_rgb["red"] elif color == 2: detect_color = 'green' draw_color = range_rgb["green"] elif color == 3: detect_color = 'blue' draw_color = range_rgb["blue"] else: detect_color = 'None' draw_color = range_rgb["black"] else: detect_color = 'None' draw_color = range_rgb["black"] |
(10) Print recognized outcome
Use the cv2.putText() function from the cv2 library to draw text on the image.
260 | cv2.putText(img, "Color: " + detect_color, (10, img.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, draw_color, 2) |
Take code cv2.putText(img, "Color: " + detect_color, (10, img.shape\[0\] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, draw_color, 2) as example:
The first parameter img is the image being drawn.
The second parameter 'Color: ' + detect_color is the information drawn on the image.
The third parameter (10, img.shape[0] - 10) is the starting coordinate of the text, i.e., the position of the bottom-left corner of the text. Here, the text is 10 pixels away from the left and bottom edges of the image, respectively.
The fourth parameter cv2.FONT_HERSHEY_SIMPLEX is the font type.
The fifth parameter 0.65 is the size scaling factor for the text.
The sixth parameter draw_color is the color of the text.
The seventh parameter 2 is the thickness of the text.
(11) Color recognition
① After recognizing the red ball, control robot servo 1 to make the robot nod twice continuously, then return to the neutral position as pictured:
133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 | if __isRunning: if detect_color = 'None': action_finish = False if detect_color = 'red': ctl.set_pwm_servo_pulse(1, 1800, 200) time.sleep(0.2) ctl.set_pwm_servo_pulse(1, 1200, 200) time.sleep(0.2) ctl.set_pwm_servo_pulse(1, 1800, 200) time.sleep(0.2) ctl.set_pwm_servo_pulse(1, 1200, 200) time.sleep(0.2) ctl.set_pwm_servo_pulse(1, 1500,1200) time.sleep(0.1) |
Take code ctl.set_pwm_servo_pulse(1, 1800, 200) as example:
The first parameter 1 indicates the servo ID being controlled.
The second parameter 1800 represents the pulse width for servo ID 1. 1500 controls the servo to return to the neutral position.
The third parameter 200 represents the servo’s movement time, which is 200 milliseconds.
② After recognizing the green or blue ball, control robot servo 2 to make the robot shake its head twice continuously, then return to the neutral position, as shown in the following figure.
153 154 155 156 157 158 159 160 161 162 163 164 165 166 | elif detect_color = 'green' or detect_color = 'blue': ctl.set_pwm_servo_pulse(2, 1800, 200) time.sleep(0.2) ctl.set_pwm_servo_pulse(2, 1200, 200) time.sleep(0.2) ctl.set_pwm_servo_pulse(2, 1800, 200) time.sleep(0.2) ctl.set_pwm_servo_pulse(2, 1200, 200) time.sleep(0.2) ctl.set_pwm_servo_pulse(2, 1500, 100) time.sleep(0.1) detect_color = 'None' draw_color = range_rgb["black"] time.sleep(1) |
5.2.5 Function Extension
Modify Default Recognition Color
Red, green and blue are the built-in colors in the color recognition program and the red is the default color. Then the robot will perform “nod”.
In the following steps, we’re going to modify the recognized color as green.
(1) Enter command to the directory where the game program is located.
cd TonyPi/Functions
(2) Enter command to go into the game program through vim editor.
vim ColorDetect.py
(3) Find codes if detect_color == 'red': and elif detect_color == 'green' or detect_color == 'blue':.
Note
After entering the code position number on the keyboard, press “Shift+G” to directly locate to the corresponding location. This section aims to introduce quick location methods, so the code position number is for reference only. Please rely on actual positions.
(4) Press”i” to enter the editing mode, then modify red in (if detect_color == ‘red’) to green. And modify red in line 120(elif detect_color== ‘green’ or detect_color == ‘blue’) to green. If you want to recognize blue, please revise to “blue”.
if detect_color == 'red':
ctl.set_pwm_servo_pulse(1, 1800, 200)
time.sleep(0.2)
ctl.set_pwm_servo_pulse(1, 1200, 200)
time.sleep(0.2)
ctl.set_pwm_servo_pulse(1, 1800, 200)
time.sleep(0.2)
ctl.set_pwm_servo_pulse(1, 1200, 200)
time.sleep(0.2)
ctl.set_pwm_servo_pulse(1, 1500, 100)
time.sleep(0.1)
(5) Press “Esc” to enter last line command mode. Input :wq to save the file and exit the editor.
:wq
Add Recognized Color
In addition to the built-in recognized colors, you can set other recognized colors in the programming. Take orange as example:
(1) Open VNC, input command to open Lab color setting document.
vim TonyPi/lab_config.yaml
It is recommended to use screenshot to record the initial value.
(2) Click the debugging tool icon in the system desktop. Choose “Run” in the pop-up window.
(3) Click “Connect” button in the lower left hand. When the interface display the camera returned image, the connection is successful. Select “red” in the right box first.
(4) Drag the corresponding sliders of L, A, and B until the color area to be recognized in the left screen becomes white and other areas become black.
For example, if you want to recognize orange, you can put the orange ball in the camera’s field of view. Adjust the corresponding sliders of L, A, and B until the orange part of the left screen becomes white and other colors become black, and then click “Save” button to keep the modified data.
(5) After the modification is completed, check whether the modified data was successfully written in. Enter the command again vim TonyPi/lab_config.yaml to check the color setting parameters.
vim TonyPi/lab_config.yaml
For the game’s performance, it’s recommended to use the LAB_Tool tool to modify the value back to the initial value after the modification is completed.
(6) Check the data in red frame. If the edited value was written in the program, press “Esc” and enter “:wq” to save it and exit.
(7) The default recognized color can be set as red according to the 5.2.5 Function Extension -> Modify Default Recognition Color in this text.
(8) Start the game again and put the orange ball in front of the camera. TonyPi will perform “nod”.
5.3 Target Position Recognition
In this lesson, the camera will be used to recognize red, green, and blue balls. The detected balls will be highlighted in the live feed, and their XY coordinates will be displayed.
5.3.1 Program Description
The implementation of target tracking can be divided into two parts: color recognition and position marking.
First, for the color recognition part, Gaussian filtering is applied to the image for noise reduction. The Lab color space is then used to convert the color of the objects (for more details on the Lab color space, please refer to the “OpenCV Vision Basic Course”).
Next, color thresholding is used to identify the color of objects within the circle. The image is then masked (masking involves using a selected image, shape, or object to globally or locally occlude the processed image).
After performing morphological operations (open and close operations) on the object’s image, the largest contour is outlined with a circle.
Opening operation: The image is eroded first and then dilated. This operation is used to remove small objects, smooth shape boundaries, and preserve the overall area. It helps remove small noise particles and separate objects that are connected.
Closing operation: The image is dilated first and then eroded. This operation is used to fill small holes within the objects, connect adjacent objects, and reconnect broken contour lines while smoothing the boundaries without changing the area.
Position marking requires specific detection algorithms. The basic principle is to search for areas in the image that match predefined features or patterns, then return the position and bounding box of these areas.
5.3.2 Start and Close the Game
Note
The input of commands must strictly distinguish between uppercase and lowercase letters, as well as spaces. Additionally, you can use the “Tab” key on the keyboard to auto-complete keywords.
(1) Power on the robot and use VNC Viewer to connect to the remote desktop.
(2) Double-click “Terminator” icon
in the Raspberry Pi desktop and open command line.
(3) Input and press Enter to locate to the directory where the program is stored.
cd TonyPi/Functions
(4) Input command, then press Enter to start the game.
python3 ColorPositionRecognition.py
(5) If you want to exit the game programming, press “Ctrl+C”. If the exit fails, please try it few more times.
5.3.3 Program Outcome
The program defaults to recognizing red, green, and blue balls. After recognition, it will highlight the objects in the transmitted image and display their XY coordinates.
Note
During the recognition process, ensure the environment is well-lit to avoid inaccurate recognition due to lighting issues.
Ensure there are no objects with similar or identical colors to the target colors within the camera’s field of view to prevent misrecognition.
If color recognition is inaccurate, refer to the section “5.3.5 Function Extension\Adjusting Color Threshold” in this document to adjust the color threshold settings.
5.3.4 Program Analysis
The source code of this program is locate in /home/pi/TonyPi/Functions/ColorPositionRecognition.py
Importing Libraries
1 2 3 4 5 6 7 8 9 10 11 12 | #!/usr/bin/python3 # coding=utf8 import sys import os import cv2 import math import time import numpy as np import hiwonder.Camera as Camera import hiwonder.Misc as Misc import hiwonder.yaml_handle as yaml_handle |
(1) Import the necessary libraries, including OpenCV, time, math, threading, and inverse kinematics. To call a function from a library, use the format LibraryName.FunctionName(Parameters). For example:
168 | time.sleep(0.01) |
This calls the sleep function from the time library, which is used for adding delays.
Python comes with several built-in libraries like time, cv2, math, which can be imported directly. You can also create your own libraries, such as the “yaml_handle” file reading library.
(2) Instantiating Libraries
Sometimes, library names are long and hard to remember. To make function calls more convenient, we often instantiate libraries using shorter names. For example:
14 | import hiwonder.Misc as Misc |
After instantiation, functions from the Board library can be called as:
Board.FunctionName(Parameters)
This makes calling functions much easier.
Main Function Analysis
In a Python program, the if __name__ == '__main__': block indicates the main function. The program starts by opening the camera and reading the video stream. The read() method captures each frame of the image, where the program searches for and marks the color of the ball, then displays the result. The video is displayed through a loop, and once the display is finished, the release() function is called to release the resources.
134 135 136 137 138 139 140 141 142 143 | if __name__ == '__main__': from CameraCalibration.CalibrationConfig import * from hiwonder.ros_robot_controller_sdk import Board board = Board() #加载参数 param_data = np.load(calibration_param_path + '.npz') #获取参数 mtx = param_data['mtx_array'] dist = param_data['dist_array'] |
(1) Capturing Camera Image
149 | my_camera = cv2.VideoCapture('http://127.0.0.1:8080/?action=stream?dummy=param.mjpg') |
When the program starts, the camera is initialized.
(2) Image Processing
① The run() function handles image processing.
75 76 77 78 79 80 81 82 83 84 | def run(img): global draw_color global color_list global detect_color img_copy = img.copy() img_h, img_w = img.shape[:2] frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3) |
② Resize the image to make it easier to process.
83 | frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) |
The first parameter img_copy is the input image.
The second parameter size is the size of the output image, which can be set as needed.
The third parameter interpolation=cv2.INTER_NEAREST is the interpolation method. Options include:
INTER_NEAREST: Nearest-neighbor interpolation.
INTER_LINEAR: Bilinear interpolation (default if no other method is specified).
INTER_CUBIC: Bicubic interpolation in a 4x4 pixel neighborhood.
INTER_LANCZOS4: Lanczos interpolation in an 8x8 pixel neighborhood.
③ Apply Gaussian Blur to reduce noise
Gaussian blur is a linear smoothing filter used to eliminate Gaussian noise and is widely used in image denoising.
84 | frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3) |
The first parameter frame_resize is the input image.
The second parameter (3, 3) is the size of the Gaussian kernel.
The third parameter 3 is the standard deviation of the Gaussian kernel in the X-direction
④ Convert the image to LAB color space.
85 | frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB) |
The first parameter frame_gb is the input image.
The second parameter cv2.COLOR_BGR2LAB specifies the conversion from BGR to LAB format. To convert to RGB, use cv2.COLOR_BGR2RGB.
⑤ Convert the image to a binary image with only 0s and 1s, simplifying the image and reducing data for easier processing.
The cv2.inRange() function is used for binarization:
93 94 95 96 97 98 99 | frame_mask = cv2.inRange(frame_lab, (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]), (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2])) #对原图像和掩模进行位运算 |
The first parameter frame_lab is the input image.
The second parameter (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]) is the lower threshold for the color.
The third parameter (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2]) is the upper threshold for the color.
⑥ Perform erosion and dilation to smooth the image and reduce interference.
Erosion reduces the size of foreground objects and eliminates small objects, while dilation increases the size of foreground objects and fills small holes.
100 101 | eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #腐蚀 dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #膨胀 |
⑦ Find the contour with the largest area
After the image processing steps, use the cv2.findContours() function to find contours:
102 | contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] #找出轮廓 |
The first parameter dilated is the input image.
The second parameter cv2.RETR_EXTERNAL specifies the contour retrieval mode.
The third parameter cv2.CHAIN_APPROX_NONE)[-2] specifies the contour approximation method.
The program searches for the largest contour and sets a threshold area to ensure the detected contour is valid.
102 103 104 105 106 107 108 109 110 | contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] #找出轮廓 areaMaxContour, area_max = getAreaMaxContour(contours) #找出最大轮廓 if areaMaxContour is not None: if area_max > max_area:#找最大面积 max_area = area_max color_area_max = i areaMaxContour_max = areaMaxContour if max_area > 200: # 有找到最大面积 |
⑧ Extract the position information
Use cv2.putText() to draw text on the image:
130 131 | cv2.putText(img, "Color: " + detect_color, (10, img.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, draw_color, 2) cv2.putText(img, f"{(centerX, centerY)}", (centerX, centerY - 20), cv2.FONT_HERSHEY_SIMPLEX, 1.0, range_rgb[color_area_max], 2) |
The first parameter img is the input image.
The second parameter "Color: " + detect_color is the text to display (e.g., the detected color).
The third parameter (10, img.shape[0] - 10) and (centerX, centerY - 20) specify the starting coordinates for the text (bottom-left position).
The fourth parameter cv2.FONT_HERSHEY_SIMPLEX specifies the font type.
The fifth parameter 0.65 is the scaling factor for the font size.
The sixth parameter draw_color is the color of the text.
The seventh parameter 2 specifies the thickness of the text line.
(3) Displaying the Return Image
157 158 159 160 161 162 163 164 165 166 | while True: ret, img = my_camera.read() if img is not None: frame = img.copy() frame = cv2.remap(frame, mapx, mapy, cv2.INTER_LINEAR) # 畸变矫正 Frame = run(frame) cv2.imshow('Frame', Frame) key = cv2.waitKey(1) if key == 27: break |
The cv2.imshow() function is used to display the image in a window. The first parameter is the window name (e.g., ‘Frame’), and the second parameter is the image to display.
The function cv2.waitKey() is used to wait for a key press; the parameter 1 specifies the delay time.
5.3.5 Function Extension
Adjusting Color Threshold
During the game experience, if the color recognition of objects is not accurate, you may need to adjust the color threshold. This section uses adjusting the red color as an example; the process for adjusting other colors is similar. Follow the steps below:
(1) Click the debugging tool icon in the system desktop. Choose “Execute” in the pop-up window.
(2) Once the interface opens, click “Connect”.
(3) After a successful connection, select “red” from the color options in the bottom-right corner of the interface.
(4) If the transmitted image does not appear in the pop-up window, it indicates the camera is not connected properly. Check the camera connection cable to ensure it is securely connected.
(5) The image on the right side of the interface shows the real-time transmitted video, and the left side shows the color to be captured.
Point the camera at the red color block, and then adjust the six sliders at the bottom to ensure that the red color block on the left side of the screen turns completely white, while other areas remain black. Finally, click the “Save” button to save the data.
Add Recognized Color
In addition to the built-in recognized colors, you can set other recognized colors in the programming. Take orange as example:
(1) Open VNC, input command to open Lab color setting document.
vim TonyPi/lab_config.yaml
It is recommended to use screenshot to record the initial value.
(2) Click the debugging tool icon in the system desktop. Choose “Execute” in the pop-up window.
(3) Click “Connect” button in the lower left hand. When the interface display the camera returned image, the connection is successful. Select “red” in the right box first.
(4) Drag the corresponding sliders of L, A, and B until the color area to be recognized in the left screen becomes white and other areas become black.
For example, if you want to recognize orange, you can put the orange ball in the camera’s field of view. Adjust the corresponding sliders of L, A, and B until the orange part of the left screen becomes white and other colors become black, and then click “Save” button to keep the modified data.
(5) After the modification is completed, check whether the modified data was successfully written in. Enter the command again vim TonyPi/lab_config.yaml to check the color setting parameters.
vim TonyPi/lab_config.yaml
For the game’s performance, it’s recommended to use the LAB_Tool tool to modify the value back to the initial value after the modification is completed.
(6) Check the data in red frame. If the edited value was written in the program, press “Esc” and enter “:wq” to save it and exit.
(7) Start the game again and put the orange ball in front of the camera.TonyPi displays the XY coordinates of the orange ball in the transmitted image.
5.4 Object Tracking
The robot recognizes colors, and its body can move according to the movement of the target color.
5.4.1 Program Description
First, program TonyPi to recognize colors with Lab color space. Convert the RGB color space to Lab, image binarization, and then perform operations such as expansion and corrosion to obtain an outline containing only the target color. Use circles to frame the color outline to realize object color recognition.
Next, the traversal algorithm compares all correctly recognized colored objects and selects the object with the largest contour area as the target.
Finally, the servo is called to perform real-time tracking, while the body is driven to perform follow-up actions through action groups, thus completing the object tracking function.
5.4.2 Start and Close the Game
Note
The input of commands must strictly distinguish between uppercase and lowercase letters, as well as spaces. Additionally, you can use the “Tab” key on the keyboard to auto-complete keywords.
(1) Power on the robot and use VNC Viewer to connect to the remote desktop.
(2) Double-click “Terminator” icon
in the Raspberry Pi desktop and open command line
(3) Input and press Enter to locate to the directory where the program is stored.
cd TonyPi/Functions
(4) Input command, then press Enter to start the game.
python3 Follow.py
(5) If you want to exit the game programming, press “Ctrl+C”. If the exit fails, please try it few more times.
5.4.3 Program Outcome
Note
The default recognized and tracking color is green. If you want to change to blue or red, please refer to 5.4.5 Function Extension -> Modify Default Recognition Color. Furthermore, when moving the handheld colored sponge blocks, the speed should not be too fast, and it should be within the range of camera recognition.
After the gameplay is started, slowly move the red sponge block by hand or place the block on a movable carrier. The TonyPi robot will move along with the movement of the target color.
5.4.4 Program Analysis
The source code of this program is locate in /home/pi/TonyPi/Functions/Follow.py
Color detection parameter
In the object tracking program, the detected object color is red.
287 288 289 290 291 292 293 294 295 296 297 298 | if __name__ == '__main__': init() start() __target_color = ('red') open_once = yaml_handle.get_yaml_data('/boot/camera_setting.yaml')['open_once'] if open_once: my_camera = cv2.VideoCapture('http://127.0.0.1:8080/?action=stream?dummy=param.mjpg') else: my_camera = Camera.Camera() my_camera.camera_open() AGC.runActionGroup('stand') |
The main detection parameters involved in the detection process are as follows:
(1) Before converting the image to the LAB color space, noise reduction processing is required. The GaussianBlur() function is used for Gaussian filtering as pictured:
206 | frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB) # 将图像转换到LAB空间(convert the image to LAB space) |
The first parameter frame_resize is inputting image.
The second parameter (3, 3) is the size of the Gaussian kernel. A larger kernel size typically results in a greater degree of filtering, making the output image more blurry, and it also increases computational complexity.
The third parameter 3 is the standard deviation of the Gaussian function along the X direction. In the Gaussian filter, it is used to control the variation near its mean. If this value is increased, the allowable range of variation around the mean is also increased; if decreased, the allowable range of variation around the mean is reduced.
(2) By using the “inRange” function to perform binaryzation on the input image as pictured:
213 214 215 216 217 218 219 | frame_mask = cv2.inRange(frame_lab, (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]), (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2])) #对原图像和掩模进行位运算(operate bitwise operation to original image and mask) |
(3) To reduce interference and make the image smoother, it is necessary to perform erosion and dilation operations on the image
220 221 | eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #腐蚀(corrosion) dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #膨胀(dilation) |
In the processing, the getStructuringElement function is used to generate structuring elements of different shapes.
The first parameter cv2.MORPH_RECT is the shape of the kernel, which is a rectangle in this case.
The second parameter (3, 3) is the size of the rectangle, which is 3x3 in this case.
(4) Find out the largest contour of the object
223 224 225 226 | areaMaxContour, area_max = getAreaMaxContour(contours) # 找出最大轮廓(find out the contour with the largest area) if areaMaxContour is not None and area_max > 100: # 有找到最大面积(the maximal area is found) rect = cv2.minAreaRect(areaMaxContour)#最小外接矩形(the minimum bounding rectangle) box = np.int0(cv2.boxPoints(rect))#最小外接矩形的四个顶点(the four vertices of the minimum bounding rectangle) |
To avoid interference, the if area_max_contour is not None and area_max > 100 instruction is used to ensure that only contours with an area greater than 100 are considered valid for the largest area.
Color recognition parameter
The main control parameters involved in the color recognition process are as follows:
(1) When the robot detects a colored object, use the “cv2.drawContours()” function to draw the contour of the colored object
231 | cv2.drawContours(img, [box], -1, (0,255,255), 2)#画出四个点组成的矩形(draw the rectangle formed by the four points) |
The first parameter img is inputting image.
The second parameter [box] is the contour itself, represented as a list in Python.
The third parameter -1 is the index of the contour, where the numerical value represents drawing all contours within the list.
The fourth parameter (0, 255, 255) is the contour color, with the order being B, G, R, and in this case, it represents yellow.
The fifth parameter 2 is the contour width. If set to -1, it means to fill the contour with the specified color.
(2) After the robot detects a colored object, use the cv2.circle() function to draw the center point of the colored object on the feedback screen.
237 | cv2.circle(img, (centerX, centerY), 5, (0, 255, 255), -1)#画出中心点(draw the center point) |
The first parameter img is the input image, which is the image of the detected colored object in this case.
The second parameter (centerX, centerY) is the coordinates of the center point of the circle to be drawn (determined based on the detected object).
The third parameter 5 is the radius of the circle to be drawn.
The fourth parameter (0, 255, 255) is the color of the circle to be drawn, with the order being B, G, R, and in this case, it represents yellow.
The fifth parameter -1 indicates that the circle should be filled with the color specified in parameter 4. If it is a number, it represents the line width of the circle to be drawn.
Perform motion parameter
(1) After detecting a red object, control servo 1 and servo 2 of the robot to move the upper camera with the movement of the red object.
268 269 270 271 272 273 274 275 276 277 278 279 280 281 | # 计算使用时间(calculate use time) use_time = round(max(use_time, abs(dy*0.00025)), 5) y_dis += dy # 将控制头部垂直移动的舵机位置限制在预设范围内(limit the position of the servo controlling vertical movement of the head within a predefined range) y_dis = servo_data['servo1'] if y_dis < servo_data['servo1'] else y_dis y_dis = 2000 if y_dis > 2000 else y_dis ctl.set_pwm_servo_pulse(1, y_dis, use_time*1000) ctl.set_pwm_servo_pulse(2, x_dis, use_time*1000) time.sleep(use_time) else: centerX, centerY = -1, -1 |
Take code ctl.set_pwm_servo_pulse(1, vertical_servo_position,use_time*1000) as example:
The first parameter 1 represents controlling servo ID 1.
The second parameter vertical_servo_position represents the pulse width of servo ID 1.
The third parameter use_time*1000 represents the movement time of the servo, in milliseconds.
(2) After detecting the red ball, the robot calls the action group file in the /home/pi/TonyPi/ActionGroups directory to control the robot to move along with the red object as pictured:
168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 | def move(): while True: if __isRunning: if centerX >= 0: if centerX - CENTER_X > 100 or x_dis - servo_data['servo2'] < -80: # 不在中心,根据方向让机器人转向一步(if not centered, instruct the robot to turn one step in the appropriate direction) AGC.runActionGroup('turn_right_small_step') elif centerX - CENTER_X < -100 or x_dis - servo_data['servo2'] > 80: AGC.runActionGroup('turn_left_small_step') elif 100 > circle_radius > 0: AGC.runActionGroup('go_forward') elif 180 < circle_radius: AGC.runActionGroup('back_fast') else: time.sleep(0.01) else: time.sleep(0.01) |
5.4.5 Function Extension
Modify Default Recognition Color
Red, green and blue are the built-in colors in the color recognition program and the red is the default color. Then the robot will perform “nod”.
In the following steps, we’re going to modify the recognized color as green.
(1) Enter command to the directory where the game program is located.
cd TonyPi/Functions
(2) Enter command to go into the game program through vim editor.
vim Follow.py
(3) Find codes “if detect_color == ‘red’:” and “elif detect_color == ‘green’ or detect_color == ‘blue’:”.
Note
After entering the code position number on the keyboard, press “Shift+G” to directly locate to the corresponding location. This section aims to introduce quick location methods, so the code position number is for reference only. Please rely on actual positions.
(4) Press”i” to enter the editing mode, then modify red in (if detect_color == ‘red’) to green. And modify red in line 120 (elif detect_color== ‘green’ or detect_color == ‘blue’) to green. If you want to recognize blue, please revise to “blue”.
if detect_color == 'red':
ctl.set_pwm_servo_pulse(1, 1800, 200)
time.sleep(0.2)
ctl.set_pwm_servo_pulse(1, 1200, 200)
time.sleep(0.2)
ctl.set_pwm_servo_pulse(1, 1800, 200)
time.sleep(0.2)
ctl.set_pwm_servo_pulse(1, 1200, 200)
time.sleep(0.2)
ctl.set_pwm_servo_pulse(1, 1500, 100)
time.sleep(0.1)
(5) Press “Esc” to enter last line command mode. Input “:wq” to save the file and exit the editor.
:wq
Add Recognized Color
In addition to the built-in recognized colors, you can set other recognized colors in the programming. Take orange as example:
(1) Open VNC, input command to open Lab color setting document.
vim TonyPi/lab_config.yaml
It is recommended to use screenshot to record the initial value.
(2) Click the debugging tool icon in the system desktop. Choose “Execute” in the pop-up window.
(3) Click “Connect” button in the lower left hand. When the interface display the camera returned image, the connection is successful. Select “red” in the right box first.
(4) Drag the corresponding sliders of L, A, and B until the color area to be recognized in the left screen becomes white and other areas become black.
For example, if you want to recognize orange, you can put the orange ball in the camera’s field of view. Adjust the corresponding sliders of L, A, and B until the orange part of the left screen becomes white and other colors become black, and then click “Save” button to keep the modified data.
(5) After the modification is completed, check whether the modified data was successfully written in. Enter the command again vim TonyPi/lab_config.yaml to check the color setting parameters.
vim TonyPi/lab_config.yaml
For the game’s performance, it’s recommended to use the LAB_Tool tool to modify the value back to the initial value after the modification is completed.
(6) Check the data in red frame. If the edited value was written in the program, press “Esc” and enter “:wq” to save it and exit.
(7) Start the game again and put the orange ball in front of the camera. TonyPi tracks the orange ball in real time.
5.5 Auto Shooting
Note
please use the assorted balls for operation. If you have your own balls, we recommend using one with a diameter of 3cm.
Place the red ball in the area recognized by the robot’s camera. The robot will adjust its position according to the ball’s location, and then kick the ball away.
5.5.1 Program Description
Below are the details:
First, program TonyPi to recognize colors with Lab color space. You can go to “OpenCV Basic Lesson” for detailed learning of Lab color space.
Second, identify the object color in the circle using color threshold value, then apply a mask to that part of the image. Masking is the process of using selected images, graphics, or objects to globally or locally obscure parts of the processed image.
After the opening and closing operations on the object image, the largest object contour is circled.
Corrosion: By iterating through each pixel of the image, check its overlap with the surrounding structural element. If all the overlapping pixel values are 1, then keep the original pixel value unchanged; otherwise, set it to 0. Mainly used to eliminate unimportant edge information in the image, reducing the area of the image.
Inflation: Similar to the inverse process of erosion. This process involves convolving the image with a structural element, calculate the maximum pixel value within the covered area, and assign this maximum value to the pixel specified by the reference point. The inflation expands the highlighted areas in an image gradually, typically used to fill holes or gaps in the image.
Then, judge whether the object is in the central position after receiving the image feedback. If yes, call TonyPi to move forward to the target until it reaches the set range, and then execute the shooting action; otherwise, the robot will move left or right to the center of the target first
5.5.2 Start and Close the Game
Note
The input of commands must strictly distinguish between uppercase and lowercase letters, as well as spaces. Additionally, you can use the “Tab” key on the keyboard to auto-complete keywords.
(1) Power on the robot and use VNC Viewer to connect to the remote desktop.
(2) Double-click “Terminator” icon
in the Raspberry Pi desktop and open command line.
(3) Input and press Enter to locate to the directory where the program is stored.
cd TonyPi/Functions
(4) Input command, then press Enter to start the game.
python3 KickBall.py
(5) If you want to exit the game programming, press “Ctrl+C”. If the exit fails, please try it few more times.
5.5.3 Program Outcome
Note
Please use the robot and ball on the flat surface.
Place the red ball in front of the TonyPi. After recognition, the robot will adjust its position to close the ball and kick it forward.
5.5.4 Program Analysis
The source code of this program is locate in /home/pi/TonyPi/Functions/KickBall.py
Import Parameter Module
| Import module | function |
|---|---|
| import sys | The Python "sys" module has been imported for accessing system-related functions and variables. |
| import os | The Python "os" module has been imported, providing functions and methods for interacting with the operating system. |
| import cv2 | The OpenCV library has been imported for image processing and computer vision-related functionalities. |
| import time | The Python "time" module has been imported for time-related functionalities, such as delay operations. |
| import math | The "math" module provides low-level access to mathematical operations, including many commonly used mathematical functions and constants. |
| import threading | Provides an environment for running multiple threads concurrently. |
| import np | The NumPy library has been imported. It is an open-source numerical computing extension for Python, used for handling array and matrix operations. |
| import sensor.camera as camera | Import camera library |
| from common import misc | The "Misc" module has been imported for handling recognized rectangular data. |
| import common.ros_robot_controller_sdk as rrc | The robot's low-level control library has been imported for controlling servos, motors, RGB lights, and other hardware. |
| import common.yaml_handle | Contains functionalities or tools related to processing YAML format files. |
| from common.controller import Controller | Import action group execution library |
Function Logic
Capture image information through the camera, then process the image,specifically by performing b inarization. At the same time, to reduce
interference and make the image smoother, perform erosion and dilation operations on the image.
Next, obtain the largest area contour and minimum enclosing circle of the target, retrieve the center point coordinates of the color block, and then call the action group to kick the ball.
Program Logic and Related Code Analysis
(1) Initialization
① Import function library
In this initialization step, the first task is to import the required libraries for subsequent program calls. For details on the imports, refer to 5.5.5 Program Analysis->Import parameter module .
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | #!/usr/bin/python3 # coding=utf8 import sys import os import cv2 import time import math import threading import numpy as np import hiwonder.PID as PID import hiwonder.Misc as Misc import hiwonder.Camera as Camera import hiwonder.ros_robot_controller_sdk as rrc from hiwonder.Controller import Controller import hiwonder.ActionGroupControl as AGC import hiwonder.yaml_handle as yaml_handle |
② Set initial state
Set initial state, including the initial position of servo, PID, color threshold value, etc.
120 121 122 | # 设置舵机位置(set the servo position) x_dis = servo_data['servo2'] y_dis = servo_data['servo1'] |
80 81 | # 设置需要检测的球的颜色,默认为红色(set the color of the ball to be detected, defaulting to red) __target_color = ('red') |
124 125 126 127 128 129 130 131 132 133 134 135 | # 初始化机器人上一步的状态(initialize the previous state of the robot) last_status = '' # 初始化开始计时的标志量(initialize the flag variable for starting the timer) start_count= True # 初始化球的中心坐标(initialize the center coordinates of the ball) CenterX, CenterY = -2, -2 # 初始化 PID 控制器(initialize PID controller) x_pid = PID.PID(P=0.145, I=0.00, D=0.0007) y_pid = PID.PID(P=0.145, I=0.00, D=0.0007) |
(2) Image processing
① Image pre-processing
Resizing and Gaussian blur processing of the image.
372 373 374 375 | # 重新调整图像大小(resize the image) frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) # 高斯模糊(Gaussian blur) frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3) |
cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) is an operation to resize the image.
The first parameter img_copy is the image to be resized. The second parameter “size” is the target size.
The third parameter interpolation is the interpolation method, which is used to determine the pixel interpolation algorithm used for resizing.
cv2.GaussianBlur(frame_resize, (3, 3), 3) applies Gaussian blur to the image.
The first parameter frame_resize is the image to be blurred.
The second parameter (3, 3) is the size of the Gaussian kernel, indicating that the width and height of the kernel are both 3.
The third parameter 3 is the standard deviation of the Gaussian kernel, used to control the degree of blur.
② Color space conversion
Convert the BGR image to LAB image.
376 377 | # 将图像转换到LAB色彩空间(convert the image to LAB color space) frame_lab = cv2.cvtColor(frame_gb, cv2.COLOR_BGR2LAB) |
③ Binarization processing
Use inRange() function in cv2 library to process b inarization.
384 385 386 387 388 389 390 391 392 | if i in lab_data: #对原图像和掩模进行位运算(perform bitwise operation to original image and mask) frame_mask = cv2.inRange(frame_lab, (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]), (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2])) |
The first parameter frame_lab is inputting image.
The second parameter lab_data[i]['min'][0] is the lower limit of the threshold.
The third parameter lab_data[i]['max'][0] is the upper limit of the threshold.
④ Corrosion and inflation
393 394 | eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #腐蚀 dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #膨胀 |
eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) is the operation to perform corrosion on the binary image.
The first parameter frame_mask is the binary image on which morphological operations are to be performed.
The second parameter cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)) is the structuring element for the corrosion operation. A rectangular structuring element of size (3, 3) is used here.
The dilation function follows the same principle.
⑤ Get the contour with the largest area
After completing the above image processing, it is necessary to obtain the contours of the recognized targets. This involves using thefindContours()
function from the cv2 library.
399 | contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] # 找出轮廓(find out the contour) |
Take code contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] as example:
The first parameter dilated is inputting image.
The second parameter cv2.RETR_EXTERNAL is the contour retrieval mode.
The third parameter cv2.CHAIN_APPROX_NONE)[-2] is the contour approximation method.
Find the contour with the largest area in the obtained contour. In order to avoid interference, you need to set a minimum value. The target contour is considered valid only if its area is greater than this value.
401 402 403 404 405 406 | areaMaxContour, area_max = get_area_maxContour(contours) #找出最大轮廓 if areaMaxContour is not None: if area_max > max_area: #找最大面积 max_area = area_max color_area_max = i areaMaxContour_max = areaMaxContour |
⑥ Get color block center point coordinates
Using the misc function, map the x and y coordinates of the object center and the radius from the original size range to the range of the new image size (‘img_w’ and ‘ img_h’). And use the cv2.circle function to identify the color block by circling it.
408 409 410 411 | # 将球的中心坐标和半径映射回原始图像尺寸(map the center coordinates and radius of the ball back to the original image size) CenterX = int(Misc.map(CenterX, 0, size[0], 0, img_w)) CenterY = int(Misc.map(CenterY, 0, size[1], 0, img_h)) radius = int(Misc.map(radius, 0, size[0], 0, img_w)) |
(3) Auto shooting
① If a ball is detected, the program will initialize sub-steps and step sizes, and set the timer start flag. If the ball is not in the center of the frame, the robot’s orientation will be adjusted based on the ball’s position, and the corresponding turning action will be executed until the ball is in the center of the frame.
215 216 217 218 219 220 221 222 223 224 225 226 227 | if CenterX >= 0: # 如果检测到了球(if a ball is detected) step_ = 1 d_x, d_y = 20, 20 start_count= True # 开始计时标志置为True,在后面找不到球的情况下使用(set the flag for starting the timer to True, for use when the ball is not found later on) if step == 1: # 球不在画面中心,则根据方向让机器人转向一步,直到满足条件进入步骤2(if the ball is not in the center of the frame, instruct the robot to turn one step in the appropriate direction until the condition is met to enter step 2) if x_dis - servo_data['servo2'] > 150: AGC.runActionGroup('turn_left_small_step') elif x_dis - servo_data['servo2'] < -150: AGC.runActionGroup('turn_right_small_step') else: step = 2 |
② If the vertical servo position equals the set position, adjust the robot’s movement based on the current horizontal servo position. If the horizontal servo position is 400 units to the left or right of the set position, execute the corresponding turning action. If the ball is above the center of the frame, move forward one step. If the ball is below the center of the frame, move forward. If the ball is below the center of the frame and the horizontal servo position differs from the set position by no more than 200 units, move forward quickly; otherwise, execute the third step action.
229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 | elif step == 2: # 当控制头部垂直运动的舵机位置等于设定的位置(when the position of the servo controlling vertical movement of the head equals the set position) if y_dis == servo_data['servo1']: # 根据当前水平舵机位置调整机器人运动(adjust the robot's movement based on the current position of the horizontal servo) if x_dis == servo_data['servo2'] - 400: AGC.runActionGroup('turn_right',2) elif x_dis == servo_data['servo2'] + 400: AGC.runActionGroup('turn_left',2) elif 350 < CenterY <= 380: # ball_center_y值越大,与球的距离越近(the larger the value of ball_center_y, the closer the distance to the ball) AGC.runActionGroup('go_forward_one_step') last_status = 'go' # 记录上一步的状态是往前走(record that the previous step was moving forward) step = 1 elif 120 < CenterY <= 350: AGC.runActionGroup('go_forward') last_status = 'go' step = 1 elif 0 <= CenterY <= 120 and abs(x_dis - servo_data['servo2']) <= 200: AGC.runActionGroup('go_forward_fast') last_status = 'go' |
③ In step three, if the vertical servo position equals the set position, adjust the robot’s position based on the horizontal position of the ball in the frame. If the horizontal position of the ball deviates from the center of the frame by less than or equal to 40 units, move left. If the horizontal position of the ball is to the left of the center of the frame and the deviation is greater than 40 units, move quickly to the left. If the horizontal position of the ball is to the right of the center of the frame and the deviation is greater than 40 units, move quickly to the right; otherwise, execute the fourth step action.
If the vertical servo position is not equal to the set position, adjust based on the difference between the horizontal servo position and the set position: If the difference is between 270 and 480, move quickly to the left. If the difference is less than 170, move left. If the difference is between -480 and -270, move quickly to the right; otherwise, execute the fourth step action.
261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 | elif step == 3: if y_dis == servo_data['servo1']: # 根据球在画面的x坐标左右平移调整位置(adjust the position based on the horizontal movement of the ball in the frame) if abs(CenterX - CENTER_X) <= 40: AGC.runActionGroup('left_move') elif 0 < CenterX < CENTER_X - 50 - 40: AGC.runActionGroup('left_move_fast') time.sleep(0.2) elif CENTER_X + 50 + 40 < CenterX: AGC.runActionGroup('right_move_fast') time.sleep(0.2) else: step = 4 else: if 270 <= x_dis - servo_data['servo2'] < 480: AGC.runActionGroup('left_move_fast') time.sleep(0.2) elif abs(x_dis - servo_data['servo2']) < 170: AGC.runActionGroup('left_move') elif -480 < x_dis - servo_data['servo2'] <= -270: AGC.runActionGroup('right_move_fast') time.sleep(0.2) else: step = 4 |
④ In step four, if the vertical servo position equals the set position, execute the following operations: If the vertical position of the ball is between 380 and 440, move forward one small step. If the vertical position of the ball is between 0
and 380, move forward; otherwise, based on the horizontal position of the ball, determine which foot to use for the shooting action. If the horizontal position of the ball is to the left of the center of the frame, use the left foot for a quick shot;
otherwise, use the right foot for a quick shot and reset the main step to 1. If the vertical servo position is not equal to the set position, reset the main step to 1.
285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 | elif step == 4: if y_dis == servo_data['servo1']: # 小步伐靠近到合适的距离(take small steps to approach the appropriate distance) if 380 < CenterY <= 440: AGC.runActionGroup('go_forward_one_step') last_status = 'go' elif 0 <= CenterY <= 380: AGC.runActionGroup('go_forward') last_status = 'go' else: # 根据最后球的x坐标,采用离得近的脚去踢球(use the closest foot to kick the ball based on the final x coordinates of the ball) if CenterX < CENTER_X: AGC.runActionGroup('left_shot_fast') else: AGC.runActionGroup('right_shot_fast') step = 1 else: step = 1 |
⑤ If the ball is not detected, check if the robot’s previous state was “moving forward” . If it was, then quickly step back one step. If the timer has already started, reset the timer flag to False and record the current time as the start time for the timer. Otherwise, if the time since the last start of timing exceeds 0.5 seconds, perform the following operations based on the sub-step:
If the sub-step is 5, move the horizontal servo position. If the deviation between the horizontal servo position and the set position is less than or equal to the absolute value of the horizontal step size, perform the action to turn right, and reset the sub-step to 1.
If the sub-step is 1 or 3, move the horizontal servo position. If the horizontal servo position exceeds the set position plus 400, reset the sub-step to 2, and invert the horizontal step size. If the horizontal servo position is less than the set position minus 400, reset the sub-step to 4, and invert the horizontal step size.
If the sub-step is 2 or 4, move the vertical servo position. If the vertical servo position exceeds 1200, reset the sub-step to 3, and invert the vertical step size. If the vertical servo position is less than the set position, reset the sub-step to 5, and invert the vertical step size. Finally, set the servo pulse width to the vertical servo position and horizontal servo position, then sleep for 0.02 seconds.
303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 | elif CenterX == -1: # 如果没检测到球(if no ball is detected) # 如果机器人上次状态为"前进",快速后退一步(if the robot's previous state was "forward," quickly take one step backward) if last_status == 'go': last_status = '' AGC.runActionGroup('back_fast', with_stand=True) elif start_count: # 开始计时的标志变量为True(set the flag variable for starting the timer to True) start_count= False t1 = time.time() # 记录当前的时间,开始计时(record the current time and start the timer) else: if time.time() - t1 > 0.5: if step_ == 5: x_dis += d_x if abs(x_dis - servo_data['servo2']) <= abs(d_x): AGC.runActionGroup('turn_right') step_ = 1 if step_ == 1 or step_ == 3: x_dis += d_x if x_dis > servo_data['servo2'] + 400: if step_ == 1: step_ = 2 d_x = -d_x elif x_dis < servo_data['servo2'] - 400: if step_ == 3: step_ = 4 d_x = -d_x elif step_ == 2 or step_ == 4: y_dis += d_y if y_dis > 1200: if step_ == 2: step_ = 3 d_y = -d_y elif y_dis < servo_data['servo1']: if step_ == 4: step_ = 5 d_y = -d_y ctl.set_pwm_servo_pulse(1, y_dis, 20) ctl.set_pwm_servo_pulse(2, x_dis, 20) time.sleep(0.02) |
5.5.5 Function Extension
Modify Default Recognition Color
Red, green and blue are the built-in colors in the auto shooting program and red is the default color. In the following steps, we’re going to modify the recognized color as green.
(1) Enter command to the directory where the game program is located.
cd TonyPi/Functions
(2) Enter command to go into the game program through vim editor.
vim KickBall.py
(3) Locate code “ball_color = (‘red’)”.
Note
After entering the code position number on the keyboard, press “Shift+G” to directly locate to the corresponding location. This section aims to introduce quick location methods, so the code position number is for reference only. Please rely on actual positions.
(4) Press “i” to enter the editing mode, then modify red in ball_color = (‘red’) to green. If you want to recognize blue, please revise to “blue” .
(5) Press “Esc” to enter last line command mode. Input “:wq” to save the file and exit the editor. Input English at first, then input wq.
:wq
Add Recognized Color
In addition to the built-in recognized colors, you can set other recognized colors in the programming. Take orange as example:
(1) Open VNC, input command to open Lab color setting document.
vim TonyPi/lab_config.yaml
It is recommended to use screenshot to record the initial value.
(2) Click the debugging tool icon in the system desktop. Choose “Execute” in the pop-up window.
(3) Click “Connect” button in the lower left hand. When the interface display the camera returned image, the connection is successful. Select “red” in the right box first.
(4) Drag the corresponding sliders of L, A, and B until the color area to be recognized in the left screen becomes white and other areas become black.
For example, if you want to recognize orange, you can put the orange ball in the camera’s field of view. Adjust the corresponding sliders of L, A, and B until the orange part of the left screen becomes white and other colors become black, and then click “ Save” button to keep the modified data.
(5) After the modification is completed, check whether the modified data was successfully written in. Enter the command again vim TonyPi/lab_config.yaml to check the color setting parameters.
vim TonyPi/lab_config.yaml
For the game’s performance, it’s recommended to use the LAB_Tool tool to modify the value back to the initial value after the modification is completed.
(6) Check the data in red frame. If the edited value was written in the program, press “Esc” and enter “:wq” to save it and exit.
(7) The default recognized color can be set as red according to the 5.5.5 Function Extension -> Modify Default Recognition Color in this text.
(8) Start the game again and put the orange ball in front of the camera. After recognition, the robot will adjust its position to close the ball and kick it forward.
5.6 Line Follow
Lay the red tape and then place the robot on the line. TonyPi will move along the red track.
5.6.1 Program Description
Line tracking is common in robot competitions which is implemented by two-channel or four-channel line-tracking sensors.However, TonyPi only need the vision module to recognize the line color, process by image algorithms, to realize the line follow.
First, program TonyPi to recognize colors with Lab color space. You can go to “OpenCV Basic Course” for detailed learning of Lab color space.
Second, identify the object color in the circle using color threshold value, then apply a mask to that part of the image. Masking is the process of using selected images, graphics, or objects to globally or locally obscure parts of the processed image.
After processing the corrosion and inflation of the object image, the largest object contour is circled.
Corrosion: By iterating through each pixel of the image, check its overlap with the surrounding structural element. If all the overlapping pixel values are 1, then keep the original pixel value unchanged; otherwise, set it to 0. Mainly used to eliminate unimportant edge information in the image, reducing the area of the image.
Inflation: Similar to the inverse process of erosion. This process involves convolving the image with a structural element, calculate the maximum pixel value within the covered area, and assign this maximum value to the pixel specified by the reference point. The inflation expands the highlighted areas in an image gradually, typically used to fill holes or gaps in the image.
Thirdly, after recognition, process the servo part with x and y coordinates of the center point of the image as the set values. Input the current acquired x and y coordinates to update the pid.
Fourthly, calculate according to the feedback of the line position in the image, and program the robot to follow the line to achieve the function of intelligent line tracking.
5.6.2 Start and Close the Game
Note
The input of commands must strictly distinguish between uppercase and lowercase letters, as well as spaces. Additionally, you can use the “Tab” key on the keyboard to auto-complete keywords.
(1) Power on the robot and use VNC Viewer to connect to the remote desktop.
(2) Double-click “Terminator” icon
in the Raspberry Pi desktop and open command line.
(3) Input and press Enter to locate to the directory where the program is stored.
cd TonyPi/Functions
(4) Input command, then press Enter to start the game.
python3 VisualPatrol.py
(5) If you want to exit the game programming, press “Ctrl+C”. If the exit fails, please try it few more times.
5.6.3 Program Outcome
Note
The program defaults to recognizing the color red. To switch to blue or green, refer to “5.6.5 Function Extension -> Modify Default Tracking Color”.
Lay the red tape and then place the robot on the line. TonyPi will move along the red track.
5.6.4 Program Analysis
The source code of this program is locate in /home/pi/TonyPi/Functions/VisualPatrol.py
Import Parameter Module
| Import module | function |
|---|---|
| import sys | The Python "sys" module has been imported for accessing system-related functions and variables. |
| import os | The Python "os" module has been imported, providing functions and methods for interacting with the operating system. |
| import cv2 | The OpenCV library has been imported for image processing and computer vision-related functionalities. |
| import time | The Python "time" module has been imported for time-related functionalities, such as delay operations. |
| import math | The "math" module provides low-level access to mathematical operations, including many commonly used mathematical functions and constants. |
| import threading | Provides an environment for running multiple threads concurrently. |
| import np | The NumPy library has been imported. It is an open-source numerical computing extension for Python, used for handling array and matrix operations. |
| import sensor.camera as camera | Import camera library |
| from common import misc | The "Misc" module has been imported for handling recognized rectangular data. |
| import common.ros_robot_controller_sdk as rrc | The robot's low-level control library has been imported for controlling servos, motors, RGB lights, and other hardware. |
| import common.yaml_handle | Contains functionalities or tools related to processing YAML format files. |
| from common.controller import Controller | Import action group execution library |
Function Logic
Capture image information through the camera, then process the image, specifically by performing binarization. At the same time, to reduce interference and make the image smoother, perform erosion and dilation operations on the image.
Next, obtain the maximum area contour of the target and the minimum enclosing rectangle, then sum the centers of the three rectangles. Based on the final calculated center point position, call the action group to line follow.
Program Logic and Related Code Analysis
(1) Initialization
① Import function library
In this initialization step, the first task is to import the required libraries for subsequent program calls. For details on the imports, refer to “5.6.4 Program Analysis -> Import parameter module”.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | #!/usr/bin/python3 # coding=utf8 import sys import os import cv2 import time import math import threading import numpy as np import hiwonder.Camera as Camera import hiwonder.Misc as Misc import hiwonder.ros_robot_controller_sdk as rrc from hiwonder.Controller import Controller import hiwonder.ActionGroupControl as AGC import hiwonder.yaml_handle as yaml_handle |
② Set initial state
Set initial state, including the initial position of servo, roi area, etc.
56 57 58 59 | # 初始化机器人舵机初始位置 def initMove(): ctl.set_pwm_servo_pulse(1, servo_data['servo1'], 500) ctl.set_pwm_servo_pulse(2, servo_data['servo2'], 500) |
137 138 139 140 141 142 143 144 145 146 147 148 149 | roi = [ # [ROI, weight] (240, 280, 0, 640, 0.1), (340, 380, 0, 640, 0.3), (440, 480, 0, 640, 0.6) ] roi_h1 = roi[0][0] roi_h2 = roi[1][0] - roi[0][0] roi_h3 = roi[2][0] - roi[1][0] roi_h_list = [roi_h1, roi_h2, roi_h3] size = (640, 480) |
(2) Image processing
① Image pre-processing
Resizing and Gaussian blur processing of the image.
161 162 | frame_resize = cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) frame_gb = cv2.GaussianBlur(frame_resize, (3, 3), 3) |
cv2.resize(img_copy, size, interpolation=cv2.INTER_NEAREST) is an operation to resize the image.
The first parameter img_copy is the image to be resized.
The second parameter size is the target size.
The third parameter interpolation is the interpolation method, which is used to determine the pixel interpolation algorithm used for resizing.
cv2.GaussianBlur(frame_resize, (3, 3), 3) applies Gaussian blur to the image.
The first parameter frame_resize is the image to be blurred.
The second parameter (3, 3) is the size of the Gaussian kernel, indicating that the width and height of the kernel are both 3.
The third parameter 3 is the standard deviation of the Gaussian kernel, used to control the degree of blur.
② Set roi area
From frame_gb, crop out the corresponding ROI regions based on each element in the roi list and the height values corresponding to roi_h_list. Save these regions in the blobs variable.
169 170 171 172 173 | #将图像分割成上中下三个部分,这样处理速度会更快,更精确(segment the image into three parts: upper, middle, and lower. This will improve processing speed and accuracy) for r in roi: roi_h = roi_h_list[n] n += 1 blobs = frame_gb[r[0]:r[1], r[2]:r[3]] |
③ Color space conversion
Convert the BGR image to LAB image.
174 | frame_lab = cv2.cvtColor(blobs, cv2.COLOR_BGR2LAB) # 将图像转换到LAB空间(convert the image to LAB space) |
④ Binarization processing
Use inRange() function in cv2 library to process binarization.
177 178 179 180 181 182 183 184 185 | if i in __target_color: detect_color = i frame_mask = cv2.inRange(frame_lab, (lab_data[i]['min'][0], lab_data[i]['min'][1], lab_data[i]['min'][2]), (lab_data[i]['max'][0], lab_data[i]['max'][1], lab_data[i]['max'][2])) #对原图像和掩模进行位运算(operate bitwise operation to original image and mask) |
The first parameter frame_lab is inputting image.
The second parameter lab_data[i]['min'][0] is the lower limit of the threshold.
The third parameter lab_data[i]['max'][0] is the upper limit of the threshold.
⑤ Corrosion and inflation
(operate bitwise operation to original image and mask)
186 187 188 189 | eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #腐蚀(corrosion) dilated = cv2.dilate(eroded, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) #膨胀(dilation) dilated[:, 0:160] = 0 dilated[:, 480:640] = 0 |
eroded = cv2.erode(frame_mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))) is the operation to perform corrosion on the binary image.
The first parameter frame_mask is the binary image on which morphological operations are to be performed.
The second parameter cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)) is the structuring element for the corrosion operation. A rectangular structuring element of size (3, 3) is used here.
The dilation function follows the same principle.
dilated[:, 0:160] = 0 set all pixel values in the first 160 columns on the left side of the image (from column 0 to column 159) to 0, i.e., turn them black, to remove the unnecessary parts of the image for recognition.
dilated[:, 480:640] = 0 set all pixel values in the right side from column 480 to column 639 to 0, i.e., turn them black, to remove the unnecessary parts of the image for recognition.
⑥ Get the contour with the largest area
After completing the above image processing, it is necessary to obtain the contours of the recognized targets. This involves using the findContours() function from the cv2 library.
190 | cnts = cv2.findContours(dilated , cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_TC89_L1)[-2]#找出所有轮廓(find out all contours) |
Take code contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2] as example:
The first parameter dilated is inputting image.
The second parameter cv2.RETR_EXTERNAL is the contour retrieval mode.
The third parameter cv2.CHAIN_APPROX_NONE)[-2] is the contour approximation method.
Find the contour with the largest area in the obtained contour. In order to avoid interference, you need to set a minimum value. The target contour is considered valid only if its area is greater than this value.
{lineno-start=}
areaMaxContour, area_max = area_maxContour(contours) # 找出最大轮廓
if areaMaxContour is not None:
if area_max > max_area: #找最大面积
max_area = area_max
color_area_max = i
areaMaxContour_max = areaMaxContour
⑦ Get the center position coordinates of the line
Use the misc function to map the x and y coordinates of the object center, as well as the radius, from the original size range to the range of the new image size (”img_w” and “img_h”). Then, use the cv2.circle function to draw a circle around the color block.
192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 | if cnt_large is not None:#如果轮廓不为空 rect = cv2.minAreaRect(cnt_large)#最小外接矩形 box = np.int0(cv2.boxPoints(rect))#最小外接矩形的四个顶点 for i in range(4): box[i, 1] = box[i, 1] + (n - 1)*roi_h + roi[0][0] box[i, 1] = int(Misc.map(box[i, 1], 0, size[1], 0, img_h)) for i in range(4): box[i, 0] = int(Misc.map(box[i, 0], 0, size[0], 0, img_w)) cv2.drawContours(img, [box], -1, (0,0,255,255), 2)#画出四个点组成的矩形 #获取矩形的对角点 pt1_x, pt1_y = box[0, 0], box[0, 1] pt3_x, pt3_y = box[2, 0], box[2, 1] center_x, center_y = (pt1_x + pt3_x) / 2, (pt1_y + pt3_y) / 2#中心点 cv2.circle(img, (int(center_x), int(center_y)), 5, (0,0,255), -1)#画出中心点 center_.append([center_x, center_y]) #按权重不同对上中下三个中心点进行求和 centroid_x_sum += center_x * r[4] weight_sum += r[4] if weight_sum != 0: #求最终得到的中心点 cv2.circle(img, (line_center_x, int(center_y)), 10, (0,255,255), -1)#画出中心点 line_center_x = int(centroid_x_sum / weight_sum) else: line_center_x = -1 |
(3) Intelligent line follow
Based on the calculated difference between the X-coordinate of the line center point and the X-coordinate of the screen center, call different action groups to follow the line.
115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 | def move(): global line_center_x while True: if __isRunning: if line_center_x != -1: if abs(line_center_x - img_centerx) <= 50: AGC.runActionGroup('go_forward') elif line_center_x - img_centerx > 50: AGC.runActionGroup('turn_right_small_step') elif line_center_x - img_centerx < -50: AGC.runActionGroup('turn_left_small_step') else: time.sleep(0.01) else: time.sleep(0.01) |
If the differential is less than or equal to ±50: Call the go_forward action group.
If the differential is greater than 50: Call the turn_right_small_step to perform turning right action group.
If the differentials is greater than -50: Call the turn_left_small_step to perform turning left action group.
5.6.5 Function Extension
Modify Default Tracking Color
Black, red and white are the built-in colors in the line follow program and black is the default color. In the following steps, we’re going to modify the tracking color as red.
(1) Enter command to the directory where the game program is located.
cd TonyPi/Functions
(2) Enter command to go into the game program through vi editor.
vim VisualPatrol.py
(3) Locate code __target_color = ('black').
Note
After entering the code position number on the keyboard, press “Shift+G” to directly locate to the corresponding location. This section aims to introduce quick location methods, so the code position number is for reference only. Please rely on actual positions.
(4) Press “i” to enter the editing mode, then modify black in “_target_color = (‘black’)” to red. If you want to recognize white, please revise to “white”.
(5) Press “Esc” to enter last line command mode. Input “:wq” to save the file and exit the editor.
:wq
Add Recognized Color
In addition to the built-in recognized colors, you can set other recognized colors in the programming. Take blue as example:
(1) Open VNC, input command to open Lab color setting document.
vim TonyPi/lab_config.yaml
It is recommended to use screenshot to record the initial value.
(2) Click “Connect” button in the lower left hand. When the interface display the camera returned image, the connection is successful. Select “red” in the right box first.
(3) Click “Connect” button in the lower left hand. When the interface display the camera returned image, the connection is successful. Select “red” in the right box first.
(4) For example, if you want to recognize orange, you can put the orange ball in the camera’s field of view. Adjust the corresponding sliders of L, A, and B until the blue part of the left screen becomes white and other colors become black, and then click “Save” button to keep the modified data.
For the game’s performance, it’s recommended to use the LAB_Tool tool to modify the value back to the initial value after the modification is completed.
(5) After the modification is completed, check whether the modified data was successfully written in. Enter the command again vim TonyPi/lab_config.yaml to check the color setting parameters.
vim TonyPi/lab_config.yaml
For the game’s performance, it’s recommended to use the LAB_Tool tool to modify the value back to the initial value after the modification is completed.
(6) Check the data in red frame. If the edited value was written in the program, press “Esc” and enter :wq to save it and exit.
:wq
(7) The default tracking color can be set as black according to the “5.6.5 Function Extension -> Modify Default Tracking Color” in this text.
(8) Starting the game again, TonyPi will track along the blue line. If you want to add other colors as tracking color, please operate as the above steps.
5.7 Tag Detection
5.7.1 Program Description
When the robot detects a tag, the buzzer emits a sound, and the feedback image is returned.
AprilTag, a visual fiducial marker, is similar to a QR code or barcode. It can be used to quickly detect markers and calculate relative positions, meeting real-time requirements. It is widely used in various applications such as augmented reality (AR), robotics, and camera calibration. Currently, AprilTags can be printed using a standard printer, and their detection programs can calculate precise 3D position, orientation, and ID relative to the camera.
In this lesson, we will combine OpenCV with AprilTag to complete a small project for detecting AprilTag markers. When the camera detects the tag, the robot’s onboard buzzer will sound as a prompt, and the feedback image will be displayed.
5.7.2 Start and Close the Game
Note
The input of commands must strictly distinguish between uppercase and lowercase letters, as well as spaces.
(1) Power on the robot and use VNC Viewer to connect to the remote desktop.
(2) Click the icon
in the top left corner of the system desktop or press the shortcut “Ctrl+Alt+T” to open the LX terminal.
(3) In the terminal, enter the command to navigate to the directory where the program is located, then press Enter:
cd TonyPi/Functions/
(4) Enter the command and press Enter to start the program:
python3 Tag_Detect.py
(5) To close the program, simply press “Ctrl+C” in the LX terminal. If it does not close, press it multiple times.
5.7.3 Program Outcome
Note
For optimal tag detection, place the tag against a solid-colored or white background. Dark backgrounds (e.g., black) may interfere with tag recognition.
Once the game is activated, position the included AprilTag tag in front of the camera. When the robot detects the tag, the buzzer will sound as a prompt. The feedback image will display the captured tag, outline it, and show the tag’s tag_id and tag_family information.
5.7.4 Program Analysis
The source code for this program is located at : /home/pi/TonyPi/Functions/Tag_Detect.py
(1) Image Acquisition and Processing
The first step is image processing, which involves working with digital image data. We begin by importing the necessary packages.
1 2 3 4 5 6 7 8 9 10 11 12 13 | #!/usr/bin/python3 # coding=utf8 import sys import cv2 import math import time import numpy as np import hiwonder.Camera as Camera import hiwonder.ros_robot_controller_sdk as rrc import hiwonder.yaml_handle as yaml_handle import hiwonder.apriltag as apriltag # 检测apriltag |
Next, we initialize and start the camera to acquire the image, then proceed to copy, remap, and display the image.
88 89 90 91 92 93 94 95 96 | ret, img = my_camera.read() if img is not None: frame = img.copy() frame = cv2.remap(frame, mapx, mapy, cv2.INTER_LINEAR) # 畸变矫正(distortion correction) Frame = run(frame) cv2.imshow('Frame', Frame) key = cv2.waitKey(1) if key == 27: break |
Afterward, we need to convert the image from RGB format to grayscale. The corresponding code is as follows:
20 | gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) |
(2) Tag Detection
Once the image has been processed, we need to detect the tag. This is done by using the tag library to detect the tag in the acquired image. The code implementation is as follows:
21 | detections = detector.detect(gray, return_image=False) |
After detection, the program will obtain the four corner points of the tag.
25 | corners = np.int0(detection.corners) # 获取四个角点 |
Next, we need to draw the contours of the tag. In OpenCV, we use the cv2.drawContours function to accomplish this. The program code is as follows:
26 | cv2.drawContours(img, [np.array(corners, int)], -1, (0, 255, 255), 2) |
This function takes five parameters, each with the following meanings:
① img: The image to be processed.
② [np.array(corners, np.int)]: The contour points.
③ -1: The contour index. -1 indicates that all contours should be drawn.
④ (0, 255, 255): The color of the contour.
⑤ 2: The thickness of the contour line.
(3) Retrieving Tag Information
The program uses the AprilTag library to perform encoding and decoding to retrieve the tag’s information. Depending on the encoding method, different inner point coordinates are generated.
Once the quadrilateral is identified, the grid coordinates are clarified. To verify the reliability of the encoding, the tag must be matched against a known encoding library.
27 28 29 30 31 32 33 34 | tag_family = str(detection.tag_family, encoding='utf-8') # 获取tag_family(get tag_family) tag_id = int(detection.tag_id) # 获取tag_id(get tag_id) object_center_x, object_center_y = int(detection.center[0]), int(detection.center[1]) # 中心点(center point) object_angle = int(math.degrees(math.atan2(corners[0][1] - corners[1][1], corners[0][0] - corners[1][0]))) # 计算旋转角(calculate rotation angle) return tag_family, tag_id |
5.8 Tag Recognition
5.8.1 Program Description
The robot executes corresponding action groups by recognizing different ID tags.
AprilTag, a visual positioning marker, can quickly detect the marker and calculate the position. It’s mainly applied to AR, robot and camera calibration, etc.
The following is the overall process:
First, detect AprilTag through positioning, image segmentation, and contour search. Then the quadrilateral detection is performed after the contour is positioned. Connect the four corner points with a straight line to form a closed loop.
Encoding and decoding the detected tags. Finally, add the corresponding execution action according to the decoding tags with different IDs.
5.8.2 Start and Close the Game
Note
Pay attention to the text format in the input of instructions.
(1) Power on the robot and use VNC Viewer to connect to the remote desktop. Double-click “Terminator” icon
in the Raspberry Pi desktop and open command line.
(2) Input command and press Enter to locate to the directory where the program is stored.
cd TonyPi/Functions
(3) Input command, then press Enter to start the game.
python3 ApriltagDetrect.py
(4) If you want to exit the game programming, press “Ctrl+C”. If the exit fails, please try it few more times.
5.8.3 Project Outcome
Note
Please run this game on a solid color or a white background. Dark background such as black will affect the tag recognition performance.
After starting the tag recognition, place the tag cards in front of the camera to recognize in turns. TonyPi will execute the corresponding actions when the tad is recognized.
| Tag ID | Action |
|---|---|
| 1 | Bowing |
| 2 | Mark time |
| 3 | Dancing |
5.8.4 Program Analysis
The source code of this program is locate in: /home/pi/TonyPi/Functions/ApriltagDetect.py.
Import Parameter Module
| Import module | function |
|---|---|
| import sys | The Python "sys" module has been imported for accessing system-related functions and variables. |
| import os | The Python "os" module has been imported, providing functions and methods for interacting with the operating system. |
| import cv2 | The OpenCV library has been imported for image processing and computer vision-related functionalities |
| import time | The Python "time" module has been imported for time-related functionalities, such as delay operations. |
| import math | The "math" module provides low-level access to mathematical operations, including many commonly used mathematical functions and constants. |
| import threading | Provides an environment for running multiple threads concurrently. |
| import np | The NumPy library has been imported. It is an open-source numerical computing extension for Python, used for handling array and matrix operations. |
| import common.apriltag as apriltag | Import apriltag library |
| import sensor.camera as camera | Import camera library |
| from common import misc | The "Misc" module has been imported for handling recognized rectangular data. |
| import common.ros_robot_controller_sdk as rrc | The robot's underlying control library has been imported for controlling servos, motors, RGB lights, and other hardware. |
| import common.yaml_handle | Contains functionalities or tools related to processing YAML format files. |
| from common.controller import Controller | Import action group execution library |
Function Logic
Capture image information through the camera, then process the image, specifically by performing color space conversion. That is facilitate for people to perform tag detection.
Next, use apriltag library to perform tag detection, get tag ID and call action group to perform feedback.
Program Logic and Related Code Analysis
(1) Import function library
In this initialization step, the first task is to import the required libraries for subsequent program calls. For details on the imports, refer to “5.8.4 Program Analysis -> Import parameter module”.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | #!/usr/bin/python3 # coding=utf8 import sys import cv2 import math import time import threading import numpy as np import hiwonder.Camera as Camera import hiwonder.Misc as Misc import hiwonder.ros_robot_controller_sdk as rrc from hiwonder.Controller import Controller import hiwonder.ActionGroupControl as AGC import hiwonder.yaml_handle as yaml_handle import hiwonder.apriltag as apriltag |
(2) Set initial state
Set initial state, including the initial position of servo and tag ID.
38 39 40 41 | # 初始化机器人舵机初始位置(initialize the initialization position of robot) def initMove(): ctl.set_pwm_servo_pulse(1, 1500, 500) ctl.set_pwm_servo_pulse(2, 1500, 500) |
Image processing
(1) Create AprilTag detector
Detect visual markers using the default marker patterns provided by the AprilTag library. You can use it to detect AprilTag markers in an image and obtain information about these markers, such as their position coordinates and IDs.
119 120 | # 检测apriltag(detect apriltag) detector = apriltag.Detector(searchpath=apriltag._get_demo_searchpath()) |
(2) color space conversion
Convert the BGR image to GRAY image.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(3) Detect tag
Use the created detector object (i.e., AprilTag detector) to detect AprilTag markers in the grayscale image “gray”.
123 | detections = detector.detect(gray, return_image=False) |
(4) Get tag information
Retrieve the tag ID corner information, use the cv2.drawContours function to draw the tag on the image, and obtain the tag ID and tag class.
125 126 127 128 129 130 131 | if len(detections) != 0: for detection in detections: corners = np.int0(detection.corners) # 获取四个角点(get four corners) cv2.drawContours(img, [np.array(corners, int)], -1, (0, 255, 255), 2) tag_family = str(detection.tag_family, encoding='utf-8') # 获取tag_family(get tag_family) tag_id = int(detection.tag_id) # 获取tag_id(get tag_id) |
(5) Print tag information
Use the cv2.putText function to print the detected ID information.
153 154 155 156 157 158 159 160 | if tag_id is not None: cv2.putText(img, "tag_id: " + str(tag_id), (10, img.shape[0] - 30), cv2.FONT_HERSHEY_SIMPLEX, 0.65, [0, 255, 255], 2) cv2.putText(img, "tag_family: " + tag_family, (10, img.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, [0, 255, 255], 2) else: cv2.putText(img, "tag_id: None", (10, img.shape[0] - 30), cv2.FONT_HERSHEY_SIMPLEX, 0.65, [0, 255, 255], 2) cv2.putText(img, "tag_family: None", (10, img.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.65, [0, 255, 255], 2) return img |
Tag recognition
According to the detected tag ID, use the agc.run_action_group function to invoke the corresponding action group file and control the robot’s movement.
80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 | while True: if debug: return if __isRunning: if tag_id is not None: action_finish = False time.sleep(0.5) if tag_id == 1:#标签ID为1时(when the tag ID is 1) AGC.runActionGroup('bow')#鞠躬(bow) tag_id = None time.sleep(1) action_finish = True elif tag_id == 2: AGC.runActionGroup('stepping')#原地踏步(march in place) tag_id = None time.sleep(1) action_finish = True elif tag_id == 3: AGC.runActionGroup('twist')#扭腰(twist waist) tag_id = None time.sleep(1) action_finish = True else: action_finish = True time.sleep(0.01) else: time.sleep(0.01) else: time.sleep(0.01) |
5.8.5 Function Extension
Modify the Action Corresponding to the Tag
Program default setting is that TonyPi will bow when the tag ID i is detected. We can revise the feedback action to wave hand for example.
(1) Enter command to the directory where the game program is located.
cd TonyPi/Functions/
(2) Enter command to go into the game program through vi editor.
vim ApriltagDetect.py
(3) Find code AGC.runActionGroup(bow).
Note
After entering the code position number on the keyboard, press “Shift+G” to directly locate to the corresponding location. This section aims to introduce quick location methods, so the code position number is for reference only. Please rely on actual positions.
(4) Based on the description of the TonyPi Action Group List Instruction located in the path /home/pi/TonyPi/ActionGroups, it is known that bow corresponds to bowing.
(5) Press “i” to enter the editing mode, then modify the (‘bow’) in AGC.runActionGroup(‘bow’) to AGC.runActionGroup(‘wave’)
(6) Press “Esc” to enter last line command mode. Input “:wq” to save the file and exit the editor.
:wq
Modify or Add the Tag Recognition
The tag data is located in the ApirlTag Tag Collection folder under the directory of this section. (The directory needs to be unzipped first)
① You don’t need to download materials online, please go to the directory of this section to find ApirlTag Tag Collection for the provided tags. (200 tags in total)
② There is no absolute size requirement for the tag size if you want to print your tags. It is not recommended to be too large or too small for the performance of recognition. (The tag will be circled when recognized.)
③The background next to the tag will be better to keep in white. The dark background may affect the recognition.
In the following sample, we will add the ID4 as new tag. When the tag is recognized, TonyPi will run the “Cheering” action group.
(1) Take the reference of “5.8.5 Function Extension -> Modify the Action Corresponding to the Tag”, enter the catalog and open the program file.
(2) Next, you need to copy the code inside the “elif” branch. Here, we can copy the code shown in the image. Move the mouse cursor to the corresponding “elif” line, then type “5yy” on the keyboard (to copy 5 lines). You will see a prompt “5 lines yanked” at the bottom, indicating successful copying.
(3) Then paste these 5 lines of code, and move the mouse cursor to the position shown in the figure below:
(4) Enter “p” on the keyboard to paste the previously copied 5 lines of code below:
(5) Modify the copy code. Enter “i” to the editing mode and revise “tag_id” to “4”, and the action in the “AGC.runActionGroup” to “chest”.
(6) Modify the copy code. Enter “i” to the editing mode and revise “tag_id” to “4”, and the action in the “AGC.runActionGroup” to “chest”.
(7) Take the ID4 tag in folder “ApirlTag Tag Collection” and print it directly.
(8) Check the project outcome according to the commands in previous learning.
5.9 Face Detect
5.9.1 Brief Description of the Activity
When no face is detected, the robotic arm rotates left and right to scan the area. Once a face is detected, the claw moves up and down as a greeting.
Face recognition is one of the most widely used applications in artificial intelligence, particularly in image recognition. Among these applications, face recognition is the most popular, often used in scenarios like smart locks and facial unlocking on mobile phones.
In this activity, we first train the face recognition model. The system then detects faces by scaling the image. After detection, the coordinates of the recognized face are converted back to the original scale, and the largest face is identified. The recognized face is then outlined with a frame.
Next, the pan-tilt servos are set to rotate left and right to locate the face. Finally, the robot executes the feedback action based on the recognition results.
5.9.2 Start and Close the Game
Note
The input of commands must strictly distinguish between uppercase and lowercase letters.
(1) Power on the device and, following the instructions in “Remote Desktop Installation and Connection\3.1 Remote Desktop Installation and Connection”, use the VNC remote connection tool to connect.
(2) Click the icon
in the top left corner of the system desktop or press the shortcut “Ctrl+Alt+T” to open the LX terminal.
(3) In the terminal, enter the command to navigate to the directory where the program is located, then press Enter:
cd TonyPi/Functions/
(4) Enter the command and press Enter to start the program:
python3 Face_Detect.py
(5) To close the program, simply press “Ctrl+C” in the LX terminal. If it does not close, press it multiple times.
5.9.3 Program Outcome
Note
For optimal performance, please avoid using this activity under strong lighting conditions, such as direct sunlight or close proximity to incandescent lights, as intense light can affect face recognition accuracy. It is recommended to conduct this activity indoors, with the face positioned within a range of 50 cm to 1 meter from the camera.
Once the activity begins, the camera’s pan-tilt will rotate left and right. If no face is detected, the robotic arm will scan by rotating left and right. Upon detecting a face, the claw will move up and down to greet the user.
5.9.4 Program Analysis
The source code of the program is saved in:/home/pi/TonyPi/Functions/Face_Detect.py
Importing Parameter Modules
| Module Import | Purpose |
|---|---|
| import sys | Imports the Python sys module, which provides access to system-specific parameters and functions. |
| import cv2 | Imports the OpenCV library, which is used for image processing and computer vision tasks. |
| import time | Imports the Python time module, which provides functions for handling time-related tasks, such as delays. |
| import HiwonderSDK.Misc as Misc | Imports the Misc module from the Hiwonder SDK for handling recognized rectangular data. |
| import threading | Provides support for running tasks in multiple threads concurrently |
| import yaml_handle | Contains functions or tools for handling YAML format files |
| from ArmIK.Transform import * | Imports functions for robotic arm posture transformations |
| from ArmIK.ArmMoveIK import * | Provides functions for inverse kinematics solving and control for robotic arm movement |
| import HiwonderSDK.Board as Board | Imports the Board module from the Hiwonder SDK, which is used to control sensors and execute related actions |
Function Logic
The camera captures image data, which is then processed by converting the image into a different color space to facilitate face detection.
The Mediapipe face detection model is used to identify faces in the image. Once detected, the system triggers the appropriate action group to provide feedback based on the detected faces.
This flow ensures that the system accurately detects and responds to faces.
Program Logic and Code Analysis
From the above flowchart, the main program logic focuses on image processing and face recognition. The following content will follow this program logic flow.
(1) Importing Libraries
1 2 3 4 5 6 7 8 9 10 11 12 | #!/usr/bin/python3 # coding=utf8 import sys import cv2 import math import time import threading import numpy as np import mediapipe as mp import hiwonder.ros_robot_controller_sdk as rrc import hiwonder.yaml_handle as yaml_handle import hiwonder.Camera as Camera |
(2) Setting Initial State
16 17 18 19 20 21 22 | # 初始化机器人底层驱动(initialize the underlying driver of robot) board = rrc.Board() # 导入人脸识别模块(import human face detection module) face = mp.solutions.face_detection # 自定义人脸识别方法,最小的人脸检测置信度0.5(custom human face recognition method, the minimum human face detection confidence is 0.5) face_detection = face.FaceDetection(min_detection_confidence=0.5) |
(3) Color Space Conversion
The BGR image is converted to an RGB image.
53 | image_rgb = cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB) # 将BGR图像转为RGB图像(convert the BGR image to RGB image) |
(4) Using Mediapipe Face Model for Recognition
The system performs face detection and draws a rectangle around the detected face. Then, the position of the face is compared to the center of the image. If the face is centered, start_greet is set to True to trigger the action group.
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 | results = face_detection.process(image_rgb) # 将每一帧图像传给人脸识别模块(pass each frame of the image to the face recognition module) if results.detections: # 如果检测不到人脸那就返回None(if no face is detected, return None) for index, detection in enumerate(results.detections): # 返回人脸索引index(第几张脸),和关键点的坐标信息(return the face index (which face it is) and the coordinates of the key points) bboxC = detection.location_data.relative_bounding_box # 设置一个边界框,接收所有的框的xywh及关键点信息(set up a bounding box to receive the xywh coordinates and key point information of all the boxes) # 将边界框的坐标点,宽,高从比例坐标转换成像素坐标(convert the coordinates, width, and height of the bounding box from relative coordinates to pixel coordinates) bbox = (int(bboxC.xmin * img_w), int(bboxC.ymin * img_h), int(bboxC.width * img_w), int(bboxC.height * img_h)) cv2.rectangle(img, bbox, (0,255,0), 2) # 在每一帧图像上绘制矩形框(draw a rectangle on each frame of the image) if di_once: board.set_buzzer(1900, 0.3, 0.7, 1) di_once = False else: di_once = True return img |
5.10 Facial Recognition
5.10.1 Program Description
In artificial intelligence, one of the most widespread applications is image recognition, with facial recognition being the hottest application in image recognition. It is commonly used in scenarios like door locks and phone facial unlocking.
In this section, the trained face model is first zoomed to detect the face. Then the recognized face coordinates are converted to the coordinates before scaling. Judge whether it is the largest face, and frame the recognized face.
Then set the servo to rotate left and right to obtain the face, and call the action group to let the robot perform the recognized feedback.
5.10.2 Start and Close the Game
Note
Pay attention to the text format in the input of instructions.
(1) Power on the robot and use VNC Viewer to connect to the remote desktop.Input and press Enter to locate to the directory where the program is stored.
cd TonyPi/Functions
(2) Input command, then press Enter to start the game.
python3 FaceDetect.py
(3) If you want to exit the game programming, press “Ctrl+C”. If the exit fails, please try it few more times.
5.10.3 Project Outcome
Note
Please do not try the Facial Recognition game under strong light, such as sunlight. Strong light will affect the recognition performance, so it is recommended to play this game indoors. It’s better to set the distance between face and camera with 50-100cm.
Start the facial recognition function, TonyPi will rotate its head to detect face. It will stop when the face is recognized, and run the greeting actions.
5.10.4 Program Analysis
The source code of this program is locate in: “/home/pi/TonyPi/Functions/FaceDetect.py”.
Import Parameter Module
| Import module | function |
|---|---|
| import sys | The Python "sys" module has been imported for accessing system-related functions and variables. |
| import os | The Python "os" module has been imported, providing functions and methods for interacting with the operating system. |
| import cv2 | The OpenCV library has been imported for image processing and computer vision-related functionalities |
| import time | The Python "time" module has been imported for time-related functionalities, such as delay operations. |
| import math | The "math" module provides low-level access to mathematical operations, including many commonly used mathematical functions and constants. |
| import threading | Provides an environment for running multiple threads concurrently. |
| import np | he NumPy library has been imported. It is an open-source numerical computing extension for Python, used for handling array and matrix operations. |
| import mediapipe as mp | Import mediapipe library, which is used to detect human face |
| import sensor.camera as camera | Import camera library |
| from common import misc | The "Misc" module has been imported for handling recognized rectangular data. |
| import common.ros_robot_controller_sdk as rrc | The robot's underlying control library has been imported for controlling servos, motors, RGB lights, and other hardware. |
| import common.yaml_handle | Contains functionalities or tools related to processing YAML format files. |
| from common.controller import Controller | Import action group execution library |
Function Logic
Capture image information through the camera, then process the image, specifically by
performing color space conversion. That is facilitate for people to perform face detection.
Next, use mediapipe human face model library to perform face detection, get face detection
result and call action group to perform feedback.
Program Logic and Related Code Analysis
(1) Initialization
① Import function library
In this initialization step, the first task is to import the required libraries for subsequent program calls. For details on the imports, refer to “4.10.4 Program Analysis -> Import parameter module”.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | #!/usr/bin/python3 # coding=utf8 import sys import os import cv2 import math import time import threading import numpy as np import mediapipe as mp import hiwonder.Camera as Camera import hiwonder.Misc as Misc import hiwonder.ros_robot_controller_sdk as rrc from hiwonder.Controller import Controller import hiwonder.ActionGroupControl as AGC import hiwonder.yaml_handle as yaml_handle |
② Set initial state
Set initial state, including the initial position of servo, human face recognition module, Minimum Face Confidence, etc.
53 54 55 56 | # 初始化机器人舵机初始位置(initialize the servo initialization position of robot) def initMove(): ctl.set_pwm_servo_pulse(1, 1800, 500) ctl.set_pwm_servo_pulse(2, servo2_pulse, 500) |
34 35 36 37 | # 导入人脸识别模块(import human face recognition module) face = mp.solutions.face_detection # 自定义人脸识别方法,最小的人脸检测置信度0.5(custom human face recognition method, the minimum human face detection confidence is 0.5) face_detection = face.FaceDetection(min_detection_confidence=0.5) |
(2) Image processing
① Color space conversion
Convert the BGR image to LAB image.
139 | image_rgb = cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB) # 将BGR图像转为RGB图像(convert the BGR image to RGB image) |
② Use mediapipe human face model recognition
Perform face detection and draw rectangles around the detected faces. Then, based on whether the position of the face center is in the center of the frame, if so, set “start_greet” to True to execute the action group.
140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 | results = face_detection.process(image_rgb) # 将每一帧图像传给人脸识别模块(pass each frame of the image to the face recognition module) if results.detections: # 如果检测不到人脸那就返回None(if no face is detected, return None) for index, detection in enumerate(results.detections): # 返回人脸索引index(第几张脸),和关键点的坐标信息eturn the face index (which face it is) and the coordinates of the key points) bboxC = detection.location_data.relative_bounding_box # 设置一个边界框,接收所有的框的xywh及关键点信息(set up a bounding box to receive the xywh coordinates and key point information of all the boxes) # 将边界框的坐标点,宽,高从比例坐标转换成像素坐标(convert the coordinates, width, and height of the bounding box from relative coordinates to pixel coordinates) bbox = (int(bboxC.xmin * img_w), int(bboxC.ymin * img_h), int(bboxC.width * img_w), int(bboxC.height * img_h)) cv2.rectangle(img, bbox, (0,255,0), 2) # 在每一帧图像上绘制矩形框(draw a rectangle on each frame of the image) x, y, w, h = bbox # 获取识别框的信息,xy为左上角坐标点(information about the recognition box, where xy represents the coordinates of the top-left corner) center_x = int(x + (w/2)) if abs(center_x - img_w/2) < img_w/4: if action_finish: start_greet = True |
(3) Face detection
If a face is detected, use the agc.run_action_group function to invoke the wave action group.
106 107 108 109 110 111 112 113 | while True: if __isRunning: if start_greet: start_greet = False action_finish = False AGC.runActionGroup('wave') # 识别到人脸时执行的动作(the action performed when a face is recognized) action_finish = True time.sleep(0.5) |
If no face is detected, control the pan-tilt servo to rotate left and right to search for a face.
114 115 116 117 118 119 120 | else: if servo2_pulse > 2000 or servo2_pulse < 1000: d_pulse = -d_pulse servo2_pulse += d_pulse ctl.set_pwm_servo_pulse(2, servo2_pulse, 50) time.sleep(0.05) |
5.10.5 Function Extension
The built-in file is located in /home/pi/TonyPi/ActionGroups.
Program default setting is that TonyPi will execute the greeting action when detect the face. The feedback action can be revised to others such as bowing.
(1) Enter command to the directory where the game program is located.
cd TonyPi/Functions/
(2) Enter command to go into the game program through vi editor.
vim FaceDetect.py
(3) Find code AGC.runActionGroup('wave').
Note
After entering the code position number on the keyboard, press “Shift+G” to directly locate to the corresponding location. This section aims to introduce quick location methods, so the code position number is for reference only. Please rely on actual positions.
wave in the above image is the name of greeting action. If we want to revise the action to bowing, enter “bow” instead of “wave” in the “Action group instruction” in the path /home/pi/TonyPi/ActionGroups.
(4) Press “i” to enter the editing mode, then modify “wave” to “bow”.
(5) Press “Esc” to enter last line command mode. Input :wq to save the file and exit the editor.
:wq