Issue with Filling Input Buffer for Edge Impulse Classifier in micro-ROS Node

Question/Issue:
Hello,

I am currently working on deploying an Edge Impulse model within a micro-ROS node on a Linux machine (Ubuntu 22.04); the node will later be deployed on a Renesas RA6M5 MCU. I have successfully tested the standalone inferencing example, where the model correctly processes image data. However, I am encountering issues with correctly filling the input buffer that is passed to the classifier in my micro-ROS node setup.

I want to use the built-in image processing functions provided by the C++ SDK. I have been using the following example to resize the image before sending it to the run_classifier function; however, I receive a segmentation fault in the function cropImage(…). The code I am using as an example: GitHub - edgeimpulse/example-resize-image: Example of using SDK to resize images before run_classifier

Project ID:
439843

Context/Use case:

  • Micro-ROS Node: Runs on a Linux machine (Ubuntu 22.04) and communicates with another ROS node via a micro-ROS agent.
  • Hardware Target: The final deployment will be on a Renesas RA6M5 MCU.
  • Video Source: The video frames are published by a ROS node, captured via OpenCV, and converted to ROS sensor_msgs/msg/Image messages.

Steps Taken:

  • Implemented a micro-ROS node that subscribes to video frames and attempts to convert the frame data into an input buffer for the Edge Impulse classifier.
  • Used OpenCV on the ROS node to capture video, convert frames to RGB format, and publish them as sensor_msgs/msg/Image messages.
  • Developed a callback function in the micro-ROS node to process the incoming frames and prepare the input buffer for classification.

Expected Outcome:
The incoming frames should be processed correctly, and the input buffer should match the format and size expected by the classifier.

Actual Outcome:
Segmentation fault in cropImage(…) while resizing the incoming frame.

Reproducibility:
Always

Micro-ROS code:

#include <rcl/rcl.h>
#include <rclc/rclc.h>
#include <rclc/executor.h>
#include <std_msgs/msg/string.h>
#include <sensor_msgs/msg/image.h>
#include "edge-impulse-sdk/dsp/image/processing.hpp"
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"
#include <stdint.h>
#include <stdio.h>

// Resized RGB888 frame that get_signal_data() reads from
static uint8_t resized_image[EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT * 3] = {0};

// ROS publisher
rcl_publisher_t inference_publisher;

// Function to convert image data to float array for classifier
static int get_signal_data(size_t offset, size_t length, float *out_ptr);

// Forward declarations
void image_callback(const void *msgin);
void publish_classification_result(rcl_publisher_t *publisher, const char *result);

using namespace ei::image::processing;

int main(int argc, char **argv) {
    // Initialize ROS 2
    rcl_ret_t rc;
    rcl_allocator_t allocator = rcl_get_default_allocator();
    rclc_support_t support;
    rcl_node_t node;
    rcl_subscription_t image_subscription;
    sensor_msgs__msg__Image image_message; // Declare an instance of the message type

    rc = rclc_support_init(&support, argc, argv, &allocator);
    rc += rclc_node_init_default(&node, "tack_classifier_node", "", &support);

    // Initialize publisher
    rc += rclc_publisher_init_default(
        &inference_publisher,
        &node,
        ROSIDL_GET_MSG_TYPE_SUPPORT(std_msgs, msg, String),
        "inference_stream");

    // Initialize subscriber for images
    rc += rclc_subscription_init_default(
        &image_subscription,
        &node,
        ROSIDL_GET_MSG_TYPE_SUPPORT(sensor_msgs, msg, Image),
        "video_frames");

    // Initialize executor
    rclc_executor_t executor;
    rclc_executor_init(&executor, &support.context, 1, &allocator);
    rclc_executor_add_subscription(
        &executor,
        &image_subscription,
        &image_message, // Pass the instance of message type
        image_callback, // Subscription callback invoked on each incoming frame
        ON_NEW_DATA);

    // Spin the executor (equivalent to rclpy.spin)
    rclc_executor_spin(&executor);

    // Clean up resources
    rcl_subscription_fini(&image_subscription, &node);
    rcl_publisher_fini(&inference_publisher, &node);
    rclc_executor_fini(&executor);
    rcl_node_fini(&node);
    rclc_support_fini(&support);

    return 0;
}

// Image callback function to process video frames
void image_callback(const void *msgin) {
    EI_IMPULSE_ERROR res;           // Return code from inference

    printf("check 1\n");
    const sensor_msgs__msg__Image *image_msg = (const sensor_msgs__msg__Image *)msgin;

    // Check encoding (assuming "rgb8" format)
    if (strcmp(image_msg->encoding.data, "rgb8") != 0) {
        printf("Unsupported image encoding: %s\n", image_msg->encoding.data);
        return;
    }

    printf("check 2\n");

    // Resize the image
    int res_int = ei::image::processing::resize_image_using_mode(
        image_msg->data.data,
        image_msg->width,
        image_msg->height,
        resized_image,
        EI_CLASSIFIER_INPUT_WIDTH,
        EI_CLASSIFIER_INPUT_HEIGHT,
        3,  // RGB888
        EI_CLASSIFIER_RESIZE_MODE);

    printf("check 3\n");
    
    if (res_int != 0) {
        printf("Error resizing image: %d\n", res_int);
        return;
    }

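    // Note: for image models, EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE equals
    // EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT -- one float
    // per pixel, with the three RGB bytes packed into that float by
    // get_signal_data() below
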
    // Prepare the signal that run_classifier will pull data from
    signal_t signal = {
        .get_data = &get_signal_data,
        .total_length = EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE
    };

    printf("check 4\n");
    ei_impulse_result_t result = {0};

    // Classify image using Edge Impulse
    res = run_classifier(&signal, &result, false);

    printf("check 5\n");
 
    if (res == EI_IMPULSE_OK) {
        // Process results and create a classification string
        char result_str[1024];
        snprintf(result_str, sizeof(result_str), "Detected: %s (%.2f)", result.classification[0].label, result.classification[0].value);

        // Publish the result
        publish_classification_result(&inference_publisher, result_str);
    }
    else {
        printf("Error classifying the image (%d)\n", (int)res);
    }

    printf("check 6\n");
    
}

// Callback: fill a section of the out_ptr buffer when requested
static int get_signal_data(size_t offset, size_t length, float *out_ptr) {

    // 'offset' and 'length' are in floats (one per pixel), while
    // resized_image is byte-addressed RGB888, so convert to a byte index
    size_t pixel_ix = offset * 3;
    size_t out_ptr_ix = 0;

    while (length != 0) {
        // pixels are written into the mantissa of the float
        out_ptr[out_ptr_ix] = (resized_image[pixel_ix] << 16) +
                              (resized_image[pixel_ix + 1] << 8) +
                              resized_image[pixel_ix + 2];
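        // e.g. R=0x12, G=0x34, B=0x56 packs to 0x123456 -> 1193046.0f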
        
        // go to the next pixel
        out_ptr_ix++;
        pixel_ix += 3;
        length--;
    }

    return EIDSP_OK;
}

// Function to publish classification result
void publish_classification_result(rcl_publisher_t *publisher, const char *result) {
    std_msgs__msg__String msg;
    size_t result_length = strlen(result);
    msg.data.data = (char *)malloc(result_length + 1); // Allocate memory for the string
    if (msg.data.data == NULL) {
        printf("Failed to allocate memory for message data\n");
        return;
    }

    strcpy(msg.data.data, result); // Copy the string into the allocated memory
    msg.data.size = result_length;
    msg.data.capacity = result_length + 1;
    
    rcl_publish(publisher, &msg, NULL);

    // Free allocated memory after publishing
    free(msg.data.data);
}
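
One check I still plan to add at the top of image_callback is a guard against a size/stride mismatch on the incoming frame, since an unexpected layout would explain an out-of-bounds read during resizing. A minimal sketch, assuming the buffer should be tightly packed rgb8 (step == width * 3):

// Sketch: validate the incoming frame layout before resizing.
// Row padding (step != width * 3) or a short data buffer would make
// resize_image_using_mode read past the end of image_msg->data.
if (image_msg->step != image_msg->width * 3 ||
    image_msg->data.size < (size_t)image_msg->width * image_msg->height * 3) {
    printf("Unexpected frame layout: %ux%u, step=%u, size=%zu\n",
           image_msg->width, image_msg->height, image_msg->step,
           image_msg->data.size);
    return;
}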

Publisher Code:

import rclpy
from rclpy.node import Node
import cv2
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

class VideoPublisher(Node):
    def __init__(self):
        super().__init__('video_publisher')
        
        # Declare the video path parameter
        self.declare_parameter('video_path', '')

        # Wait until a valid video path is provided
        self.video_path = self.get_parameter('video_path').get_parameter_value().string_value
        while not self.video_path:
            self.get_logger().info('Waiting for video path to be provided...')
            rclpy.spin_once(self)
            self.video_path = self.get_parameter('video_path').get_parameter_value().string_value

        # Initialize the video capture with the provided path
        self.cap = cv2.VideoCapture(self.video_path)
        if not self.cap.isOpened():
            self.get_logger().error(f"Could not open video file: {self.video_path}")
            rclpy.shutdown()
            return
        
        # Create a publisher
        self.publisher_ = self.create_publisher(Image, 'video_frames', 10)
        
        # Initialize the CvBridge
        self.bridge = CvBridge()

        # Set the timer period for 30 FPS
        timer_period = 1.0 / 30.0  # 30 FPS
        self.timer = self.create_timer(timer_period, self.timer_callback)

    def timer_callback(self):
        # Capture a frame from the video
        ret, frame = self.cap.read()
        
        if not ret:
            self.get_logger().info('Video stream ended.')
            self.cap.release()
            rclpy.shutdown()
            return

        # Convert the OpenCV BGR image to RGB
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        self.get_logger().info('Converted to RGB')

        # Display the frame that is being published (note: cv2.imshow expects
        # BGR data, so this RGB frame will appear with red/blue channels swapped)
        cv2.imshow("Published Frame", rgb_frame)
        cv2.waitKey(1)  # Allows the image to be displayed properly

        # Convert the RGB image to a ROS Image message
        image_message = self.bridge.cv2_to_imgmsg(rgb_frame, encoding='rgb8')
        
        # Publish the Image message
        self.publisher_.publish(image_message)
        
        self.get_logger().info('Publishing video frame')

def main(args=None):
    rclpy.init(args=args)
    video_publisher = VideoPublisher()
    rclpy.spin(video_publisher)
    video_publisher.destroy_node()
    rclpy.shutdown()
    # Close OpenCV windows
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()

Additional Information:

  • How can I correctly convert and format the incoming image data from the sensor_msgs/msg/Image to match the expected input buffer format for the Edge Impulse classifier?
  • Is there a more efficient way to handle RGB image data and convert it to the required format for inference?

Very nice @hizzawaseem!

Although I’m not up to speed with micro-ROS, I have found a demo here → Edge Impulse Demo | micro-ROS

Do you have a doc you have been following for this, or is that demo helpful?

@ThomasVikstrom perhaps you, @mateusz, or @marcolevrero can jump in here?

Best

Eoin

I’ve not used ROS or micro-ROS, but as part of an AI course I have done classification and sorting of 3D-printed parts with Python on Windows and a web camera. This is the program I’ve used for classification; check the preprocess function for details about resizing.

import numpy as np                                                      # Import the NumPy library for numerical operations.
import cv2                                                              # Import the OpenCV library for image processing.
import tensorflow as tf                                                 # Import the TensorFlow library for ML models.
import time


# Constants
CAM_PORT = 0                                                            # Define the camera port (default is usually 0).
MODEL_PATH = "ei-sort_defects-transfer-learning-tensorflow-lite-float32-model (5).lite"   # Model file path.
LABELS = ("circle dirty", "circle ok", "nothing", "square dirty", "square ok", "triangle dirty", "triangle ok")  # Define labels.

def initialize_camera(port=CAM_PORT):                                   # Function to initialize the camera.
    # Initialize the camera 
    return cv2.VideoCapture(port, cv2.CAP_DSHOW)                        # Return the camera capture object.

def load_tflite_model(model_path=MODEL_PATH):                           # Function to load a TensorFlow Lite model.
    # Load the TensorFlow Lite model 
    interpreter = tf.lite.Interpreter(model_path=model_path)            # Load TFLite model.
    interpreter.allocate_tensors()                                      # Allocate tensors for the interpreter.
    return interpreter                                                  # Return the model interpreter.

def capture_image(camera):                                              # Function to capture an image from the camera.
    # Capture an image from the camera 
    ret, frame = camera.read()                                          # Read a frame from the camera.
    if not ret:                                                         # If frame reading failed,
        raise RuntimeError("Failed to capture image")                   # raise a RuntimeError.
    return frame                                                        # Return the captured frame.

def preprocess(frame, alpha=1, beta=1):                                 # Function to preprocess the frame for prediction.
    # Preprocess the frame for prediction 
    
    brightened_frame = cv2.convertScaleAbs(frame, alpha=alpha, beta=beta)
    cv2.imshow("Preprocessed Frame", brightened_frame)                  # Show preprocessed frame.
    processed = cv2.convertScaleAbs(brightened_frame)                   # Apply scale conversion to frame.
    processed = cv2.resize(processed, (160, 160))                       # Resize frame to model's expected size.
    processed = processed / 255.0                                       # Normalize pixel values to [0, 1].
    processed = np.expand_dims(processed, axis=0).astype(np.float32)    # Add batch dimension, ensure type is float32.
    return processed                                                    # Return the preprocessed frame.

def predict(interpreter, image):                                        # Function to perform prediction using the TFLite model.
    # Make a prediction using the TensorFlow Lite model 
    input_details = interpreter.get_input_details()                     # Get model input details.
    output_details = interpreter.get_output_details()                   # Get model output details.
    interpreter.set_tensor(input_details[0]['index'], image)            # Set input tensor.
    interpreter.invoke()                                                # Run inference.
    output_data = interpreter.get_tensor(output_details[0]['index'])    # Get output data.
    return output_data                                                  # Return the prediction results.

def process_image(camera):
    frame = capture_image(camera)
    preprocessed_frame = preprocess(frame)                              # Preprocess captured frame.
    return frame, preprocessed_frame

def return_prediction(camera, model):
    # frame = capture_image(camera)                                     # Captures a frame.
    # preprocessed_frame = preprocess(frame)                            # Preprocess captured frame.
    
    frame, preprocessed_frame = process_image(camera)
    # cv2.imshow("Preprocessed Frame", frame)                           # Show preprocessed frame.
    output = predict(model, preprocessed_frame)                         # Perform inference of preprocessed frame.
    predicted_label = LABELS[np.argmax(output)]                         # Fetch classification.
    return predicted_label


def main():                                                             # Main function.
    camera = initialize_camera()                                        # Initialize camera.
    model = load_tflite_model()                                         # Load the TensorFlow Lite model.
    
    
    while True:                                                         # Start an infinite loop.
        try:
            predicted_label = return_prediction(camera, model)
            print("Predicted label:", predicted_label)                  # Print the predicted label.
            
            if predicted_label == 'square ok':
                print("Found: ", predicted_label)
                
            if cv2.waitKey(1) & 0xFF == 27:                             # If ESC key is pressed,
                break                                                   # exit the loop.

        except RuntimeError as e:                                       # If a RuntimeError occurs,
            print(e)                                                    # print the error.
            break                                                       # and break the loop.

    camera.release()                                                    # Release the camera.
    cv2.destroyAllWindows()                                             # Close all OpenCV windows.

if __name__ == "__main__":                                              # If this script is run as the main program,
    main()                                                              # call the main function.

In addition, I asked ChatGPT for guidance by copy-pasting your post (including the screenshot); its reply is below. Not sure if it helps in this case.

------------------------------------------------ ChatGPT ----------------------------------------------------

The error occurs after printing the dimensions of the source (srcWidth, srcHeight), destination (dstWidth, dstHeight), and crop parameters. Based on the values printed:

  • srcWidth: 1440, srcHeight: 1080
  • dstWidth: 48, dstHeight: 48
  • cropWidth: 1080, cropHeight: 1080

It seems the function is attempting to resize from a relatively large image (1440x1080) down to a much smaller size (48x48). Additionally, cropWidth and cropHeight are both set to 1080, which could be causing the problem, as the cropImage function likely doesn’t handle this setup correctly, leading to memory access issues.

Possible Causes and Fixes

  1. Mismatched Cropping and Resizing Dimensions:
    The crop size (1080x1080) is larger than the destination size (48x48), which could cause issues when the crop function tries to access out-of-bounds memory. You may need to adjust the cropping size or make sure that the resizing function is properly reducing the source image before cropping.

    Try adjusting the crop size to match the destination size:

    cropWidth = 48;
    cropHeight = 48;
    

    Or skip the crop step altogether if it’s not required for your use case.

  2. Verify Buffer Sizes:
    Double-check that the buffer resized_image has enough space for the resized data. Since you’re resizing to 48x48 RGB images, ensure the buffer size is calculated correctly. In your case, the buffer should be:

    static uint8_t resized_image[48 * 48 * 3] = {0};  // 48x48 pixels, 3 channels (RGB)
    
  3. Use OpenCV Resizing:
    If you are using OpenCV, you can use the cv::resize function to handle resizing more reliably, avoiding potential issues with custom resizing code. In the publisher code, you can resize the image before sending it to the subscriber node:

    frame_resized = cv2.resize(frame, (48, 48))
    

    This way, the resizing is already done when the frame is received by the micro-ROS node, reducing the need for complex image resizing within the micro-ROS environment.

Next Steps:

  • Adjust the crop dimensions to match the destination size or avoid cropping if not necessary.
  • Double-check buffer sizes to ensure they match the resized image dimensions (48x48 in this case).
  • Consider using OpenCV resizing on the publisher side to reduce processing complexity in your micro-ROS node.

------------------------------------------------ ChatGPT ----------------------------------------------------

Hi, I found a solution to the problem. I used the fit-longest-axis resize option and then used the following Python script to extract the raw features and publish them as a float array to the micro-ROS node.

    def process_frame(self, frame):
        # Display the original frame using OpenCV
        # cv2.imshow('Original Frame', frame)

        # Convert frame to grayscale and resize to fit longest axis
        img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        target_width = EI_CLASSIFIER_INPUT_WIDTH
        target_height = EI_CLASSIFIER_INPUT_HEIGHT
        aspect_ratio = img.width / img.height

        if aspect_ratio > 1:  # Landscape orientation
            new_width = target_width
            new_height = int(target_width / aspect_ratio)
        else:  # Portrait orientation or square
            new_height = target_height
            new_width = int(target_height * aspect_ratio)

        # Resize the image using the calculated dimensions
        img_resized = img.resize((new_width, new_height), Image.BILINEAR)
        # Create a new image with the target size and background color
        img_padded = Image.new("L", (target_width, target_height))
        # Calculate padding to center the resized image
        pad_left = (target_width - new_width) // 2
        pad_top = (target_height - new_height) // 2
        # Paste the resized image onto the padded background
        img_padded.paste(img_resized, (pad_left, pad_top))
        
        # Create a single RGB hex element in the format 0xRRGGBB
        img_array = np.array(img_padded).flatten()
        self.hex_float_array = [(float((p << 16) + (p << 8) + p)) for p in img_array]  # RGB format in hex to float

        if debug_enabled:
            # Print the array in 0xRRGGBB format for verification
            print("Float Array in 0xRRGGBB format (hexadecimal representation):")
            hex_array = [hex(int(f)) for f in self.hex_float_array]
            print(", ".join(hex_array))
    
        self.publish_image_data()

    def publish_image_data(self):
        # Create Float32MultiArray message
        msg = Float32MultiArray()
        msg.data = self.hex_float_array
        self.publisher_.publish(msg)
        self.get_logger().info('Publishing float array of image data')
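
On the micro-ROS side the features then arrive pre-packed, so the signal callback reduces to a plain copy. A minimal sketch of the consuming side, assuming the Float32MultiArray subscription callback has already copied msg->data into a static buffer (the name features is illustrative):

#include <string.h>  // memcpy

// Holds the pre-packed 0xRRGGBB floats received from the publisher
static float features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE];

static int get_signal_data(size_t offset, size_t length, float *out_ptr) {
    // Each element is already one pixel packed as a float,
    // so hand out the requested slice directly
    memcpy(out_ptr, features + offset, length * sizeof(float));
    return EIDSP_OK;
}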