RAW Feature of the image is float?

Question/Issue: The results from the model deployed on the ESP32-CAM differ from the live classification results on the web. I would like to understand the data type of the image's raw features and how they are interpreted.

Project ID: 136279

Hi @absoluteabutaj,

The raw features of image data are given in RGB888 format. For example, 0xb5d1f6 is:

Red: 0xb5
Green: 0xd1
Blue: 0xf6
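
For illustration, a minimal sketch of unpacking such a packed value (the variable names are just for the example):

    uint32_t packed = 0xb5d1f6;
    uint8_t r = (packed >> 16) & 0xFF;  // 0xb5
    uint8_t g = (packed >> 8) & 0xFF;   // 0xd1
    uint8_t b = packed & 0xFF;          // 0xf6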

Hope that helps!

Thanks for your response @shawn_edgeimpulse

I’ve also noticed that the raw features for images of different sizes end up with the same data size.

For example, I uploaded a 320x240 image and got a raw feature size of 9216; I also uploaded a 96x96 image, which resulted in the same raw feature size.

Is this achieved by resizing the image? If so, could you elaborate a bit on this?

Thanks,
Abu.

Hello @absoluteabutaj,

The very first step of your impulse applies a “windowing” for time-series data or a “resizing” for images.
When I looked at your impulse, you set a 96x96 image size in grayscale mode, thus 96x96x1 = 9216 features. If you change either the color mode or the size, you will get a different value (e.g. 120x120 px in RGB gives 43200 features).

No matter the size of the image you upload, a fixed-length input must be passed to the MobileNet transfer learning block.
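
As a quick sanity check, the feature count is simply width x height x channels:

    // features = width * height * channels (1 for grayscale, 3 for RGB)
    int width = 96, height = 96, channels = 1;   // this impulse: 96x96 grayscale
    int n_features = width * height * channels;  // 9216
    // 120x120 in RGB would give 120 * 120 * 3 = 43200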

I hope that answers your question,

Best,

Louis

@louis - Your explanation clears something up. In my deployment I resized the image to 96x96 (not squashed) and fed that data to the classifier.

I’ve not converted the image to grayscale. My data is in RGB888 format.

Is it necessary for me to convert my captured data to grayscale myself?

Thanks,
Abu.

Hello @absoluteabutaj,

That’s a good question; I can’t remember if we’ve included the channel conversion in our SDK.

Originally, you had to do the conversion yourself before passing the raw values into the classifier (this also applied to converting between RGB565 and RGB888). But maybe this has changed.
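
For illustration only (not necessarily the SDK's own routine), a hedged sketch of expanding an RGB565 pixel to 8-bit components before packing them into the format the classifier expects (explained further below):

    // Hypothetical helper: expand RGB565 to 8-bit components, then pack
    // into the 0xRRGGBB value (stored as a float) that the classifier expects.
    float rgb565_to_packed_float(uint16_t px) {
        uint8_t r5 = (px >> 11) & 0x1F;     // 5 red bits
        uint8_t g6 = (px >> 5) & 0x3F;      // 6 green bits
        uint8_t b5 = px & 0x1F;             // 5 blue bits
        uint8_t r = (r5 << 3) | (r5 >> 2);  // expand each channel to 8 bits
        uint8_t g = (g6 << 2) | (g6 >> 4);
        uint8_t b = (b5 << 3) | (b5 >> 2);
        return (float)((r << 16) + (g << 8) + b);
    }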

@rjames do you know?

Best,

Louis

@rjames - Hello. Do you know?

Thanks,
Abu.

@absoluteabutaj,

In the Studio, the DSP (image) block takes your image data, resizes it, and converts it to grayscale (or RGB) according to your impulse design. In your case, the image data is resized to 96x96 and converted to grayscale, which results in 9216 (96x96x1) features for the learn block.
The NN is then trained with normalized floats or quantized features, depending on your choice on the deployment page.

The Studio handles all of this for you, while in the SDK, for image models, there's some minor prep work you have to do yourself, mainly preparing the input data for the beginning of the pipeline (input → DSP → NN → output). Further explanation below.

The Image DSP block always expects float pixels, where each pixel packs its RGB components into one 24-bit value:

  • B occupies bits 0-7 (0 <= B <= (2^8) - 1)
  • G occupies bits 8-15 (G << 8, contributing up to (2^16) - 1)
  • R occupies bits 16-23 (R << 16, contributing up to (2^24) - 1)

In the SDK, these image-DSP-related functions are the extract_image_features*() functions in ei_run_dsp.h.
If your image buffer is not yet resized and/or not in this pixel format, you'll have to convert it yourself; it doesn't matter whether your sensor's image is grayscale, RGB565, or RGB888.
Your application should take care of this prep work so that the input is ready for the extract_image_features*() functions.

E.g.:

    // Pack the 8-bit components into a single 0xRRGGBB value stored as a float
    uint8_t r, g, b;
    b = a_function_to_get_the_blue_component(resized_image);
    r = a_function_to_get_the_red_component(resized_image);
    g = a_function_to_get_the_green_component(resized_image);
    float pixel_f = (r << 16) + (g << 8) + b;

One way you can do this is on the fly, with the callback assigned to signal.get_data.
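
For illustration, a minimal sketch of such a callback, assuming a global resized_matrix buffer holding the resized RGB888 bytes (the buffer and function names are just for the example):

    static int raw_feature_get_data(size_t offset, size_t length, float *out_ptr) {
        // offset and length are in features (pixels), not bytes
        for (size_t i = 0; i < length; i++) {
            size_t ix = (offset + i) * 3;  // 3 bytes per RGB888 pixel
            uint8_t r = resized_matrix[ix + 0];
            uint8_t g = resized_matrix[ix + 1];
            uint8_t b = resized_matrix[ix + 2];
            out_ptr[i] = (float)((r << 16) + (g << 8) + b);  // pack as 0xRRGGBB
        }
        return 0;
    }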

The extract_image_features*() functions will then (depending on the model) extract normalized grayscale/RGB features as floats or quantized values, which are then input to your NN (see ei_run_classifier.h).
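
For intuition, that unpacking and normalization step looks roughly like this (a hedged sketch; ei_run_dsp.h is the reference, and the exact grayscale weights here are an assumption):

    // Unpack one packed pixel and normalize its components to [0, 1] floats
    uint32_t packed = (uint32_t)pixel_f;
    float r = ((packed >> 16) & 0xFF) / 255.0f;
    float g = ((packed >> 8) & 0xFF) / 255.0f;
    float b = (packed & 0xFF) / 255.0f;
    // For a grayscale model the channels collapse into one value, e.g. with
    // standard luma weights (an assumption; check the SDK for the actual ones):
    float gray = 0.299f * r + 0.587f * g + 0.114f * b;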

// Raul

@rjames - Thanks for the detailed explanation. I'll try that out and let you know how it goes.

Thanks,
Abu.


@rjames

TL;DR:

Even with the image converted to grayscale, the results on the edge deployment differ drastically from live classification.

I did the prep work as suggested; I'm listing the steps in order:

  1. Capturing the image in grayscale at 320x240.
  2. Converting the captured raw buffer to RGB888.
  3. Resizing it to 96x96.
  4. Passing the data to the classifier.

Here is the snippet for capturing the image from the ESP32-CAM (OV2640) in grayscale format:

static camera_config_t camera_config = {
    .xclk_freq_hz = 20000000,
    .pixel_format = PIXFORMAT_GRAYSCALE,
    .frame_size = FRAMESIZE_QVGA,   // 320x240
    .jpeg_quality = 10,
    .fb_count = 2,
    // (pin assignments and other fields omitted)
};

Once I have the raw buffer, I convert it to RGB888 and hold the values in rgb888_matrix:

fmt2rgb888(pic->buf, pic->len, pic->format, rgb888_matrix);
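
For context, the destination buffer needs 3 bytes per pixel; a sketch of the assumed allocation (matching the buffer name above):

    // 320 * 240 * 3 bytes for the RGB888 frame (assumed allocation)
    uint8_t *rgb888_matrix = (uint8_t *)malloc(pic->width * pic->height * 3);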

After this, the 96x96 resize is done with:

image_resizer(resized_matrix, rgb888_matrix, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT, 3, pic->width, pic->height);

image_resizer() function:

void image_resizer(uint8_t *dst_image, uint8_t *src_image, int dst_w, int dst_h,
                   int dst_c, int src_w, int src_h) {
  float scale_x = (float)src_w / dst_w;
  float scale_y = (float)src_h / dst_h;

  int dst_stride = dst_c * dst_w;
  int src_stride = dst_c * src_w;

  if (fabs(scale_x - 2) <= 1e-6 && fabs(scale_y - 2) <= 1e-6) {
    // Fast path when the source is exactly twice the destination size
    image_zoomIn_twice(dst_image, dst_w, dst_h, dst_c, src_image, src_w, dst_c);
  } else {
    // Bilinear interpolation: each destination pixel blends the four nearest
    // source pixels, weighted by their fractional distance.
    for (int y = 0; y < dst_h; y++) {
      float fy[2];
      fy[0] = (float)((y + 0.5) * scale_y - 0.5);  // y
      int src_y = (int)fy[0];                      // y1
      fy[0] -= src_y;                              // y - y1
      fy[1] = 1 - fy[0];                           // y2 - y
      src_y = DL_IMAGE_MAX(0, src_y);
      src_y = DL_IMAGE_MIN(src_y, src_h - 2);

      for (int x = 0; x < dst_w; x++) {
        float fx[2];
        fx[0] = (float)((x + 0.5) * scale_x - 0.5);  // x
        int src_x = (int)fx[0];                      // x1
        fx[0] -= src_x;                              // x - x1
        if (src_x < 0) {
          fx[0] = 0;
          src_x = 0;
        }
        if (src_x > src_w - 2) {
          fx[0] = 0;
          src_x = src_w - 2;
        }
        fx[1] = 1 - fx[0];  // x2 - x

        for (int c = 0; c < dst_c; c++) {
          dst_image[y * dst_stride + x * dst_c + c] = round(
              src_image[src_y * src_stride + src_x * dst_c + c] * fx[1] * fy[1] +
              src_image[src_y * src_stride + (src_x + 1) * dst_c + c] * fx[0] * fy[1] +
              src_image[(src_y + 1) * src_stride + src_x * dst_c + c] * fx[1] * fy[0] +
              src_image[(src_y + 1) * src_stride + (src_x + 1) * dst_c + c] * fx[0] * fy[0]);
        }
      }
    }
  }
}

After this, the data is passed to the classifier with run_classifier(&features_signal, &result, false);, but the produced result does not match the live classification.

Is there something I’ve missed in the prep?

Attached are sample classification results on the edge and the results for the same images in live classification (five images in total).

Awaiting your response @rjames

Thanks,
Abu.


Hello @absoluteabutaj,

Here is an example I wrote some time ago that I think does what you are looking for:

Please note that we have since optimized hardware acceleration using esp-nn with ESP-IDF, so the Arduino examples are deprecated:

But the example above may still help you.

Best,

Louis

@louis

I'll have a look at the attached example. Could you elaborate a bit on the optimized hardware acceleration for ESP-IDF when using esp-nn?

Thanks,
Abu.

@absoluteabutaj,

I’m missing some information from what you posted:

  • the classification results produced on the device
  • the function passed to features_signal.get_data
  • the value assigned to features_signal.total_length

You're almost there with the steps you provided. If my assumptions are correct, you're missing the step of converting your image into packed RGB floats. @louis's example follows your exact steps.
Take his raw_feature_get_data(), set up your signal, and run your impulse as follows:

  signal_t features_signal;
  features_signal.total_length = EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT;
  features_signal.get_data = &raw_feature_get_data;

  Serial.println("Run classifier...");
  // Feed signal to the classifier
  EI_IMPULSE_ERROR res = run_classifier(&features_signal, &result, false /* debug */);

If you've already done this and the classification results are still wrong, I'd first verify that you're producing the correct raw features (the output of raw_feature_get_data()).
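
One way to do that check (a hedged sketch reusing the callback above): print the first few packed values over serial and compare them with the "Raw features" shown in the Studio for the same image.

    float first_features[8];
    raw_feature_get_data(0, 8, first_features);
    for (size_t i = 0; i < 8; i++) {
      Serial.print("0x");
      Serial.println((uint32_t)first_features[i], HEX);  // packed 0xRRGGBB values
    }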

Where can I find this SDK?

Is the code behind the red impulse input block called "Image Data" (mentioned above) available?

@MMarcial,

By SDK I mean edge-impulse-sdk in our library exports. For example C/C++ library (or Arduino library) exports. Here’s a C++ reference: GitHub - edgeimpulse/inferencing-sdk-cpp: Portable C++ library for signal processing and machine learning inferencing

@rjames - If my assumptions are right, the raw features extracted from the image in the deployment (ESP32-CAM) should match the raw features shown in the web UI.

If they match, the results should also match, correct?

Note: I've implemented float32, not quantized int8.