Image - Transfer learning MobileNet scaling input

Hi,
I have got a question about the default image classification task code that is generated (Python) by Edge Impulse.
Looking at the code, I do not see any scaling of the input data. MobileNet requires the input images to be scaled between -1 and 1. This is the function I usually call in my notebooks.
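Something along these lines (the exact helper may differ; this is the standard Keras one for MobileNetV2):

# sketch of the scaling step (assumption: the usual Keras helper is used)
import tensorflow as tf

def scale_for_mobilenet(images):
    # maps uint8 pixels in [0, 255] to floats in [-1, 1], as MobileNet expects
    images = tf.cast(images, tf.float32)
    return tf.keras.applications.mobilenet_v2.preprocess_input(images)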
BTW, it seems that during training the input is somehow scaled, because the accuracy grows quickly; I think this is also true when evaluating the model against the test set.
I am not sure whether the library you can then download (the Arduino one) runs this preprocessing step automatically, as I see poor performance in real time and I suspect that the input is not scaled before being passed to the classifier.
Any clue?
Thanks

Hi @edge7,

That’s a great question! During training we scale pixel values to between 0 and 1 before they get to the model—you can test this by adding these lines to your Expert Mode code, which will print the max and min pixel values for each image:

for (data, label) in train_dataset.as_numpy_iterator():
    print(np.max(data))
    print(np.min(data))

We should be doing the same in our embedded SDK, but I’ll double check with our embedded team just to make sure!

Warmly,
Dan

Hi, the scaling code on device is here: https://github.com/edgeimpulse/inferencing-sdk-cpp/blob/00ea7959186db936606a5d67ab4c1e0290de6cce/classifier/ei_run_dsp.h#L1340 :slight_smile:

Hi guys,
I am trying to understand why I cannot run inference successfully with the model generated by your (great) platform, while I have been able to make it work doing everything from scratch (notebook and TensorFlow Lite). I am using an ESP32, but I don't think that is important.
Let me give you more info and some suggestions.
Each model might have a slightly different preprocessing phase; for instance, MobileNet:
Note: each Keras Application expects a specific kind of input preprocessing. For MobileNetV2, call tf.keras.applications.mobilenet_v2.preprocess_input on your inputs before passing them to the model. mobilenet_v2.preprocess_input will scale input pixels between -1 and 1.
More info here. I think this should be made evident and clear to the user. If a user opens 'expert' mode, it means he/she knows what they are doing, so this step would probably already be clear to them. Also, scaling between 0 and 1, even though it might work for most use cases, might not be the best option for all networks.
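For example (a quick check against the standard Keras helpers; ResNet50 here is just an illustration, not something from my project):

import numpy as np
import tensorflow as tf

img = np.random.randint(0, 256, size=(1, 96, 96, 3)).astype("float32")

v2 = tf.keras.applications.mobilenet_v2.preprocess_input(img.copy())
print(v2.min(), v2.max())    # roughly -1 ... 1

r50 = tf.keras.applications.resnet50.preprocess_input(img.copy())
print(r50.min(), r50.max())  # BGR and mean-subtracted, not in [0, 1] at all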
This is how I am doing that in my notebook:

# data_augmentation, base_model, global_average_layer and prediction_layer
# are defined in earlier cells of the notebook (base_model is the MobileNet backbone)
inputs = tf.keras.Input(shape=(96, 96, 3))
x = data_augmentation(inputs)
x = preprocess_input(x)           # scales pixels to [-1, 1] inside the graph
x = base_model(x, training=False)
x = global_average_layer(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)

where: preprocess_input = tf.keras.applications.mobilenet.preprocess_input

Doing so, the model already includes such a preprocessing step, which is also easy to see in the model summary:

[model summary screenshot]
Please note the truediv and subtract operations (in this case they mean /127.5 - 1, i.e. scaling to between -1 and 1); they also get included in the TensorFlow Lite model itself, so there is no need to add more C++ code to do that. This decouples the C++ from the model itself (I hope this makes sense).
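For reference, a minimal stand-alone sketch of this (not my exact notebook; the head layers and alpha here are placeholders): calling preprocess_input inside the functional graph adds the scaling as tf.math.truediv / tf.math.subtract layers, which then survive the TFLite conversion.

import tensorflow as tf

inputs = tf.keras.Input(shape=(96, 96, 3))
x = tf.keras.applications.mobilenet.preprocess_input(inputs)  # x / 127.5 - 1.0
base_model = tf.keras.applications.MobileNet(
    input_shape=(96, 96, 3), include_top=False, weights=None, alpha=0.25)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

model.summary()  # lists tf.math.truediv and tf.math.subtract layers before the backbone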
Can you also add a model summary picture somewhere? It is easy to include in your UI. :slight_smile:
Let me also show you some parts of the C++ project I am running on my ESP32, which I am sure performs well, as it has been working live for a month now.
The entry point of the processing part is here:

GetImage(error_reporter, kNumCols, kNumRows, kNumChannels,
         input->data.int8)

where input = interpreter->input(0);

The most important code is then:

ESP_LOGI("IMAGE PROVIDER", "");
void *ptrVal = NULL; // pointer for the memory location that will store the RGB data
uint32_t ARRAY_LENGTH = fb->width * fb->height * 3; // memory required for the RGB data (number of pixels in the jpg image x 3)
printf("Array Length %d", ARRAY_LENGTH);
ptrVal = heap_caps_malloc(ARRAY_LENGTH, MALLOC_CAP_SPIRAM); // allocate memory space for the rgb data
if (ptrVal == nullptr)
    ESP_LOGE("IMAGE PROVIDER", "Unable to allocate");
uint8_t *rgb = (uint8_t *)ptrVal; // make the 'rgb' pointer point to the allocated memory space

ESP_LOGI("IMAGE PROVIDER", "Allocated");
// convert the captured jpg image (fb) to rgb data (stored in the 'rgb' array)
bool jpeg_converted = fmt2rgb888(fb->buf, fb->len, PIXFORMAT_JPEG, rgb);
if (!jpeg_converted) ESP_LOGE("IMAGE PROVIDER", " -error converting image to RGB- ");
ESP_LOGI("IMAGE PROVIDER", "Converted");
int MODEL_IMAGE_WIDTH = 96;
int MODEL_IMAGE_HEIGHT = 96;
int NUM_CHANNELS = 3;
int img_size = MODEL_IMAGE_WIDTH * MODEL_IMAGE_HEIGHT * NUM_CHANNELS;
uint8_t *tmp_buffer = (uint8_t *)malloc(img_size);
// resize the full-size RGB frame down to the 96x96x3 model input
image_resize_linear(tmp_buffer, rgb, MODEL_IMAGE_HEIGHT,
                    MODEL_IMAGE_WIDTH, NUM_CHANNELS, fb->width, fb->height);
ESP_LOGI("IMAGE PROVIDER", "COPIO PER MODELLO"); // Italian: "copy for the model"
// shift uint8 pixels [0, 255] into the int8 input range [-128, 127]
for (int i = 0; i < img_size; i++) {
    image_data[i] = (int8_t)((int)tmp_buffer[i] - 128);
}

A couple of notes there: I am using SPIRAM on the ESP32, a useful feature to increase the available memory.
The rest of the code is similar to yours; the only difference is that image_data expects int8 data and not uint8, so I subtract 128.
Please note my get_input_details for the model I generated:

[screenshot of the get_input_details output]
as you can see, the input is RGB 96x96 and the quantization is scale 1 (so effectively nothing) with offset -128 (which I applied just above).
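For anyone who wants to check the same thing on their own model, the details can be read back from the .tflite file (the file name below is a placeholder):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_quantized.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
scale, zero_point = inp["quantization"]
print(inp["shape"], inp["dtype"])  # e.g. [ 1 96 96  3] and int8
print(scale, zero_point)           # here: 1.0 -128

# real_value = scale * (int8_value - zero_point), so with scale 1 and zero
# point -128 the int8 input is simply pixel - 128, as in the loop above.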

Then this is the way I get the output:

// Run the model on this input and make sure it succeeds.
if (kTfLiteOk != interpreter->Invoke()) {
    TF_LITE_REPORT_ERROR(error_reporter, "Invoke failed.");
}

TfLiteTensor* output = interpreter->output(0);

// Process the inference results.
int8_t no_gat_score = output->data.int8[0];

And that's it; it works fine, and I run image classification successfully on my ESP32.
Unfortunately, the Edge Impulse one always recognises the same class.
Hope this helps.

Still diving in. I found (for my project):
#define EI_CLASSIFIER_TFLITE_INPUT_SCALE 0.003921568859368563
#define EI_CLASSIFIER_TFLITE_INPUT_ZEROPOINT -128
I am not sure why the input scale is so low.
Looking then at this code:

float r = static_cast<float>(pixel >> 16 & 0xff) / 255.0f;
float g = static_cast<float>(pixel >> 8 & 0xff) / 255.0f;
float b = static_cast<float>(pixel & 0xff) / 255.0f;

if (channel_count == 3) {
    output_matrix->buffer[output_ix++] = static_cast<int8_t>(round(r / EI_CLASSIFIER_TFLITE_INPUT_SCALE) + EI_CLASSIFIER_TFLITE_INPUT_ZEROPOINT);
    output_matrix->buffer[output_ix++] = static_cast<int8_t>(round(g / EI_CLASSIFIER_TFLITE_INPUT_SCALE) + EI_CLASSIFIER_TFLITE_INPUT_ZEROPOINT);
    output_matrix->buffer[output_ix++] = static_cast<int8_t>(round(b / EI_CLASSIFIER_TFLITE_INPUT_SCALE) + EI_CLASSIFIER_TFLITE_INPUT_ZEROPOINT);
}

r, g and b, after your manual scaling, should be between 0.0 and 1.0.
After applying the classifier input scale they should be between 0 and 255 again (1/0.003921568859368563 is circa 255), and then, after adding the zero point, they should be between -128 and 127, which is my starting point. The difference is that, as I showed above, I have the actual scaling required by my model embedded in the model itself via TensorFlow operations…
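A quick numeric sanity check of that arithmetic (my own sketch, not Edge Impulse code): both paths end up with the same int8 input values, they just put the scaling in different places.

import numpy as np

EI_INPUT_SCALE = 0.003921568859368563   # ~1/255, from the generated header
EI_INPUT_ZEROPOINT = -128

pixels = np.array([0, 64, 128, 255], dtype=np.float32)

# Edge Impulse SDK path: pixel -> [0, 1] float -> quantize with scale ~1/255
r = pixels / 255.0
ei_q = np.round(r / EI_INPUT_SCALE) + EI_INPUT_ZEROPOINT

# my ESP32 path: the scaling lives inside the model, the input quantization is
# scale 1 / zero point -128, so the C++ code just subtracts 128
esp_q = pixels - 128

print(ei_q)   # [-128.  -64.    0.  127.]
print(esp_q)  # [-128.  -64.    0.  127.]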
:face_with_monocle: