Image classification results vary drastically in deployment


When an image is captured and passed to on-device inference, the results differ drastically from running the same image through inference on the web.

My deployment target is the ESP32-CAMERA (Ai-Thinker), which runs FreeRTOS (ESP-IDF).

I’ve followed this example from EI: GitHub - edgeimpulse/firmware-espressif-esp32: Edge Impulse firmware for the Espressif ESP-EYE(ESP32) Development board

I made an additional modification to store the captured image on the SD card so that I can upload it to the web for inference.
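One thing worth double-checking in that modification (an assumption on my part, not something visible from the post): the AI-Thinker ESP32-CAM typically delivers frames as JPEG or raw RGB565. If raw RGB565 bytes are written to the SD card and later re-interpreted as a different format when uploading, the web classifier sees a different image than the device did. A minimal sketch of the channel expansion that a correct save/upload path would need:

```python
# Hypothetical helper: expand one RGB565 pixel (as the ESP32 camera driver
# can produce) to an 8-bit (R, G, B) tuple for a standard image file.
def rgb565_to_rgb888(pixel: int) -> tuple:
    r5 = (pixel >> 11) & 0x1F   # top 5 bits: red
    g6 = (pixel >> 5) & 0x3F    # middle 6 bits: green
    b5 = pixel & 0x1F           # bottom 5 bits: blue
    # Scale each channel to the full 0-255 range.
    return (r5 * 255 // 31, g6 * 255 // 63, b5 * 255 // 31)

print(rgb565_to_rgb888(0xFFFF))  # white -> (255, 255, 255)
print(rgb565_to_rgb888(0xF800))  # pure red -> (255, 0, 0)
```

If the saved image looks correct when opened on a PC, a format mismatch like this can be ruled out.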

The results vary drastically for every single image.

What am I missing?

Image Examples: Captured Image


Predicted Result on the device:
Predictions (DSP: 10 ms., Classification: 152 ms., Anomaly: 0 ms.):
AsianElephant: 0.02734
Human: 0.67188
Random: 0.29688

The live classification result is in the following thread.

Note: If more images are required, I’m happy to supply.

Project ID: 134604

Really looking forward to getting this clarified.

Thanks in advance


Here is the live classification result from the web.

ESP32-CAMERA classified image

AsianElephant: 0.00
Human: 1.00
Random: 0.00


Result for the same image on web live classification

Some more images of the classification.

Device predicted result
AsianElephant: 0.00
Human: 0.55
Random: 0.45

Hello @absoluteabutaj_proto,

Could you try to run the standalone example: On your Espressif ESP-EYE (ESP32) development board - Edge Impulse Documentation (with both the float32 and the quantized model)?
The results of the float32 model should match the live classification.

If the results are the same for the float32 model but not for the quantized model, you can read this section:
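As background on why the float32 and quantized models can disagree, here is a toy illustration (not Edge Impulse code; the scale and zero-point values are made up for the example): int8 quantization maps floats onto 256 discrete levels, so each dequantized score can differ from the float32 score by up to half a quantization step, and small differences can accumulate across layers.

```python
# Toy int8 quantization round-trip, illustrating the rounding error a
# quantized model introduces. Parameters are invented for the example.
def quantize(x: float, scale: float, zero_point: int) -> int:
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q: int, scale: float, zero_point: int) -> float:
    return (q - zero_point) * scale

scale, zp = 1 / 256, -128  # a plausible choice for a softmax output in [0, 1)
for score in (0.67188, 0.29688, 0.02734):
    q = quantize(score, scale, zp)
    print(score, "->", q, "->", dequantize(q, scale, zp))
```

This only explains small discrepancies; it does not explain the large device-vs-web gap on its own.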

If they are the same, the difference probably comes from your camera. Was the data used for training collected with the camera on your device?



Thanks for your response @louis

The images used to train the model are not only from the ESP32-CAMERA, but also from other camera sources such as mobile phones and open images.

Does it really matter to train the model with the target device’s camera?

Also, a lot of the images are from trail cameras that are deployed with the devices.

Does it really matter to train the model with the target device’s camera?

I haven’t seen a response to this yet. It wouldn’t make sense for the images to have to come from the target camera only, as this would imply that the extensive collections of images available from other sources are worthless.

Hello @delfin4, @absoluteabutaj_proto

It should not matter much if you have a good dataset.
If you have a close look at this image:

You can notice a small line in the upper part of the image. If the human samples in your dataset contain that small line, your NN will likely learn that feature.

The same applies to an image that is too bright or too dark, etc. Those are features a NN can learn.
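To make the point above concrete, here is a toy sketch (class names and pixel values are invented): if every "Human" training image happens to be brighter than every "AsianElephant" image, even a trivial threshold on mean brightness separates the training set perfectly. The model has learned lighting, not the subject, and it will fail on a dark human or a bright elephant.

```python
# Toy "classifier" that relies on a spurious brightness feature.
human_imgs = [[200, 210, 190], [180, 220, 205]]   # bright samples
elephant_imgs = [[40, 60, 55], [70, 30, 45]]      # dark samples

def mean_brightness(img):
    return sum(img) / len(img)

THRESHOLD = 128  # arbitrary cut-off for the example

def predict(img):
    # Classifies purely on lighting, not on image content.
    return "Human" if mean_brightness(img) > THRESHOLD else "AsianElephant"

print([predict(i) for i in human_imgs + elephant_imgs])
```

A real NN can latch onto such a shortcut just as easily, which is why mixing images from cameras with consistently different exposure or artifacts can skew results.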