ESP32 CAM Support

hi I’m trying the tutorial on github, for classification with esp32cam. I am trying to classify 3 items from the fashion mnist database. The model on EI is fine, but when I deploy it to the device, the inference is always 99% on the same label.
I tried to not quantize in the optimization choice but I get this:
" failed to allocate tensor arena
Failed to allocate TFLite arena (error code 1)
run_classifier returned: -6"

Any tips for me?

Hi @gransasso This is a memory allocation issue, we cannot allocate enough memory to run the neural network. Using a smaller transfer learning model / smaller neural network will resolve this.

@louis Can you not show any conclusions if run_classifier returns an error? That would help with perception.


Indeed the RAM usage for the unoptimized model is too big to fit on the ESP32 cam.
Let me try to run your model on my ESP32 if you don’t mind. I will see if I have the same issue where “the inference is always 99% on the same label.”

I let you know

hope you understand italian labels :grinning:

I have the same classification each time I run the inference too.
Let me double check the code especially the resize function to see if anything seems weird…

I’ve just noticed that you are using greyscale when I used RGB.
The issue might comes from there.

I’ll check again later today if I can have better results.


Hello @gransasso,

I see that you just fixed your images to RGB:

I just ran your model with my ESP32 cam and it seems to work now:

Let me know if it worked on your side.



really? maybe my esp is broken. I reduced the categories to just two so as to simplify, but I always keep getting the sneaker category with a confidence of at least 85

I have noticed that your images are not as small as mine

The size of the image is just a bug in the display, if you use chrome and you reduce the size of the window, the image should get bigger at some point :joy:

Can you try to place a sneaker with a dark background and try again? I suspect your model to perform well with the provided images by the fashion mnist database but not really in real conditions…

Also, what you can do to test if you have the same results between your ESP classification and the one on the studio is:

  • Save your image
  • Upload it on the studio (either with the cli or directly using the online uploader)
  • Compare the results

Then you will see if the issue comes from the embedded code or the model.



yeah, i tried with a shirt on a black background and it seems to be better. In fact, I thought from the beginning it could be such a problem.


Hey Louis, i have found this on :

At this point, you are properly wondering if the model we just trained on the Fashion MNIST dataset would be directly applicable to images outside the Fashion MNIST dataset?

The short answer is “No, unfortunately not.”

The longer answer requires a bit of explanation.

To start, keep in mind that the Fashion MNIST dataset is meant to be a drop-in replacement for the MNIST dataset, implying that our images have already been processed.

Each image has been:

  • Converted to grayscale.
  • Segmented, such that all background pixels are black and all foreground pixels are some gray, non-black pixel intensity.
  • Resized to 28×28 pixels.

For real-world fashion and clothing images, you would have to preprocess your data in the same manner as the Fashion MNIST dataset.

And furthermore, even if you could preprocess your dataset in the exact same manner, the model still might not be transferable to real-world images.

So i choose a bad dataset, sorry for wasting your time :sweat_smile:

it seems I managed to figure out what the problem was.
I used a very small pre-trained model, so I was able to avoid quantizing the model during deployment.
It seems that quantization caused problems in the accuracy of the model after deployment.

1 Like

@louis @janjongboom & all others, Thanks a lot for this entire thread. I had the same issues and resolved almost everything with your help.


@gransasso Yeah we have some ideas on fixing quantization error on smaller image models (such as using a different dataset to find quantization parameters, and quantization-aware training) that @dansitu is working on.

1 Like

First, @Louis thanks for the work you shared, for the ESP32 cam!
I’m a teacher and rather new to ML, but I hope to bring ML to my classes.
The ESP32 cam would be a great tool as our budget is rather small and our students are only familiar with the Arduino environment.
To get familiar with it, I trained a model for classifying nuts as pistachio nuts or as brazil nuts and tried to keep it as small as possible. When I test the model in EI, it gives normal results, but when I run inference on the ESP32 cam, I always get the same result: 0.5000 vs 0.5000
I tried to run the model with both the basic and the advanced image classification example. I tried it on another ESP32 cam, but always with the same result.
Any idea what’s going wrong? Thanks a lot!


Hello @Bewegendbeeld,

Could you try to add an “unknown” class to see if it changes something?
Also I would try the MobileNet v1 with a larger image size (maybe 96x96). You might obtain better results with this model architecture.



1 Like

Thanks, @louis!
I tried with a larger image size, but this gives this error:
ERR: failed to allocate tensor arena
Failed to allocate TFLite arena (error code 1)
Maybe because of the limited memory of the ESP32?
So I went back to image size 48x48.
I also added an ‘else’ class with some random objects and trained again.
When doing live classification on Edge Impulse, everything seems to work as expected, but when I run inference on the ESP32 cam, with basic example I have different results. Pistache remains 0.00000, and every images is classified as ‘else’.
When I try the advanced example, I get 100% else and the other categories remain 0.0000.

Any idea what’s going wrong?
Thanks a lot!

Hello @Bewegendbeeld,

ERR: failed to allocate tensor arena Failed to allocate TFLite arena (error code 1) is indeed a memory issue.
Were you trying with larger images using the MobileNet v1 or v2?
v2 with 96x96 is definitely too big but on v1 it should fit.

Depending on which camera / esp version you are using, some boards have the possibility to set the image size to config.frame_size = FRAMESIZE_96X96;

In this case you won’t need to resize your image using the image_resize_linear(ei_buf, out_buf, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT, 3, out_width, out_height); and I think the issue comes from this function in the advanced example and from the cutout_get_data in the basic example.



1 Like

Merci, @louis!
Indeed, I was trying with MobileNetV2. When I change to V1, I’m able to use 96X96. After this I tried again with the new Arduino library.
In the advanced example, I changed the lines with FRAMESIZE_240X240 to FRAMESIZE_96X96
And I removed:
image_resize_linear(ei_buf, out_buf, EI_CLASSIFIER_INPUT_WIDTH, EI_CLASSIFIER_INPUT_HEIGHT, 3, out_width, out_height);
But even then I always get the same classification.

I noticed the output image after running inference remains black, which isn’t the case when streaming.

Any idea where something goes wrong?