Problems/code error with the generated build in the case of FOMO, RGB images, and ESP-EYE

Dear Edge Impulse Experts,

I have run into problems, and it seems I have found a bug, with the use case where I try to detect objects of the same shape but different colors.
(With grayscale and objects of different shapes, it works.)

Use Case

  • I train the FOMO (Faster Objects, More Objects) MobileNetV2 0.1
  • Color depth: RGB
  • Objects: same shape (e.g. simple cylinders) but in different colors
  • On the desktop, via Launch in Browser in Edge Impulse (or with a QR code), it works perfectly: the neural net can detect same-shape objects in different colors.
    So the neural net itself works perfectly.
  • But the software generated by Edge Impulse does not work on the ESP-EYE microcontroller with the RGB setting. I tried different build settings and got different issues.

With different settings:

  • In the Deployment section
  1. EON Compiler enabled or disabled, Quantized (int8), RGB →
    it builds and runs on the ESP-EYE, but object detection does not work at all.
    Always "No objects detected".
    (I think something goes wrong during quantization, e.g. the float→int8 conversion.)

  2. EON Compiler enabled or disabled, Unoptimized (float32), RGB →
    build error in the Edge Impulse generated code:

…\src\edge-impulse-sdk\tensorflow\lite\micro\micro_graph.cpp" -o "
…\src\edge-impulse-sdk\tensorflow\lite\micro\kernels\softmax.cpp: In function ‘void tflite::{anonymous}::SoftmaxQuantized(TfLiteContext*, const TfLiteEvalTensor*, TfLiteEvalTensor*, const tflite::{anonymous}::NodeData*)’:
…\src\edge-impulse-sdk\tensorflow\lite\micro\kernels\softmax.cpp:301:14: error: return-statement with a value, in function returning ‘void’ [-fpermissive]
return kTfLiteError;

  • I only used the generated code (the .zip library and the template file in the Arduino IDE) while investigating this issue.

My questions are as follows:

  • Is the RGB TinyML build not yet supported for the ESP-EYE?
  • Why do I get the code error above in the generated code? Maybe I forgot to set something.

Best regards,

Hello @edghba,

Could you try the Arduino Library deployment type and let me know if you also have issues?
In parallel, I’ll try to have a look at your project with the ESP-EYE binary deployment type. Can you share your project ID?



Hello Louis,

ID: 289375.
I used the Arduino deployment type:

Or are you thinking of something else?

Best regards,

Hello @edghba,

I just tested with the quantized model (the float32 model might be too big).
I don’t have an ESP-EYE but I tested with an ESPCAM AI Thinker.

Here’s the results when I point the camera to a picture from your test dataset:

{Image removed}

Can you save your pictures to an SD card, upload them somewhere, or display the images with the same configuration, to make sure they actually look as expected?



Hello Louis,

Thank you for your testing and support. It looks good/works on your snapshot.
Testing on an original image was a good idea. I did it, but still "No objects found".
(Small remark: the PC version also worked with images, so the algorithm is OK.)
I also use the quantized model.
I suspect there is an important difference in the generated code/library between the ESP-EYE and the ESP32-CAM AI Thinker.
Is the only difference in the GPIO assignment and the hardware description, or is there also a difference in the algorithm?
In my case #define CAMERA_MODEL_ESP_EYE is active, but in your case CAMERA_MODEL_AI_THINKER.
What is very strange is that I do not have this issue when I use grayscale images.
But I cannot use grayscale images when the only differences are in the colors.

I only use your Edge Impulse library and the template example (esp32_camera.ino).
Could there be a difference in the algorithm in the library between the two cases?
(Maybe a define is missing for the ESP-EYE somewhere. I am just speculating.)

  • ESPCAM AI Thinker+RGB

Best regards,

What camera do you have on your ESP-EYE?

There might be a setting that is not correctly configured…
You can try to modify some parameters in this function:

static camera_config_t camera_config = {
    .pin_pwdn = PWDN_GPIO_NUM,
    .pin_reset = RESET_GPIO_NUM,
    .pin_xclk = XCLK_GPIO_NUM,
    .pin_sscb_sda = SIOD_GPIO_NUM,
    .pin_sscb_scl = SIOC_GPIO_NUM,

    .pin_d7 = Y9_GPIO_NUM,
    .pin_d6 = Y8_GPIO_NUM,
    .pin_d5 = Y7_GPIO_NUM,
    .pin_d4 = Y6_GPIO_NUM,
    .pin_d3 = Y5_GPIO_NUM,
    .pin_d2 = Y4_GPIO_NUM,
    .pin_d1 = Y3_GPIO_NUM,
    .pin_d0 = Y2_GPIO_NUM,
    .pin_vsync = VSYNC_GPIO_NUM,
    .pin_href = HREF_GPIO_NUM,
    .pin_pclk = PCLK_GPIO_NUM,

    //XCLK 20MHz or 10MHz for OV2640 double FPS (Experimental)
    .xclk_freq_hz = 20000000,
    .ledc_timer = LEDC_TIMER_0,
    .ledc_channel = LEDC_CHANNEL_0,

    .pixel_format = PIXFORMAT_JPEG, //YUV422,GRAYSCALE,RGB565,JPEG
    .frame_size = FRAMESIZE_QVGA,    //QQVGA-UXGA Do not use sizes above QVGA when not JPEG

    .jpeg_quality = 12, //0-63 lower number means higher quality
    .fb_count = 1,       //if more than one, i2s runs in continuous mode. Use only with JPEG
    .fb_location = CAMERA_FB_IN_PSRAM,
    .grab_mode = CAMERA_GRAB_WHEN_EMPTY,
};

Or you can also try to flip the frames, or adjust the brightness and saturation, with something like this:

sensor_t * s = esp_camera_sensor_get();
s->set_vflip(s, 1); // flip
s->set_hmirror(s, 1); // mirror
s->set_brightness(s, 1); // up the brightness just a bit
s->set_saturation(s, -2); // lower the saturation

How did you collect the images? Otherwise, you can try to collect the images with your current camera settings and train your model on those.



Hello Louis,

Thank you for your proposals.

  • Image sensor: OV3660.
  • OK, I will look at these settings too. I know them from another camera application where I used them.
  • I collected the images with a mobile phone.
  • Yes, I know I can also use the ESP-EYE for collection; I have thought about that too.

Best regards,

@edghba Make your Project public and I will clone it and run on my ESP-EYE MCU v2.2.

Hello MMarcial,

Thank you very much for your quick and kind offer to help with your ESP-EYE.
Unfortunately, I cannot make the source images/objects public.

For now I will try to get hold of an ESP32-CAM AI Thinker, which Louis has, and try it with that.

Best regards,


Hello Louis,

I tried a lot of things (different light conditions, collecting images with the ESP-EYE using the method you proposed in Collect Sensor Data Straight From Your Web Browser, adjusting saturation, rotating the camera, etc.), and I also got hold of an ESP32-CAM AI Thinker, but I have the same issue. Neither board (ESP-EYE nor AI Thinker) can detect the objects that differ only in color. In prototype testing in Edge Impulse it works, but the quantized model does not work on the ESP.
Only in 1-2% of cases can the ESP detect correctly.
Could you please share by e-mail the version (.zip) that works perfectly for you?
I would test it; I now have the same hardware as you, and maybe I set something wrong in Edge Impulse when creating the library.
I use this net: FOMO (Faster Objects, More Objects) MobileNetV2 0.1

Best regards,

Hello @edghba,

I tried with the default example, I just changed the

Please note that it was not working perfectly; only about 10% of the frames contained a bounding box.

@matkelcey, have you encountered this kind of color limitation with FOMO? Are there any tips to make the model separate colors better?



Hello Louis,

Thank you for the information. Then we are seeing the same issue.
I think the problem is in the quantized TensorFlow Lite model created by Edge Impulse, because the neural net works well on the PC in Edge Impulse: it perfectly detects the objects that differ only in color.
Some accuracy degradation is expected on the ESP because of int8 instead of float, etc., but unfortunately it does not work at all on the ESP.

Best regards,

Hello Louis,

A short summary and a question.
After a lot of different trainings (RGB, grayscale, with more or less training data, under different light conditions and in different places, with MobileNetV2 0.1/0.35) and testing on the ESP-EYE and ESP32 AI Thinker, I found the following:

  • FOMO (Faster Objects, More Objects) MobileNetV2 0.35 (i.e. alpha = 0.35) works very robustly in grayscale when I train it with a lot of data (two different objects with a total of 800 training images in different positions).

  • The ESP-EYE detects the objects very reliably, but since it is trained in grayscale it cannot distinguish among objects of exactly the same shape but different colors.

  • I also found this in the Edge Impulse description:
    FOMO (Faster Objects, More Objects) MobileNetV2 0.35: "These models are designed to be <100KB in size and support a grayscale input at any resolution."

  • I have not found a neural net on the Edge Impulse site that works with colored objects.

  • The MobileNetV2 algorithm, and your FOMO (Faster Objects, More Objects) MobileNetV2 0.35, also work for colored objects. Edge Impulse itself proves this, because it works perfectly on a PC or mobile phone in the Launch in Browser section of your site.

  • I wonder why it cannot work at all after quantization.
    I think it should also work after quantization. I do not think it is an issue of light conditions or camera settings, because I tested under different conditions, and because the quantized grayscale model works under different light conditions too.

Could you please investigate why we have this issue for colored objects but not for grayscale?
The MobileNetV2 object detection algorithm is good for colored objects as well.

If this issue could be clarified and solved, it would be really great for me.

Project Id for this entry: 288939

Thank you & Best regards,

Hello @edghba,

I’m asking internally because I’m out of ideas :smiley:



Hello Louis,

Have you had time to ask within the team?
I have been thinking about what the problem could be. I suspect that FOMO is trained on color images (RGB, 3 channels), but during object detection the input image is read as a grayscale (1-channel) image. So FOMO, trained on color images, tries to detect from a grayscale input image, and therefore it does not work, or only very rarely detects anything.
You handle inside the library whether the input image is grayscale (1 channel) or RGB (3 channels).
I can only see the line converted = fmt2rgb888(fb->buf, fb->len, PIXFORMAT_JPEG, snapshot_buf); in your ei_camera_capture(), where you read the captured JPEG image and convert it to RGB format.
I do not know what you do with the RGB (3-channel) input image later in the library.
I suspect you convert it to a grayscale image when FOMO is set to grayscale, but what happens if FOMO is trained in RGB…

Best regards,

@edghba I have tested this project with EON Compiler + Int8 quantization in RGB on a couple of other pieces of hardware like the Nicla Vision and it works as expected, I’ve got an ESP-EYE on order so will take a look at the hardware side later this week.

@edghba I have now thoroughly tested this deployment with int8+EON compiler and it does work as expected on the ESP-EYE. The grayscale model you built has a much larger dataset so does perform better after being quantized compared to the RGB model, but I saw no real difference between my ESP-EYE and Nicla Vision in both Arduino Library Output and EI Firmware modes for your RGB model. I was testing by pointing the cameras at some of your test samples on my screen.

One thing to note is that with a smaller dataset, using a different camera for acquisition and inference can give poor results because of the different exposure/ISO of each sensor. A larger dataset should mitigate this somewhat.

For reference, these were my board selection options for flashing the ESP-EYE in the Arduino IDE, the key bit being PSRAM enabled.

Let me know how you get on; I hope this helps.

Hello Jim,

Thank you for your testing and support. I will compare the board selection options, thank you.
I have created a lot of datasets in Edge Impulse, and I also tried settings such as grayscale, RGB, and combinations of these to solve the issue.
(You may not see my best setting, only my last trial. I do not remember how I left it earlier, but that does not matter now.)

  • Could you please give me the name of the dataset that works for you?
    I would like to follow your settings exactly so we are on the same page.

  • How many objects were detected correctly, or detected at all?
    For example, out of 50 images?

  • Yes, I know it can give worse results when I train with mobile phone images, but I use the ESP-EYE or the ESP32-CAM for inference, and ideally the same camera would be used for training and inference. That is the natural method.
    On the other hand, the image quality of the training set created by the ESP-EYE was also poor when I used it, so I am trying out several possibilities.
    Additionally, when I used the “Connect your device or development board” option to take photos with EI, the quality of the training images was poor (the size was also low).
    Please see my ObjDetnRedBlackESPImages dataset.
    (I now have a new idea for this: create the images with the ESP-EYE and upload them to EI manually. This is a plan I would like to try out later.)

But the most important thing now is that I can see the same results as you, where it works. :slight_smile:

Best regards,

Hi Lehel,

  • I was working off “ObjDetnRedBlackCylinder1”
  • I did not do a full-scale image-by-image test, but just tried to confirm that there was no unexpected behaviour with the library. I got the same behaviour on the ESP-EYE and a Nicla Vision (different hardware), so I’m fairly confident an RGB mismatch is not the issue.

It is worth remembering that:

  • If the image quality of the training set created by the ESP-EYE is poor, then that is what the model will be ingesting too, so any performance issues could be related to this. Your mobile images are collected at 500x500px, but your model is trained at 96x96px, so they will be resized before training anyway. The ESP images are collected at 96x96 and trained at 96x96.
  • You train your model on individual static images, but running on-device happens continuously, which adds motion and artifacting into the equation; it is understandable that you may not get 50/50 frames detecting your object. You can use averaging or “debouncing” to increase the reliability of a detection.
  • Quantizing the model to int8 will reduce accuracy in some cases.

Hi Jim,

Yes, it is OK that the generated software works; that was not my question. Sorry, maybe the title of the topic is a little misleading.
My goal is “simple”:

  • I have 4 different objects that differ only in color
  • I have an ESP32-CAM and an ESP-EYE (it does not matter which one is used)
  • I would like to use your Edge Impulse tool to train one of the neural nets available in the tool
  • I download the library and put it into the Arduino IDE
  • I write no code and make no changes; I only use your EI tool and your framework code
  • I would like to see that it works, meaning it can detect the 4 different objects “somehow”
  • (I know that you rescale the images to 96x96.)

The problem is (please see my entry from Oct 17 as well):

  • For grayscale images it works, but that is not good for me because my objects differ only in color
  • In colored (RGB) mode in Edge Impulse, FOMO MobileNetV2 cannot find any objects. It is not just wrong about the color; it actually cannot find anything at all (at most 1-2 random detections).

You can also try with ObjDetnRedBlackCylinder, if you set it to RGB; it contains a lot of images.
I have already tried many options.

My opinion is that the results in RGB mode are so bad (actually nothing) that they cannot come from the image quality of the ESP. Neural nets are much more robust than that: an object does not go undetected just because the lighting and environment are not exactly the same as in the training set. I also tried detection while recreating the same place and the same lighting conditions; it did not work. (I suspect the problem is that in RGB mode the captured image is transformed to grayscale, which is not good because FOMO is trained on RGB images. Just an idea.)

The EI software captures still photos from the moving image, and in any case I do not move the camera during the testing phase, so that cannot be the problem.

  1. Could you show me a working scenario (with your objects) where your tool works with RGB images + ESP32-CAM?
  • Which image resolution do you propose for training the net? I would follow you step by step. :slight_smile:
  • Shall I take the images with the ESP32-CAM?
    (I think it should also work with a webcam or a mobile phone. Maybe not perfectly, but I think it should work.)

At first it would be enough if you shared a sample EI project that proves RGB + ESP32 works, i.e. that the object detection works.
I would like to test whether it can really detect the objects, e.g. about 8 out of 10.

  2. Extra question :slight_smile:
    Do the YOLOv5 options work in the EI tool?
    Maybe that would solve my problem, because YOLO is a very efficient object detection neural net.

Thank you for your support.

Best regards,