I’m using FOMO MobileNetV2 0.1 for object detection. The model works in
Edge Impulse testing (shows centroids), but the deployed Arduino library
returns 0 bounding boxes despite successful 126ms inference.
Project: Yeriel-Project2
Board: ESP32-S3 / ESP32-CAM
Model: FOMO MobileNetV2 0.1, 96x96 grayscale
Issue: result.bounding_boxes_count = 0 (always)
Inference runs successfully (non-zero timing)
Model Testing shows centroids (circles) but not bounding boxes
How do I enable bounding box post-processing in Arduino library export?
Hi, @Yeriel!
It is expected for FOMO to output centroids and not bounding boxes.
Regarding 0 result count for on-device inference: which sketch are you running? Are you modifying it?
Hi Team,
Thanks for clarifying that FOMO outputs centroids at the neural network level. However, my understanding is that the Edge Impulse SDK should convert these centroids into bounding boxes and populate result.bounding_boxes[].
Currently, result.bounding_boxes_count returns 0, which suggests the
post-processing step that converts centroids to bounding boxes is not
executing in the Arduino library.
Questions:
Should result.bounding_boxes[] be populated with boxes generated from
the FOMO centroids?
Is there a specific function or setting to enable this conversion in
the Arduino library?
Or does the Arduino library only provide raw centroid data that I need
to manually post-process?
If manual post-processing is required, could you provide the algorithm or
example code for converting FOMO’s centroid grid to bounding boxes?
To clarify the issue:
In Edge Impulse Model Testing I see centroids, but the deployed Arduino library returns result.bounding_boxes_count = 0, which suggests the centroids are not being output at all, not even as points.
Sketch details:
Using the default Edge Impulse generated example sketch (esp32/esp32_camera.ino)
Running on ESP32-S3-EYE MB V2.2
Board: ESP32S3 Dev Module, PSRAM: OPI PSRAM enabled
Library: Deployed as Arduino library with TensorFlow Lite, Quantized (int8)
The camera is verified working (captures images with good pixel variation).
Inference runs successfully (timing: DSP=6ms, Classification=143ms).
However:
result.bounding_boxes_count always returns 0
result.classification array also shows all 0.000 values
It appears the model output is not being populated in the Arduino library,
even though inference completes without errors.
Is there a specific configuration or post-processing step needed to access
the FOMO centroid outputs in the Arduino library?
No, it’s not likely to be post-processing. The most likely culprit is wrong camera initialization: the Arduino sketch is tested with the original ESP-EYE (not the S3), and camera init parameters can differ from board to board. This is the first thing to check.
Another thing that might differ is the Arduino ESP32 Core version. The version the sketch was tested with is specified in the sketch; if you use a version that differs too much, it will likely break as well.
Have you also tried inference on a static buffer as a sanity check?
Thank you for the guidance. Here’s an update on our progress:
Camera initialisation: We are using the ESP32-S3-EYE (MB V2.2) with an OV2640 sensor. We have verified the camera initialises correctly with the pins adjusted to those of the ESP32-S3-EYE; frames come through at 320×240 in PIXFORMAT_GRAYSCALE format. We are using Arduino ESP32 Core 2.0.4 with Board: ESP32S3 Dev Module, PSRAM: OPI PSRAM enabled.
Static buffer sanity check: Yes, we ran inference on a static 96×96 buffer filled with mid-grey (128). Result was OK — DSP=6ms, NN=145ms, no crash. This confirms the model and SDK are loading correctly.
Current status: We resolved an earlier crash (caused by crop_and_interpolate_image overrunning a PSRAM buffer) by replacing it with a manual nearest-neighbour downscale from 320×240 → 96×96 directly into internal RAM. run_classifier() now completes successfully every cycle without crashing.
Remaining issue: bounding_boxes_count always returns 0 and classification scores remain 0.00000 for both classes (bottle, cube), even when the object is clearly in frame with good lighting. The model has 93.10% accuracy in Edge Impulse’s own Model Testing tab.
Our current image pipeline is:
Capture 320×240 grayscale from OV2640 into camera framebuffer (PSRAM)
Manual nearest-neighbour downscale → 96×96 into internal RAM buffer
Normalise to [0, 1] float in get_data callback
run_classifier() called with signal.total_length = 9216 (96×96×1)
Questions:
Is there anything specific about how FOMO populates result.bounding_boxes[] on Arduino that we might be missing?
Could the manual downscale (vs crop_and_interpolate_image) affect detection? Should we be centre-cropping to square before downscaling rather than stretching 320×240 → 96×96?
Is there a recommended way to verify the image data reaching the model is valid beyond pixel range checks?
The static buffer test is meant to be run with the raw features data from Studio, to compare on-device inference against Studio's output on the same static data.
I think I have an idea about the issue you are having, though: it seems like you obtain a grayscale image and then try to feed it into the pipeline. The Arduino sketch example is based on the assumption that you get a JPEG image from the camera and then convert it to RGB888, then resize it and feed that RGB888 image into run_classifier, where it is internally converted to grayscale. It's also likely this is exactly why you had issues with the crop_and_interpolate_rgb888 function: as the name says, it assumes RGB888 input.
Can you please try the default settings and the original camera sketch:
.pixel_format = PIXFORMAT_JPEG, //YUV422,GRAYSCALE,RGB565,JPEG
.frame_size = FRAMESIZE_QVGA, //QQVGA-UXGA Do not use sizes above QVGA when not JPEG
.jpeg_quality = 12, //0-63 lower number means higher quality
.fb_count = 1, //if more than one, i2s runs in continuous mode. Use only with JPEG
.fb_location = CAMERA_FB_IN_PSRAM,
.grab_mode = CAMERA_GRAB_WHEN_EMPTY,
};