Troubleshooting Deployment Accuracy for Edge Impulse Model on ESP32-CAM

Question/Issue:
I trained an image classification model using Edge Impulse, achieving excellent results during training and testing. However, when deployed on the ESP32-CAM, the model’s real-world accuracy is poor: it consistently misclassifies images, categorizing almost every input as “rottenbanana” regardless of its actual class. I tried showing it fresh bananas, other physical objects, and pictures, but the issue persisted.

Project ID:
562986

Context/Use case:

  • Dataset: Fresh and rotten fruits dataset from Kaggle (dataset link).
  • Model architecture: MobileNetV2 with 96x96 RGB input images, using transfer learning.
  • Neural network settings:
      • Input layer: 27,648 features (96x96x3).
      • Transfer learning with 8 final neurons and a dropout rate of 0.1.
      • Validation accuracy: ~86.93%.
      • Optimizer: automatically tuned learning rate, with appropriate batch size and data augmentation.
  • Deployment:
      • ESP32-CAM (AI Thinker model) with Edge Impulse’s Arduino-compatible library.
      • Image preprocessing resizes frames to 96x96 and converts them to RGB (see the packing sketch below).
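
For reference, Edge Impulse’s image blocks expect the feature buffer to hold one float per pixel, with each pixel packed as 0xRRGGBB. A minimal packing sketch (function and buffer names are mine, not from the exported library):

    #include <stdint.h>
    #include <stddef.h>

    // Pack an RGB888 frame into the float feature layout that Edge
    // Impulse image models expect: one float per pixel, encoded as
    // 0xRRGGBB. (Illustrative helper; not part of the exported library.)
    static void pack_rgb888_to_features(const uint8_t *rgb888,
                                        float *features,
                                        size_t pixel_count) {
        for (size_t i = 0; i < pixel_count; i++) {
            uint32_t r = rgb888[3 * i + 0];
            uint32_t g = rgb888[3 * i + 1];
            uint32_t b = rgb888[3 * i + 2];
            features[i] = (float)((r << 16) | (g << 8) | b);
        }
    }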

Steps Taken:

  1. Collected and split data (training/testing).
  2. Preprocessed images to 96x96 dimensions and converted them to RGB format in Edge Impulse.
  3. Trained the model with MobileNetV2 transfer learning in Edge Impulse.
  4. Deployed the model using Edge Impulse’s Arduino library.
  5. Tested the deployed model on real-world images with the ESP32-CAM (the capture-and-classify path is sketched below).
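
For reference, this is roughly the capture-and-classify path my sketch follows, modeled on Edge Impulse’s standard ESP32 camera example (the header and buffer names are illustrative, and the conversion step is elided):

    #include <string.h>
    #include "esp_camera.h"
    #include "img_converters.h"              // fmt2rgb888()
    #include <your_project_inferencing.h>    // exported library; name varies per project

    static float features[EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT];

    // Hands slices of the feature buffer to the classifier on demand.
    static int get_features(size_t offset, size_t length, float *out) {
        memcpy(out, features + offset, length * sizeof(float));
        return 0;
    }

    void classify_frame() {
        camera_fb_t *fb = esp_camera_fb_get();            // grab a frame
        if (!fb) return;
        // ... convert fb to RGB888 (e.g. with fmt2rgb888), crop/resize to
        //     96x96, and pack pixels as 0xRRGGBB floats into `features` ...
        esp_camera_fb_return(fb);

        signal_t signal;
        signal.total_length = EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT;
        signal.get_data = &get_features;

        ei_impulse_result_t result;
        if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) return;

        for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
            ei_printf("%s: %.5f\n", result.classification[ix].label,
                      result.classification[ix].value);
        }
    }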

Expected Outcome:
The deployed model should classify images accurately, with performance roughly in line with the ~86.93% accuracy observed during Edge Impulse testing.

Actual Outcome:
The deployed model frequently misclassifies images, almost always labeling them “rottenbanana.” There is a significant disparity between Edge Impulse testing results and real-world deployment performance: in effect, a strong bias (a high false-positive rate) toward the class that performed best in my confusion matrix.

Reproducibility:

  • [X] Always
  • [ ] Sometimes
  • [ ] Rarely

Environment:

  • Platform: ESP32-CAM (AI Thinker)
  • Build Environment Details: Arduino IDE 2.3.3 with ESP32 Core for Arduino (latest version)
  • OS Version: macOS (latest)
  • Edge Impulse Version: [Verify using Edge Impulse versioning instructions]
  • Edge Impulse CLI Version: v1.30.0
  • Project Version: 1.0.0
  • Custom Blocks / Impulse Configuration: MobileNetV2 with transfer learning. RGB 96x96 input.

Logs/Attachments:

Predictions (DSP: 8 ms., Classification: 620 ms., Anomaly: 0 ms.):
Predictions:
    freshapple: 0.00000
    freshbanana: 0.00000
    freshoranges: 0.00000
    rottenapple: 0.00000
    rottenbanana: 0.99609
    rottenoranges: 0.00000

Predictions (DSP: 8 ms., Classification: 620 ms., Anomaly: 0 ms.):
Predictions:
    freshapple: 0.00000
    freshbanana: 0.00000
    freshoranges: 0.00000
    rottenapple: 0.00000
    rottenbanana: 0.99609
    rottenoranges: 0.00000

Additional Information:
The model’s exported library code was used directly in the ESP32-CAM sketch. I suspect the camera preprocessing or model quantization might be at fault, but I need guidance on how to debug further.

Request for Assistance:

  1. Suggestions for debugging the model in real-world deployment on ESP32-CAM.
  2. Best practices to improve alignment between Edge Impulse testing results and deployed model performance.
  3. Recommendations for testing camera input preprocessing on the ESP32-CAM (one idea is sketched below).
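
On item 3, one check I could run is the static_buffer example bundled with the exported Arduino library: copy the “Raw features” of a test sample from Edge Impulse Studio into that sketch and confirm the on-device result matches Studio, which isolates the model from the camera path. In the other direction, dumping the features my firmware actually produces would show exactly what the camera/preprocessing path feeds the model. A rough sketch of the dump (helper name is mine):

    // Print the feature buffer as comma-separated hex pixel values for
    // comparison against a known-good sample's "Raw features" in Studio.
    // `features` is the same buffer passed to run_classifier()
    // (illustrative name).
    static void dump_features(const float *features, size_t count) {
        for (size_t i = 0; i < count; i++) {
            // Image features are 0xRRGGBB pixel values stored as floats.
            ei_printf("0x%06x", (unsigned int)features[i]);
            if (i + 1 < count) ei_printf(", ");
        }
        ei_printf("\n");
    }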

Thank you for your help!

Edit: Originally, when compiling, errors were thrown from two files, conv.cpp and depthwise_conv.cpp, in a structure initialization. Here is one of them: conv.cpp:1795:80: error: either all initializer clauses should be designated or none of them should be
1795 | data_dims_t filter_dims = {.width = filter_width, .height = filter_height, 0, 0};

I attempted to fix it myself by making all of the initializer clauses positional (sketch below); let me know if there is a more proper way of patching it based on the actual data_dims_t structure.
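
For reference, the workaround I applied, which assumes width and height are the first two members of data_dims_t (as the original mixed initializer implies):

    // conv.cpp:1795 (and the matching initializer in depthwise_conv.cpp):
    // use all-positional initializer clauses instead of mixing designated
    // and positional ones. Assumes width and height are the first two
    // members of data_dims_t.
    data_dims_t filter_dims = {filter_width, filter_height, 0, 0};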

Reply: This doesn’t answer your question, but it looks like the Kaggle dataset doesn’t have any miscellaneous/background class with random pictures of anything other than those fruits. That means the model will force everything it sees into one of your fruit classes.
As I don’t have an ESP32 device, I can’t help with the other parts; I guess someone else will chime in soonish.

Reply: I appreciate the feedback; I hadn’t considered this, as I was so focused on recognizing the fruits that I didn’t include noise or other background data to handle different scenarios. I am also looking into using images captured by the ESP32-CAM itself, as those would be better training data than the high-quality PNGs from the dataset.