Best Approach for Custom Person Detection on ESP32-CAM — Manual Model or Edge Impulse?

Context/Use case:
Building a low-cost, compact computer-vision system to detect unauthorized individuals outside a lab using an ESP32-CAM. The goal is to keep hardware minimal while still achieving reliable person detection on-device.

Details:
I’m planning to create an AI model that runs directly on the ESP32-CAM. During research, I found two main workflows:

  1. Manual workflow
  • Collect dataset
  • Train a custom lightweight model
  • Optimize (quantization, pruning, etc.)
  • Convert to TFLite format (optionally using ESP-NN optimized kernels at runtime)
  • Deploy manually using ESP-IDF or Arduino
    This gives full control but requires more expertise, especially in ML optimization and embedded deployment.
  2. Automated workflow using Edge Impulse
  • Upload images (dataset)
  • Use built-in training pipelines
  • Auto-optimize for microcontrollers
  • Export as an ESP32-ready library
    This drastically simplifies dataset management, training, optimization, and deployment—especially useful for beginners or early prototyping.
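
For reference, this is roughly what I understand the quantization step in the manual workflow to do. It is a minimal pure-Python sketch of asymmetric 8-bit affine quantization, not the actual TFLite or Edge Impulse implementation. The function names (`quant_params`, `quantize`, `dequantize`) and the sample weights are made up for illustration.

```python
def quant_params(values, num_bits=8):
    """Return (scale, zero_point) for asymmetric affine quantization.

    Hypothetical helper for illustration; real toolchains compute these
    per tensor or per channel from calibration data.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # The range must include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to signed integers, clamping to the representable range."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate float values from quantized integers."""
    return [(q - zero_point) * scale for q in q_values]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # made-up sample weights
    scale, zp = quant_params(weights)
    q = quantize(weights, scale, zp)
    print("quantized:", q)
    print("restored: ", dequantize(q, scale, zp))
    # float32 uses 4 bytes per weight, int8 uses 1 byte per weight.
    print("float32 bytes:", len(weights) * 4, "int8 bytes:", len(weights))
```

The point of the sketch is the trade-off: each weight shrinks from 4 bytes to 1, at the cost of a small rounding error per value. How much that error affects detection accuracy on real images is exactly what I can't judge yet.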

I want to understand which approach makes more sense for a practical project that needs custom detection but also fast development with minimal complexity.

Questions:

  • For an ESP32-CAM, which approach is more practical: manual training or using Edge Impulse?
  • How much accuracy difference can I expect between the two methods?
  • Does Edge Impulse handle model optimization well enough for real-time inference on ESP32-CAM?