Edge Impulse with ESP32-CAM project

Hello, I am doing an ESP32-CAM car project and I would like to add object detection to it. The car streams live video to the web. Is it possible to combine the two? Has it been done with Edge Impulse?
Thank you for any help.

Hi @barnareds,

Edge Impulse is built around running live inference locally on your device (e.g. on the ESP32 itself). If you’d like to perform inference using an Internet endpoint (for example, to run larger models faster), I recommend checking out other services, such as AWS SageMaker or Microsoft Azure.

EDIT: you can combine the two, but you will need to both 1) run inference locally on your device (e.g. by deploying a trained model with Edge Impulse) and 2) send raw data to a cloud endpoint.

I will run the stream on the device’s local network. Is that what you mean by running inference locally on my device? What does sending raw data to a cloud endpoint mean?
I appreciate the help very much.

Hi @barnareds,

Apologies if you already know these terms, but I’ll review them here just in case:

“Inference” means applying a trained ML model: the model accepts new, unseen data and produces results (e.g. a classification, regression, or anomaly prediction) for you to use in your application.

“Locally” means on-device, without needing to send or stream the data elsewhere.

“Cloud endpoint” is a web API that you can call (usually with something like the REST or MQTT protocol) to have a server on the Internet perform some action for you.

Edge Impulse trains machine learning models to be deployed to your microcontroller or other small, low-powered device. You then perform inference on those small devices without needing an internet connection. Note that this often means making compromises in the size and performance of your machine learning model. Do not expect YOLOv5 object detection to run at 30 fps on an ESP32; the ESP32 does not even have enough memory to hold, let alone run, YOLOv5. You can, however, run smaller, lightweight models, such as FOMO and small instances of MobileNetV2 SSD, but you will compromise on accuracy.
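To make that concrete, here is a minimal sketch of what on-device inference looks like once you export your Edge Impulse project as an Arduino library. The header name below is a placeholder for whatever your export is called, and the camera-capture step (filling the `features` buffer) is omitted; the library export typically includes an ESP32 camera example that shows that part in full.

```cpp
// Minimal sketch, assuming your Edge Impulse project was exported as an
// Arduino library. <my_car_project_inferencing.h> is a placeholder name.
#include <my_car_project_inferencing.h>

// One frame, resized to the model's input resolution and packed as floats
// in the format the Edge Impulse image block expects (filled by your
// camera-capture code, not shown here).
static float features[EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT];

// Callback that hands slices of the frame buffer to the classifier.
static int get_signal_data(size_t offset, size_t length, float *out_ptr) {
  memcpy(out_ptr, features + offset, length * sizeof(float));
  return EIDSP_OK;
}

void setup() {
  Serial.begin(115200);
}

void loop() {
  // ... capture a frame with the esp32-camera driver and fill `features` ...

  signal_t signal;
  signal.total_length = EI_CLASSIFIER_INPUT_WIDTH * EI_CLASSIFIER_INPUT_HEIGHT;
  signal.get_data = &get_signal_data;

  ei_impulse_result_t result;
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
    Serial.println("Inference failed");
    return;
  }

#if EI_CLASSIFIER_OBJECT_DETECTION == 1
  // FOMO-style object detection returns bounding boxes rather than one label.
  for (size_t i = 0; i < result.bounding_boxes_count; i++) {
    auto &bb = result.bounding_boxes[i];
    if (bb.value == 0) continue;  // skip empty detections
    Serial.printf("%s (%.2f) at x=%u, y=%u\n", bb.label, bb.value, bb.x, bb.y);
  }
#endif
}
```

The important point is that run_classifier() runs entirely on the ESP32, and the bounding boxes come back as a struct you can use directly in your car’s control logic.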

The alternative is to stream your camera data to a more powerful computer somewhere else. This could be a PC, laptop, or Raspberry Pi on your local network, or it could be a powerful GPU server somewhere on the Internet (in the “cloud”). When you stream your data, you must tell it where to go. The server address, protocol, and credentials are collectively known as an “endpoint.” The endpoint is where you send your raw data, and you get inference results back over the network (or Internet).
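On the ESP32 side, the streaming route really just means “POST a frame, read back the prediction.” Here is a minimal sketch, assuming a hypothetical endpoint at http://192.168.1.50:5000/predict that accepts a JPEG body and returns JSON; your real endpoint’s address, authentication, and response format will differ, and the sketch assumes Wi-Fi and the camera have already been initialized (as in the standard CameraWebServer example).

```cpp
#include <WiFi.h>
#include <HTTPClient.h>
#include "esp_camera.h"

// Hypothetical endpoint that accepts a JPEG body and returns JSON predictions.
const char *ENDPOINT = "http://192.168.1.50:5000/predict";

void sendFrameForInference() {
  camera_fb_t *fb = esp_camera_fb_get();    // grab one JPEG frame from the camera
  if (!fb) {
    Serial.println("Camera capture failed");
    return;
  }

  HTTPClient http;
  http.begin(ENDPOINT);
  http.addHeader("Content-Type", "image/jpeg");

  int status = http.POST(fb->buf, fb->len); // raw data goes out to the endpoint...
  if (status == 200) {
    Serial.println(http.getString());       // ...and the inference result comes back
  } else {
    Serial.printf("Request failed: %d\n", status);
  }

  http.end();
  esp_camera_fb_return(fb);                 // hand the frame buffer back to the driver
}
```

Sending the JPEG (rather than raw pixels) keeps the payload small, and all the heavy model work happens on the server, so the ESP32 only pays the cost of network latency.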

If you want to try performing inference locally on your ESP32 (without streaming your raw video feed) and are OK with lower accuracy, then Edge Impulse is a great choice for training and deploying your model.

If you want to stream your raw video feed over a local network to perform inference on a more powerful, personal computer, then you would want to set up a local endpoint using something like Roboflow or .NET Core. You can still train and deploy a model with Edge Impulse, but TensorFlow and PyTorch are also great (as are any number of other ML frameworks out there).

If you want to stream your raw video feed over the Internet to use powerful GPU clusters and servers, then you will need something like AWS SageMaker or Microsoft Azure to deploy/host your model and provide an endpoint to stream your data to. This post by Microsoft does a great job of explaining how cloud endpoints for ML work.


@barnareds I thought the video stream was coming from the Internet, so I initially thought Shawn was saying to point the ESP32 at your computer monitor showing the video stream, have it make predictions, and then trigger some event.

@shawn_edgeimpulse Thank you very much. I am in school and trying to get into all of this, so your clear explanation helps a lot. I will try to run inference locally on my device with Edge Impulse and stream the ESP32-CAM video at the same time. I’ll try to do this by combining the two sketches.
