Hardware recommendations for Medium Resolution

Hardware choices for medium data input/ML
Hi, Newbie here. Deploying for Hobbyist / DIY tinker
Training and Using Practical Voice and Video for ID entry

First, I was led to Edge Impulse by the Drone Workshops video on using the ESP32 for object detection with FOMO. I’ve watched some videos on FOMO and realised that this model may not be the best for identifying one specific person (correct me if I’m wrong).

The main limitation is the hardware. While I love the idea that my little esp32-c3-mini boards are mini-computers, they are not enough for the data resolution I think I might require; in fact, probably no ESP32 is.
If there is a way to join multiple microcontrollers together to get better performance, that might be an option, but it is most likely complicated.

I have looked at some of the Arm processors (individually and packaged), including those now aimed at ML (the Cortex-M55 series). It is hard to find an exact cost.

I also looked at the Pi 4, as this is the flagship Raspberry Pi board ($170 AUD).

Hey, :exploding_head: I’m paying mini-computer prices, so why not go to an mATX board, a laptop, or a cube PC? (ITX is a little overpriced for a hobbyist.)

Surely some type of mini system has enough processing power to run the resolution needed for both visual and audio.

The application would be this: I walk up to the camera, and it should recognise me and release a latch (:star_struck: Star Trek doors, here we come), OR I could say a keyword and get the same result (just in case I’m busy looking elsewhere), something like Revelo :magic_wand:
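As a rough sketch of the either/or logic described above: the `Detection` type, the labels `"me"` and `"open"`, and the 0.8 threshold are all made up for illustration and do not come from Edge Impulse or any particular model. It only shows how a face result and a keyword result could be combined into one unlock decision.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One model output: the predicted label and its confidence (0.0 to 1.0)."""
    label: str
    confidence: float


def should_unlock(face: Detection, keyword: Detection,
                  face_label: str = "me", keyword_label: str = "open",
                  threshold: float = 0.8) -> bool:
    """Return True if EITHER the face OR the spoken keyword matches confidently."""
    face_ok = face.label == face_label and face.confidence >= threshold
    keyword_ok = (keyword.label == keyword_label
                  and keyword.confidence >= threshold)
    return face_ok or keyword_ok


# Example: the face is not recognised, but the keyword is heard clearly.
print(should_unlock(Detection("unknown", 0.95), Detection("open", 0.91)))
```

Using OR makes the door convenient; a stricter setup could require both to match (AND) at the cost of more failed entries.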

Am I overthinking this? Have I mentally spent money on stuff I don’t need? YES :eyes:

Context/Use case: Running the necessary hardware to do ID (face and voice) for a garage security system

Thanks for the Replies.

I’m not sure how to do paragraphs in my above post :man_shrugging:

Sadly, the flagship of 13 years ago, the Snapdragon (used in smartphones), is now the chip used in Meta Quest Pro headsets :thinking: :face_with_peeking_eye: Food for thought.

Also, for reference, the prices above are in AUD. An mATX board goes for about $50 USD. Is that an alternative?

The ESP will work for face recognition and key phrase recognition in a garage security scenario. Please build and deploy a model and you will be pleasantly surprised at how well these low resolution cameras and low cost MCUs work in the aforementioned scenario.


Thanks for the reply. I’m referring to the following video:
tinyML Talks: Constrained Object Detection on Microcontrollers with FOMO
Yes, the ESP32 can recognize ‘Face’. The input was a 96x96 grid.
As you can see in the video, it recognised ‘Face’ as in nose and eyes :thinking:
My thought here was to increase the resolution, which requires more hardware.

Secondly, I’m not sure I can run 2 models on 1 microcontroller; as I said, video and audio. :face_with_peeking_eye:

Thanks for the comment :cowboy_hat_face:

Hi @Gambit,

If you want to train and deploy the model yourself, I highly recommend checking out the ESP32-S3, which has a lot more power and AI-enabled hardware acceleration versus the ESP32-C3.

For your application, I do not think that FOMO is a great fit. FOMO is good for counting objects rather than uniquely identifying one vs. many (e.g. identifying one face out of many). That being said, I highly recommend checking out this board from Useful Sensors: Person Sensor by Useful Sensors - SEN-21231 - SparkFun Electronics. It sounds like it will do exactly what you need without needing to do all the model training yourself. You just need to plug it into e.g. an Arduino via I2C, enroll your face, and voila–you can start uniquely identifying your face to control stuff!


Amazing stuff Shawn

I commend you on your success and I am jealous of your own neural network. I feel like the Wizard of Oz scarecrow.

Asking about hardware: does the idea of running mATX have any merit when it comes to deploying such models, say for a house system? And would you run Linux on that or FreeRTOS (I don’t know if it could run on a computer), or am I barking up the wrong tree?

Also, I must bestow upon you the Honor of being the Rick Hartley of Electronics and ML :saluting_face:

Thanks for all the replies.

Also a correction is needed:
Is my heading for this post correct with regard to “Medium Resolution”?
I don’t have the right terminology. What would the correct heading be?

Thanks, @Gambit :slight_smile:

Regarding the heading, I think “medium resolution” is fine. To me, low is anything up to VGA and I guess “medium” would be up to something like 1080p. I don’t think there’s any objective rule as to what makes “low” vs. “high”; it’s all subjective.
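To put rough numbers on those tiers, here is a small sketch. The cut-offs (VGA = 640x480, 1080p = 1920x1080) follow the informal definition above and are not any standard, and the function names are invented for this example. It also prints the size of one uncompressed 8-bit frame, which is where the extra hardware demand comes from as resolution grows.

```python
def resolution_tier(width: int, height: int) -> str:
    """Classify a frame size using the informal tiers from this post."""
    pixels = width * height
    if pixels <= 640 * 480:        # up to VGA
        return "low"
    if pixels <= 1920 * 1080:      # up to 1080p
        return "medium"
    return "high"


def raw_frame_bytes(width: int, height: int, channels: int = 1) -> int:
    """Bytes for one uncompressed 8-bit frame (1 = grayscale, 3 = RGB)."""
    return width * height * channels


for w, h in [(96, 96), (640, 480), (1920, 1080)]:
    print(f"{w}x{h}: {resolution_tier(w, h)}, "
          f"{raw_frame_bytes(w, h)} bytes grayscale, "
          f"{raw_frame_bytes(w, h, 3)} bytes RGB")
```

A 96x96 grayscale frame is about 9 KB, while a single 1080p RGB frame is about 6 MB, which is more RAM than most microcontrollers have.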

The mATX would be fine and is actually probably easier to deploy a model to, especially if you’re using something like Python. The biggest issue you’ll run into with mATX is the lack of GPIO, which will make it difficult to control external hardware (such as your latch). An mATX mini-computer will also suck a lot more power than a single-board computer (such as a Raspberry Pi), which will suck more power than a microcontroller (like an ESP32).


Do you know where I can find AI-enabled hardware acceleration?

Are you referring to FPGAs or something else?

Do you know of any other microcontrollers that would do what I want? I will consider the ESP32-S3, since its cost is very good compared to others, and FreeRTOS with PlatformIO is a good combination.

I would like to train the model myself, and I know there are generic ‘Face’ models, as listed in the Edge Impulse demos.

I’ve noticed that the Raspberry Pi offers a good package at a reasonable cost, with the option of a higher-resolution camera. Would this be the way to go?

What I’m asking about is the combination of software and hardware: what hardware/software combo would you suggest for training and deploying a model for ID purposes?

I’ve found a few dedicated modules.

I wanted to go with a low-light / infrared camera so I can train for night-time situations. Since training would be in greyscale, I assume this is OK?

OpenMV Camera Module - is this compatible with Edge Impulse?

Raspberry Pi NoIR Camera Board v2 - 8 Megapixels : ID 3100 : $29.95 : Adafruit Industries (8MP Pi Camera Low Light)

Raspberry Pi Camera Module 3 Wide NoIR - 12MP 120 Degree [Wide Angle Infrared Lens] : ID 5660 : $35.00 : Adafruit Industries (12MP Pi Camera for a few dollars more)

What resolution would you be pretraining the model for? 200x200 pixels ???

Thank you

Hi @Gambit,

AI-enabled hardware acceleration can be anything from FPGAs (which are not supported by Edge Impulse at this time) to Arm Ethos-U55 coprocessors (or Google TPU) to model architecture-specific hardware (such as the Syntiant NDP).

Choosing the right hardware is always a spectrum with various tradeoffs. Which of the following are the most important to you?

  • Inference speed
  • Energy efficiency
  • Ease of use
  • Physical size

The Raspberry Pi (and most Linux-based solutions) will be the easiest to use and offer great inference speeds, but they will use a lot of energy and take up the most physical space. A powerful microcontroller, like the Arm Cortex-M7, will be a compromise, and smaller microcontrollers, such as the Cortex-M4, will be the most energy efficient but will have slow inference times (assuming the model even fits on the device!).

The OpenMV Cam is indeed officially supported on Edge Impulse. Please see our documentation here. It is a fantastic starting kit, as MicroPython is an easy subset of Python.

If low light is important to you, then the Pi NoIR camera is probably the best place to start. I’m not familiar with other near IR cameras that work easily with microcontrollers.

If you truly want the fastest inference with the lowest possible energy usage, then I recommend checking out the Alif line of chips: Alif Ensemble E7 - Edge Impulse Documentation. They use an Arm Ethos-U55 accelerator, so even complex models like object detection are fast: not as fast as a Pi, but incredibly fast for a microcontroller. However, the development environment is difficult to set up and learn, so expect to spend weeks/months just learning how to work with the platform.

The resolution, as with everything, is a trade-off of pros and cons. The higher the resolution, the better detail you can make out, which means you can identify faces farther away and with more accuracy. However, you will need a more powerful processor and can expect inference to take longer. My recommendation is to capture your training set with the highest possible resolution you think you’ll need (say, 1920x1080). From there, it’s easy to crop and downsample to smaller resolutions (480x480, 200x200, 64x64, etc.) to see if your model will retain its accuracy (i.e. on the test set).
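The crop-and-downsample step could look something like the sketch below. It is a simplified, pure-Python illustration on a grayscale image stored as a list of rows; the function names are invented here, and it assumes the cropped side is an exact multiple of the target size. In practice you would more likely use an image library's resize function, which also handles sizes that do not divide evenly (e.g. 1080 down to 480).

```python
def center_crop_square(img):
    """Crop a 2D list of pixel rows to the largest centered square."""
    h, w = len(img), len(img[0])
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return [row[left:left + side] for row in img[top:top + side]]


def downsample(img, out_size):
    """Shrink a square grayscale image by averaging k x k blocks.

    Assumes the side length is an integer multiple of out_size.
    """
    side = len(img)
    if side % out_size != 0:
        raise ValueError("side must be a multiple of out_size")
    k = side // out_size
    out = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            block = [img[by * k + y][bx * k + x]
                     for y in range(k) for x in range(k)]
            row.append(sum(block) // len(block))  # integer mean of the block
        out.append(row)
    return out


# Tiny stand-in for a camera frame: 4 rows x 6 columns of grayscale values.
frame = [[x * 10 + y for x in range(6)] for y in range(4)]
square = center_crop_square(frame)   # 4x4
small = downsample(square, 2)        # 2x2
print(small)
```

Running the same training set through a few target sizes and comparing test-set accuracy is a cheap way to find the smallest resolution that still works.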

Hope that helps!
