Seeking Technical Guidance for Edge-Based Speaker-Dependent Voice Recognition Project

Hi Edge Impulse Community,

I’m excited to join you all as I embark on a new project leveraging the Edge Impulse services. I am currently evaluating the technical feasibility of a project and would greatly appreciate any insights or guidance you can offer.

Project Overview: Our goal is to implement a low-power, low-memory, fully on-edge, speaker-dependent voice recognition system on a custom PCB featuring an ESP32-C3 MCU and an I2S microphone. A key feature of this project is speaker dependency: once a user activates the hardware, they are prompted to set a personalized wake-word. The system must then be trained to recognize and respond to this wake-word only when it is spoken by that specific user.

Technical Questions:

  1. Model Training and Deployment: Since on-device training seems infeasible (please correct me if I’m wrong), we are considering using the Edge Impulse platform for training the model off-device. How can we best interface our ESP32-C3 with your platform for this purpose? We plan to use the API to upload voice samples and download the trained model. Are there any potential technical limitations or specific considerations we should prepare for? Do your API endpoints support these functionalities?
  2. API Interaction and Authentication: How does the API manage authentication, especially considering that model training is a one-time requirement per user? What capabilities does your API offer for uploading audio samples and retrieving models?
  3. Development Framework Choices: Given the specifics of our use case, would you recommend ESP-IDF, or would the Arduino framework be adequate?
  4. Cost Structure: As our interaction with the platform will primarily be through the API for sending voice samples and retrieving models, could you provide an estimate of the costs for unlimited API usage? Is the pricing model fixed, or is there flexibility based on per-user requests?

Any support or insights you can provide would be immensely helpful as we assess the technical requirements and prepare to dive into development.

Thank you so much!

Hi @bryanlopeziot

The ESP32-C3 is not recommended for keyword spotting (KWS), or for much beyond audio event detection and anomaly detection. It's the ESP8266 replacement, as far as I can tell.
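
For a rough sense of the memory side, here is a back-of-the-envelope sketch in Python. The numbers are assumptions for illustration, not measurements from this project: 16 kHz, 16-bit mono audio, a 1-second window, and roughly 400 KB of on-chip SRAM on the ESP32-C3 (check the datasheet for the exact usable figure). `window_bytes` is a hypothetical helper name.

```python
# Back-of-the-envelope RAM estimate for buffering one raw audio window.
# All constants below are illustrative assumptions, not measured values.

# Assumed on-chip SRAM of the ESP32-C3; in practice only part of it is free
# for the application once the RTOS, drivers, and radio stack are loaded.
SRAM_BYTES = 400 * 1024


def window_bytes(sample_rate_hz: int, bits_per_sample: int, seconds: float) -> int:
    """Bytes needed to hold one mono audio window of the given length."""
    bytes_per_sample = bits_per_sample // 8
    return int(sample_rate_hz * bytes_per_sample * seconds)


# Assumed keyword-spotting input: 1 s of 16 kHz, 16-bit mono audio.
raw = window_bytes(16_000, 16, 1.0)
share = raw / SRAM_BYTES

print(f"raw window: {raw} bytes, {share:.1%} of assumed SRAM")
```

Under these assumptions the raw buffer alone is a small share of SRAM, so the audio window by itself is probably not the blocker. The model's working memory, the I2S DMA buffers, and any Wi-Fi stack all draw on the same budget, and feature extraction plus inference have to keep up in real time on a single core.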

Go for something like this:

Best

Eoin