Multi-stage inference on ESP32: Wakeword (KWS) followed by Command Classification

Context/Use case: I am working on a project using an ESP32 where I need to implement a two-stage voice recognition system:

  1. Stage 1 (Always-on): A Wakeword model (Keyword Spotting) similar to “Alexa” or “Siri”.
  2. Stage 2 (Action): A Command Classification model that is triggered only after the specific wakeword is detected in Stage 1.

Details: My goal is to have the ESP32 continuously listening for the wakeword. Once it is detected, the system should switch context and run the second model to interpret the specific command/action the user wants to perform; a rough sketch of this flow is below.
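
To make the intended flow concrete, here is the state machine I have in mind. All function names here are placeholders for whatever the deployed models will actually expose, not real APIs; this is just the control flow I want to end up with:

```cpp
// Control-flow sketch only: detect_wakeword(), classify_command() and
// execute_command() are placeholders, not real SDK calls.
enum class Mode { LISTEN_FOR_WAKEWORD, CLASSIFY_COMMAND };

static Mode mode = Mode::LISTEN_FOR_WAKEWORD;

void loop() {
    if (mode == Mode::LISTEN_FOR_WAKEWORD) {
        // Stage 1: always-on keyword spotting on the audio stream
        if (detect_wakeword()) {
            mode = Mode::CLASSIFY_COMMAND;
        }
    } else {
        // Stage 2: one-shot command classification after the wakeword fires
        int command = classify_command();
        execute_command(command);
        mode = Mode::LISTEN_FOR_WAKEWORD;  // fall back to always-on listening
    }
}
```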

Questions:

  1. Deployment Structure: What is the best practice for deploying two different models on the same ESP32 using Edge Impulse?
  2. SDK Management: Do I need to export and include two separate C++ libraries (SDKs), one for each project? I am concerned about code size and potential conflicts. Is there a “cleaner” way to combine two impulses in a single firmware?
  3. Memory Management: Since the ESP32 has limited RAM, do you have any suggestions on how to manage the Tensor Arena allocation when switching between these two models to avoid memory overflow?

Hello @Betao, very interesting project!

To achieve this, you will need to create two separate projects in Edge Impulse Studio and combine the resulting C++ libraries using our multi-impulse deployment method:

Run multiple impulses (C++) - Edge Impulse Documentation
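
Once you have the merged library, a single firmware can run either impulse and the "context switch" is just a flag. A minimal sketch is below; note that the entry-point names are placeholders. The merge process renames duplicated symbols per project (the docs describe the exact scheme), so check the generated headers for the real function names and project IDs in your build:

```cpp
// Hedged sketch against a merged multi-impulse C++ library. The suffixes
// _12345 / _67890 stand in for your two project IDs; the real names come
// from the generated sources, so verify them there before copying this.
#include <cstring>
#include "ei_run_classifier.h"  // main header of the merged library (check your tree)

static bool wakeword_active = false;

void process_audio_window(signal_t *signal) {
    ei_impulse_result_t result = { 0 };

    if (!wakeword_active) {
        // Stage 1: always-on KWS impulse
        if (run_classifier_12345(signal, &result, false) != EI_IMPULSE_OK) return;
        // result.classification[] holds {label, value}; "wakeword" and the
        // 0.8f threshold are placeholders for your own label and tuning
        for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
            if (strcmp(result.classification[i].label, "wakeword") == 0 &&
                result.classification[i].value > 0.8f) {
                wakeword_active = true;
            }
        }
    } else {
        // Stage 2: command impulse, run once, then fall back to KWS
        if (run_classifier_67890(signal, &result, false) != EI_IMPULSE_OK) return;
        // ... dispatch on the highest-scoring label here ...
        wakeword_active = false;
    }
}
```

Since only one impulse ever runs per audio window, that boolean flag is really all the stage switching you need.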

We offer this as an automated feature on enterprise plans; on a free account you will need to follow the manual method described there.

Finally, keep in mind that you will be running this on an ESP32, and multi-impulse deployment is an advanced feature for MCU-based devices. You may run into build errors, so read the docs carefully, and feel free to share your next steps here so we can support you!
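
On your memory question: the nice property of a two-stage design is that the models never run at the same time, so peak RAM can be the max of the two arenas rather than their sum. With the Edge Impulse library the arena handling is generated for you (and exporting with the EON Compiler enabled reduces RAM considerably), but if you ever drop down to raw TensorFlow Lite Micro, the usual pattern is one shared static arena sized for the larger model. A minimal sketch of that pattern, with placeholder sizes and op registrations:

```cpp
// Shared-arena pattern for raw TensorFlow Lite Micro (not the Edge Impulse
// SDK). kArenaSize is a placeholder: size it for the LARGER of your two
// models by profiling, then both stages reuse the same buffer.
#include <cstdint>
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

constexpr size_t kArenaSize = 40 * 1024;  // placeholder: profile your models
alignas(16) static uint8_t tensor_arena[kArenaSize];

// Run one inference for whichever stage is active. The interpreter lives on
// the stack, but all tensor memory comes from the shared static arena, so
// RAM usage is max(model A, model B) instead of their sum.
bool run_stage(const unsigned char *model_data) {
    const tflite::Model *model = tflite::GetModel(model_data);

    static tflite::MicroMutableOpResolver<8> resolver;
    // resolver.AddConv2D(); resolver.AddFullyConnected(); ...  (per model)

    tflite::MicroInterpreter interpreter(model, resolver, tensor_arena, kArenaSize);
    if (interpreter.AllocateTensors() != kTfLiteOk) return false;

    // ... fill interpreter.input(0), then read interpreter.output(0) ...
    return interpreter.Invoke() == kTfLiteOk;
}
```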