M5Stack Core2 Keyword Recognition - Why Does It Work Perfectly on Phone But Fails on the Board?

Hello Community!

I’m working on a keyword recognition project with an M5Stack Core2 board and I’ve hit a roadblock. I hope someone can help or provide advice.

The Project

  • Using M5Stack Core2 board for keyword recognition
  • Trained the model in Edge Impulse:
    • 1 keyword (800 samples)
    • “Noise” category (500 samples)
    • “Unknown” category (500 samples)
    • Using MFE and Transfer Learning
    • 1000ms window + 500ms expansion
    • 16kHz sampling rate
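
For reference, these settings pin down the buffer sizes the firmware has to handle. A quick sanity check (the slice count is just an illustrative assumption here, not something from my project):

```cpp
#include <cassert>
#include <cstddef>

// Buffer sizes implied by the model settings above (16 kHz, 1000 ms window).
// The slice count is an illustrative assumption: Edge Impulse's continuous
// examples typically split the window into several slices.
constexpr std::size_t kSampleRateHz    = 16000;  // 16 kHz sampling rate
constexpr std::size_t kWindowMs        = 1000;   // 1000 ms window
constexpr std::size_t kSlicesPerWindow = 4;      // assumption for illustration

constexpr std::size_t kWindowSamples = kSampleRateHz * kWindowMs / 1000;   // 16000 samples
constexpr std::size_t kSliceSamples  = kWindowSamples / kSlicesPerWindow;  // 4000 samples
```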

The Problem

The model works perfectly when tested with the Edge Impulse tester or phone app, BUT on the Core2 board the recognition rate is very poor:

  • It only recognizes the keyword about 1 out of 5 times
  • It only works when I pronounce the word exactly as I did during training
  • When I play back a recorded training sample to the board, the recognition rate improves, but it’s still far from perfect

I’ve tried adjusting the microphone gain factor in the code, but this didn’t bring significant improvement.
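
For context, by “gain factor” I mean a simple digital scaling of the raw samples before they are fed to the classifier, roughly like this sketch (the function name is mine for illustration, not taken from the repo):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative digital gain: multiply each 16-bit sample by a factor
// and saturate at the int16 limits instead of wrapping around.
std::vector<int16_t> apply_gain(const std::vector<int16_t>& in, float gain) {
    std::vector<int16_t> out;
    out.reserve(in.size());
    for (int16_t s : in) {
        int32_t v = static_cast<int32_t>(s * gain);
        v = std::clamp<int32_t>(v, INT16_MIN, INT16_MAX);  // avoid overflow clipping artifacts
        out.push_back(static_cast<int16_t>(v));
    }
    return out;
}
```

Note that too high a gain just drives samples into saturation, which distorts the audio the model sees, so more gain is not automatically better.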

My Questions

  1. What could be causing the same model to work perfectly on a phone but barely on the board?

If anyone has encountered a similar issue or has ideas for a solution, I would be very grateful for your help! I’m even willing to pay a developer who could spend a few hours solving this problem, as I urgently need to achieve stable operation.

Thank you in advance for your help!

Here is the arduino code: Flipper/src at main · nowlabstudio/Flipper · GitHub


Hi @eduardsik !
This kind of problem is always tricky, especially since you are using a board that we have not verified. I only had a brief look at the code, so a few things:

  1. The code is quite different from our ESP32 microphone example. Where did you get the code from? If it was generated by an LLM, I would not trust it to get everything right; it’s best to start from a working example.
  2. It seems the code DOES NOT use continuous inference. There are two examples, regular and continuous inference. In the regular example, sampling stops while inference runs, which is not something you want in production.
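
To make the difference concrete, here is a rough ping-pong buffering sketch (plain C++, purely illustrative — these names are not our SDK API): the capture side keeps filling one buffer while the classifier consumes the other, so no audio is dropped in the middle of a keyword.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative double-buffer ("ping-pong") capture scheme. While one buffer
// is handed to the classifier, samples keep flowing into the other buffer.
constexpr std::size_t kSliceSamples = 4000;  // e.g. 250 ms at 16 kHz

struct PingPong {
    std::array<int16_t, kSliceSamples> buf[2];
    std::size_t pos = 0;  // write position in the active buffer
    int active = 0;       // buffer currently being filled

    // Called for every incoming sample (e.g. from the I2S/DMA callback).
    // When a slice is full, swap buffers first, then hand the completed
    // slice to the classifier, so sampling never pauses.
    template <typename Fn>
    void push(int16_t sample, Fn&& on_slice_ready) {
        buf[active][pos++] = sample;
        if (pos == kSliceSamples) {
            int done = active;
            active ^= 1;
            pos = 0;
            on_slice_ready(buf[done]);
        }
    }
};
```

In the regular (non-continuous) example there is effectively only one buffer, so everything spoken while the classifier is busy is simply lost — which matches the “only 1 out of 5” behaviour you describe.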

What we recommend is starting out with our microphone sketches - if those work, then move on to adapting them into your application. We cannot offer help debugging your code :slight_smile: