Speech Recognition on ESP32 with I2S microphone (INMP441)

dasig_jp2017_gmail · April 24, 2025, 8:45am

Check the pin configuration on your end. You will need to edit that part of the code.

Also, the ESP32 has memory allocation limits, so keep that in mind when training models in Edge Impulse.

If I remember correctly, the limit is around 52,920 raw samples. Don't go beyond that.
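To check a window size against that figure, multiply the sample rate by the window length. The sketch below assumes the limit applies to the raw samples in one inference window and that samples are stored as 16-bit integers. `MAX_RAW_SAMPLES` is the number recalled above, not an official ESP32 or Edge Impulse constant.

```python
# Hypothetical budget: the ~52920-sample figure recalled above, not an official limit.
MAX_RAW_SAMPLES = 52920
BYTES_PER_SAMPLE = 2  # assumes 16-bit PCM samples


def raw_samples(sample_rate_hz: int, window_ms: int) -> int:
    """Raw samples in one inference window (integer math avoids float rounding)."""
    return sample_rate_hz * window_ms // 1000


def window_fits(sample_rate_hz: int, window_ms: int) -> bool:
    """True if one window stays within the assumed raw-sample budget."""
    return raw_samples(sample_rate_hz, window_ms) <= MAX_RAW_SAMPLES


if __name__ == "__main__":
    for rate in (5000, 8000, 16000, 44100):
        n = raw_samples(rate, 1000)
        print(f"{rate} Hz x 1000 ms: {n} samples, "
              f"{n * BYTES_PER_SAMPLE} bytes, fits={window_fits(rate, 1000)}")
```

At 16 kHz, for example, a 3000 ms window is 48,000 samples and stays under the budget. A 4000 ms window (64,000 samples) would not.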

dasig_jp2017_gmail · April 24, 2025, 8:48am

The accuracy the model shows in Edge Impulse and the accuracy you get in real-world testing can be very different.

You need to test, test, and test again to get the results you want. There's no need to modify the code much; you need to keep retraining the model until you get the results you want in real-world applications.

deegalasriya · April 26, 2025, 3:25pm

With 5 kHz sampling, I got 95% accuracy in model testing, but real-time classification was way off. I think my code was not handling the audio buffer correctly.
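One common way to handle the buffer for continuous classification is a sliding window. Keep the most recent window of samples and run inference every `stride` new samples, so each call sees exactly as many samples as the model was trained on. The sketch below shows only the bookkeeping, written in Python for clarity. `SlidingWindow` is a hypothetical helper. On the ESP32 the same logic would live in the firmware, fed by the I2S read loop.

```python
from collections import deque
from typing import Deque, Iterable, List


class SlidingWindow:
    """Collects incoming samples and returns full, overlapping windows.

    Hypothetical helper: window_len should match the model's input window
    (sample rate x window length); stride is how many new samples to wait
    for between inferences.
    """

    def __init__(self, window_len: int, stride: int) -> None:
        if not 0 < stride <= window_len:
            raise ValueError("stride must be in (0, window_len]")
        self.window_len = window_len
        self.stride = stride
        # Oldest samples drop off automatically once the deque is full.
        self._buf: Deque[int] = deque(maxlen=window_len)
        self._since_last = 0  # new samples since the last emitted window

    def push(self, samples: Iterable[int]) -> List[List[int]]:
        """Add a chunk (e.g. one I2S read) and return every window that became ready."""
        ready: List[List[int]] = []
        for s in samples:
            self._buf.append(s)
            self._since_last += 1
            # Emit only once the buffer is full and `stride` new samples arrived.
            if len(self._buf) == self.window_len and self._since_last >= self.stride:
                self._since_last = 0
                ready.append(list(self._buf))
        return ready
```

With a 1 s window at 5 kHz, for instance, `SlidingWindow(5000, 2500)` would classify twice per second with 50% overlap. What matters is that the window length and sample rate match what the impulse was trained on.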

dasig_jp2017_gmail · April 28, 2025, 8:28am

Use two processing blocks, MFE and MFCC, at the same time. This increased real-world accuracy for me.

se732525 · April 28, 2025, 12:57pm

I did use 2 processing blocks. Again, I’ll repeat what I’ve said earlier… I have a great impulse model. When I test it with my phone’s speaker, it works flawlessly. It works like crap on my ESP32 with its microphone. I seem to be having a hard time making that point.

Eoin · May 29, 2025, 9:33am

Hi @se732525

Can you share the DSP configuration from the Signal Window page and from your Create Impulse page?

Downsample Training Data

Train your model on versions of your data downsampled to 5 kHz or 8 kHz, matching what the ESP32 actually captures.
This helps the model learn in the same spectral context the device will use.
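If the original recordings are at a higher rate, one way to produce lower-rate copies is integer-factor decimation: low-pass filter, then keep every Nth sample. The sketch below uses a moving-average (box) filter as a crude anti-aliasing step. That filter is an assumption for illustration, and a proper resampler from an audio library will give cleaner results. It only handles integer factors such as 16 kHz to 8 kHz (factor 2) or 15 kHz to 5 kHz (factor 3).

```python
from typing import List, Sequence


def decimation_factor(src_rate: int, dst_rate: int) -> int:
    """Return the integer factor src_rate / dst_rate, or raise if it isn't whole."""
    if dst_rate <= 0 or src_rate % dst_rate != 0:
        raise ValueError(f"{src_rate} Hz -> {dst_rate} Hz is not an integer factor")
    return src_rate // dst_rate


def decimate(samples: Sequence[float], factor: int) -> List[float]:
    """Downsample by an integer factor.

    Crude sketch: each output sample is the mean of `factor` consecutive
    input samples (a box-filter low-pass followed by keeping one of every
    `factor` samples). Trailing samples that don't fill a block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: List[float] = []
    for start in range(0, len(samples) - factor + 1, factor):
        block = samples[start:start + factor]
        out.append(sum(block) / factor)
    return out
```

Note that 16 kHz to 5 kHz is not an integer ratio (3.2), so that conversion needs a rational resampler rather than this sketch.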

Did you record a sample with the ESP32 mic to confirm the WAV is captured as expected? Try capturing a raw sample to test with.
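To check a raw capture on a PC, Python's standard `wave` module can read the file's format. Summarizing the samples then helps spot near-silence, clipping or a large DC offset, which can be symptoms of a wrong I2S setup. The sketch assumes a mono, 16-bit PCM WAV, and the file name in the usage note is hypothetical.

```python
import array
import math
import sys
import wave
from typing import Dict, Sequence


def summarize_pcm16(samples: Sequence[int]) -> Dict[str, float]:
    """Peak, RMS and DC offset of 16-bit samples, plus the fraction at full scale."""
    n = len(samples)
    if n == 0:
        raise ValueError("no samples")
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768)
    return {
        "peak": max(abs(s) for s in samples),
        "rms": math.sqrt(sum(s * s for s in samples) / n),
        "dc_offset": sum(samples) / n,
        "clipped_fraction": clipped / n,
    }


def inspect_wav(path: str) -> Dict[str, float]:
    """Print the WAV format and return summary stats (assumes mono 16-bit PCM)."""
    with wave.open(path, "rb") as wf:
        channels, width, rate = wf.getnchannels(), wf.getsampwidth(), wf.getframerate()
        print(f"{channels} ch, {8 * width}-bit, {rate} Hz, {wf.getnframes()} frames")
        if channels != 1 or width != 2:
            raise ValueError("expected mono 16-bit PCM")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return summarize_pcm16(samples)
```

Calling `inspect_wav("esp32_capture.wav")` on a short recording of someone speaking should show a clearly non-zero RMS, a small DC offset and little or no clipping. A near-zero RMS may mean the mic data is not landing where the code expects it.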

Improve DSP Configuration

Use MFE or MFCC, not both, because the ESP32 does not have enough memory for both.
MFCC config to try:
- Frame length: 0.02 s
- Frame stride: 0.01 s
- FFT length: 512
- Number of coefficients: 13
- Filter count: 32
- Low frequency: 300 Hz, high frequency: 3800 Hz
- Pre-emphasis: 0.95
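It can help to work out what these settings mean in samples and features for a given sample rate. The sketch below assumes the common framing formula frames = floor((window - frame_length) / stride) + 1 and a 1000 ms window. Both are assumptions: use your own window size, and Edge Impulse's own implementation may count frames slightly differently. The sketch also checks that the high-frequency cutoff is at or below the Nyquist frequency (half the sample rate). With 5 kHz sampling, for example, Nyquist is 2500 Hz, so a 3800 Hz upper cutoff would not be meaningful.

```python
from typing import Dict


def mfcc_shape(sample_rate_hz: int, window_ms: int, frame_length_s: float,
               frame_stride_s: float, num_coefficients: int,
               fft_length: int, high_freq_hz: int) -> Dict[str, int]:
    """Frame and feature counts for an MFCC block (assumed framing formula)."""
    window = sample_rate_hz * window_ms // 1000      # samples per window
    frame = round(frame_length_s * sample_rate_hz)   # samples per frame
    stride = round(frame_stride_s * sample_rate_hz)  # samples between frame starts
    if frame > fft_length:
        raise ValueError("frame is longer than the FFT length")
    if high_freq_hz > sample_rate_hz // 2:
        raise ValueError(f"high freq {high_freq_hz} Hz exceeds Nyquist "
                         f"({sample_rate_hz // 2} Hz)")
    frames = (window - frame) // stride + 1
    return {"frame_samples": frame, "stride_samples": stride,
            "frames": frames, "features": frames * num_coefficients}
```

Under these assumptions, 8 kHz sampling gives 160-sample frames, 99 frames per 1 s window and 99 × 13 = 1287 MFCC features per window.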

Best

Eoin