I have tried to deploy a project for audio MNIST recognition (spoken digits) to a board. It seems to work correctly on the web page, but I couldn't get any correct classification on the board.
I read in the forum that it could be because Live Classification works with float32, so deploying as int8 could affect the result, but I deployed with both data types and nothing changed. I also changed the processing block selected by the EON Tuner from MFE to MFCC, but I got the same problem: the result is always 0, 1, or 5, depending on the signal I paste from the raw data on Edge Impulse.
I don't know what could be wrong.
Which board are you deploying to and are you using a binary firmware or the C++ library export?
I notice your project uses a 48 kHz sampling frequency, but some boards only support lower frequencies (16 kHz is the default).
Hi @aurel ,
I’m using an STM32F429I-DISC1 (180 MHz), but I’m not taking the samples with the board, just pasting the raw data from the Live Classification section. Could that affect the performance anyway?
Also, I’m not sure whether it matters, for setting the project frequency, that the .wav files come from a dataset recorded at 48 kHz.
I changed the clock configuration in STM32CubeIDE to be close to 80 MHz, matching the CPU selected in Edge Impulse, and changed the project frequency from 48 kHz to 16 kHz, but it is still not working.
Could you try collecting the dataset from the board using the edge-impulse-daemon?
As Aurélien mentioned, if the dataset is sampled at a higher frequency than the one the board supports, it is expected that inference will not work, as the DSP block expects higher-frequency data.
As @OmarShrit mentioned, you will need to resample your training and test sets to match the expected sampling rate of your intended target. For example, if you know you need to work at 16 kHz on the STM32, then you need to downsample your training/test sets from 48 kHz to 16 kHz. There are a number of tools that can help you do that. Audacity is one (if you want a GUI). I’m a fan of librosa for doing it programmatically in Python.
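As a minimal sketch of the downsampling step described above: the post suggests librosa, but the same 48 kHz to 16 kHz conversion can be done with `scipy.signal.resample_poly`, since the ratio is an exact 3:1. The 440 Hz test tone here is a stand-in for a clip you would normally load from a .wav file.

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 48_000, 16_000  # dataset rate -> target board rate

# Stand-in for a 1-second clip loaded from a 48 kHz .wav file
t = np.arange(fs_in) / fs_in
clip_48k = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

# 48 kHz -> 16 kHz is an exact 1:3 ratio, so polyphase resampling is clean
clip_16k = resample_poly(clip_48k, up=1, down=3)

# Equivalent with librosa (as mentioned in the post):
#   clip_16k = librosa.resample(clip_48k, orig_sr=48_000, target_sr=16_000)

print(len(clip_48k), len(clip_16k))  # 48000 16000
```

Whichever tool you use, resample every training and test file the same way before uploading, so the studio project and the board agree on the rate.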
I have a demo project here (https://github.com/ShawnHymel/ei-keyword-spotting) with a Colab script (ei-audio-dataset-curation.ipynb) and module (dataset-curation.py) that does dataset resampling and mixing for augmentation. Feel free to look in there to see how I did resampling.
That repo also contains some examples on how to do keyword spotting with a trained Edge Impulse model/library on an STM32.
Hope that helps!
Thanks to both of you for answering, @shawn_edgeimpulse @OmarShrit, but does the sampling frequency of the audio really matter if I’m testing with data from Edge Impulse and not recording live audio? That would make sense to me if I were recording the keyword on the board, but in this particular case the “audio” is already pasted into a buffer, so I’m not really using any recording frequency.
By the way, another thing I noticed: at 48 kHz with a 1-second window I should get a fixed number of values, but when I copy the raw data that number varies with the length of the audio file. Could this be making the model perform wrongly, given that the EON Tuner automatically sets the window size to 1 second? It happens even when I activate the zero-pad data option.
The sampling frequency should be the same across training, testing, and deployment, as it affects how the features (MFCCs) are computed. You can use 48 kHz if you are training and testing in Edge Impulse. If you deploy that model to a board, you need to make sure to also capture audio at 48 kHz with the attached microphone.
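A small numpy sketch (not Edge Impulse code) of why the rates must match: the same buffer of samples represents different physical frequencies depending on the rate it is interpreted at, so the MFCC features shift accordingly. Here a 1 kHz tone captured at 16 kHz "becomes" a 3 kHz tone if the DSP assumes 48 kHz.

```python
import numpy as np

fs_true, fs_wrong = 16_000, 48_000

# A 1 kHz tone, one second long, actually sampled at 16 kHz
t = np.arange(fs_true) / fs_true
tone = np.sin(2 * np.pi * 1000.0 * t)

spectrum = np.abs(np.fft.rfft(tone))
peak_bin = int(np.argmax(spectrum))

# Same samples, two interpretations of the sampling rate
f_correct = np.fft.rfftfreq(len(tone), d=1 / fs_true)[peak_bin]
f_mislabeled = np.fft.rfftfreq(len(tone), d=1 / fs_wrong)[peak_bin]

print(f_correct, f_mislabeled)  # 1000.0 3000.0
```

The feature extractor sees a tone three times higher than the one the model was trained on, which is enough to break classification even though the raw buffer is identical.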
I apologize, but I do not follow the second question. Do you have a screenshot or video showing where EON Tuner is not performing correctly?
Sorry I didn’t explain well.
It’s just that at 48 kHz with a 1-second window size I should have 48,000 features, but when copying the raw data I get something like 26,000 or 37,000 features. I figured that is just because of the length of the .wav file. My doubt was whether this could cause poor classification results, even though the zero-pad data option fills that gap with zeros.
I checked your original data, and it looks like most samples are not fully 1 second long. As a result, the ends will be padded with 0s, as you have noted. This should not affect training much, but I can’t promise it will work well in deployment (i.e. you may end up capturing sound in the inference data where the training set had 0s). It will likely work OK, but it’s not ideal.
There are a few ways to handle this if you really want to be thorough in your training/deployment. In deployment, you could capture just 0.6-0.7 s and pad with 0s so the input looks exactly like the training set. Alternatively, you could write a script that pads all your samples with 0s and then mixes some background noise into everything to make it seem a little more realistic.
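The pad-then-mix idea above can be sketched in a few lines of numpy. This is a hypothetical illustration, not code from the thread: the clip here is random data standing in for a ~0.65 s keyword recording, and the noise level (0.01) is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16_000
window = fs  # 1-second window -> 16000 samples

# Stand-in for a ~0.65 s keyword clip (values in [-0.5, 0.5])
clip = rng.uniform(-0.5, 0.5, size=int(0.65 * fs)).astype(np.float32)

# Zero-pad the end up to the full window length
padded = np.zeros(window, dtype=np.float32)
padded[: len(clip)] = clip

# Mix in low-level background noise so the padded tail isn't pure silence
noise = rng.normal(0.0, 0.01, size=window).astype(np.float32)
augmented = np.clip(padded + noise, -1.0, 1.0)

print(len(augmented))  # 16000
```

Applying this to every training sample makes the training windows resemble what the board will actually capture: a keyword followed by ambient sound rather than perfect digital silence.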
I tried a new project with longer sounds and it worked well, so I guess it was the zero-padding that was causing the trouble.
Glad to hear you got it working!