Hello, I want more clarity about the data used for training and testing in the audio recognition examples.
Currently I receive PCM audio samples after interfacing a PDM microphone with an nRF DK. I then created a Python script that generates .wav files from these PCM samples so I can listen to the audio.
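A minimal sketch of what that conversion script does (the file names and sample rate are illustrative, not my exact values):

```python
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 16000  # illustrative: use the rate configured for the PDM interface

# Load the raw little-endian 16-bit PCM samples captured from the DK
pcm = np.fromfile('capture.pcm', dtype=np.int16)

# Write them out as a .wav file so the audio can be played back
wavfile.write('capture.wav', SAMPLE_RATE, pcm)
```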
So what type of data should I use in the audio examples to recognize audio?
Should I use the raw PCM data, or can I directly pass the 2-second .wav files for training and testing?
Or
Should I generate .wav files from the PCM data and recognize voice by passing those .wav files to Edge Impulse using a Python script? If yes, is there any script available to pass a .wav file to Edge Impulse for testing?
Or
Should I read the data back from the generated .wav files (mentioned in the 2nd option) using the Python scipy module?
Can anyone guide me on which data I should use to train and test?
I haven't used this microphone (so I don't know which libraries are available), but to collect the training data you have several options.
You could store the 2 seconds of raw audio you want to sample in a buffer and then upload it to EI Studio (either in .wav format or directly as raw data).
You can also use a Python script to upload the data, as sketched below.
On my side, I would upload .wav files, but the other options should work too.
The only thing you cannot use at the moment is the CLI data forwarder, because it only works with sensors that have lower sampling frequencies.
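For the Python upload option, a minimal sketch using the ingestion API could look like the following (the API key, label, and file name are placeholders; check the Edge Impulse ingestion service docs for the current endpoint and headers):

```python
import os
import requests

API_KEY = 'ei_...'        # placeholder: your project API key (Dashboard > Keys)
WAV_PATH = 'capture.wav'  # placeholder: the file you want to upload

# POST the .wav file to the ingestion service; use /api/testing/files
# instead of /api/training/files to add it to the test set
res = requests.post(
    url='https://ingestion.edgeimpulse.com/api/training/files',
    headers={
        'x-api-key': API_KEY,
        'x-label': 'cough',  # placeholder label for this sample
    },
    files=(('data', (os.path.basename(WAV_PATH), open(WAV_PATH, 'rb'), 'audio/wav')),),
)
print(res.status_code, res.text)
```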
Thanks @louis for the valuable information. Currently I am reading the .wav files in a Python script using the scipy module and passing the data to Edge Impulse.
I think uploading the .wav files directly can suit me as well, as it will save the time spent reading data from the .wav file in the Python script.
I would like to know: is there any way of uploading .wav files using a Python script?
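For context, this is roughly how I read the samples out of a generated .wav file today (file name illustrative):

```python
from scipy.io import wavfile

# Read the generated .wav file back; returns the sample rate
# and the PCM samples as an int16 numpy array
sample_rate, samples = wavfile.read('capture.wav')
print(f'{sample_rate} Hz, {len(samples)} samples, dtype {samples.dtype}')
```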
Please note that it was specifically designed for keyword spotting, so it downloads the Google Speech Commands dataset and mixes those samples with whatever samples you're using, creating a dataset divided into keyword1, keyword2, etc., plus noise and unknown categories (where unknown means "all other words that are not one of the keywords"). Hope that helps!
Hello @louis
My project will ultimately deploy a trained Edge Impulse model on an nRF52840 DK. So I think I should be using the 2 s raw PCM data to train and test my model, as I won't be able to generate .wav files on the development board during inference.
Just wanted to confirm once more: will the raw PCM data work for audio classification tasks like cough and laugh detection?
@Nikhil yes, you can just cast to float as well. The exact format we expect is described here under 'Raw features': it is just the int16 range, but as a float.
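For example, converting one of your .wav files into that raw-features format could look like this (a sketch; the file name is illustrative):

```python
import numpy as np
from scipy.io import wavfile

_, samples = wavfile.read('capture.wav')  # int16 PCM samples

# The raw features are just the int16 sample values cast to float;
# no rescaling to the [-1.0, 1.0] range is needed
features = samples.astype(np.float32)
```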
int16_to_float is a helper function you can use to save memory when implementing a signal_t get_data function: you don't need to convert the data beforehand, which saves 2x the memory. We page data in and convert it only when needed, so your audio buffer can stay 16-bit int (2 bytes per sample) rather than 32-bit float (4 bytes per sample).