Data format for audio recognition

Hello, I would like more clarity about the data used for training and testing in the audio recognition examples.

Currently I receive PCM audio samples from a PDM microphone interfaced with an nRF DK. I then wrote a Python script that generates .wav files from these PCM samples so that I can listen to the audio.
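For reference, a PCM-to-.wav script like the one described can be sketched with only the Python standard library. This is a minimal sketch, assuming mono int16 samples at 16 kHz (the sample rate is an assumption; use whatever your PDM driver is configured for), and `pcm_to_wav_bytes` is a hypothetical helper name.

```python
import io
import sys
import wave
from array import array


def pcm_to_wav_bytes(samples, sample_rate=16000):
    """Pack mono int16 PCM samples into an in-memory .wav file.

    Assumption: one PDM microphone (mono) sampled at `sample_rate` Hz.
    """
    pcm = array("h", samples)        # 'h' = signed 16-bit integers
    if sys.byteorder == "big":       # .wav sample data is little-endian
        pcm.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes per int16 sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()
```

To write a file you can listen to, save the returned bytes, e.g. `open("sample.wav", "wb").write(pcm_to_wav_bytes(samples))`.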

So what type of data should I use in the audio examples to recognize audio?

  1. Should I use the raw PCM data, or can I pass the 2-second .wav file directly to train and test?
  2. Should I generate .wav files from the PCM data and pass those .wav files to Edge Impulse with a Python script to recognize voice? If so, is there a script available for sending .wav files to Edge Impulse for testing?
  3. Should I read the data from the generated .wav files (mentioned in the 2nd option) using the Python scipy module?
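Option 3 can be sketched without scipy using the standard `wave` module; `scipy.io.wavfile.read(filename)`, which returns `(rate, data)`, is an equivalent alternative. This is a minimal sketch, assuming mono 16-bit files such as the ones generated from the PDM samples; `read_wav_int16` is a hypothetical helper name.

```python
import sys
import wave
from array import array


def read_wav_int16(source):
    """Return (sample_rate, list of int16 samples) from a mono 16-bit .wav.

    `source` may be a filename or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":       # .wav sample data is little-endian
        samples.byteswap()
    return rate, samples.tolist()
```

The returned list holds plain int16 values, i.e. the same raw PCM numbers the microphone produced, so options 1 and 3 end up feeding identical data.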

Can anyone tell me which data I should use to train and test?

Thanks in advance

Hello @Nikhil,

I haven’t used this microphone (so I don’t know which libraries are available), but to collect the training data you have several options.
You could store the 2 seconds of raw audio you want to sample in a buffer and then upload it to EI studio (either in .wav format or directly as raw data).
You can also use a Python script to upload the data.
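The "store 2 seconds in a buffer" idea can be sketched as accumulating microphone blocks into fixed-size windows. This is a minimal Python sketch, assuming a 16 kHz sample rate (an assumption that matches the `interval_ms` of 0.0625 used later in the thread); `WindowBuffer` is a hypothetical helper, not part of any Edge Impulse or Nordic SDK.

```python
SAMPLE_RATE_HZ = 16000   # assumption: PDM driver configured for 16 kHz
WINDOW_S = 2             # seconds per training/inference window
BYTES_PER_SAMPLE = 2     # int16 PCM


class WindowBuffer:
    """Accumulate PCM blocks (as delivered by the mic driver) into 2 s windows."""

    def __init__(self, sample_rate=SAMPLE_RATE_HZ, seconds=WINDOW_S):
        self.size = sample_rate * seconds   # samples per window
        self._samples = []

    def push(self, block):
        """Append one block of samples; return any windows that are now full."""
        self._samples.extend(block)
        windows = []
        while len(self._samples) >= self.size:
            windows.append(self._samples[:self.size])
            del self._samples[:self.size]   # keep the remainder for the next window
        return windows
```

At 16 kHz one 2 s window is 32,000 samples, i.e. 64,000 bytes as int16. On the nRF DK the same logic could live in C inside the PDM data callback, with each full window then saved or uploaded.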

Personally, I would upload .wav files, but the other options should work too.

The only thing you cannot use at the moment is the CLI data forwarder, because it only works with sensors that have lower sampling frequencies.



Thanks @louis for the valuable information. Currently I am reading the .wav files in a Python script using the scipy module and passing the data to Edge Impulse.

Uploading the .wav files directly would suit me as well, since it would save the time spent reading the data from the .wav files in the Python script.

Is there any way to upload .wav files using a Python script?

Hello @Nikhil,

This is a public project I created for some TinyML Perf benchmarks:

Here is the Jupyter Notebook I used to push the data from the public project above to EI studio:

I mostly reused the script from @ShawnHymel's Coursera course,
particularly this one from week 3:

I hope this gives you an example of how to upload .wav files to Edge Impulse using a Python script.
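As a self-contained illustration, here is a minimal stdlib-only sketch that builds a multipart upload for one .wav file. The endpoint `https://ingestion.edgeimpulse.com/api/{training|testing}/files`, the `x-api-key` / `x-label` headers and the `data` form field are assumptions based on my reading of the Edge Impulse ingestion service docs; please verify them against the current documentation. `build_upload_request` is a hypothetical helper, and the notebooks linked above may use the `requests` library instead.

```python
import os
import urllib.request
import uuid

# Assumed ingestion endpoint; category is "training" or "testing".
INGESTION_URL = "https://ingestion.edgeimpulse.com/api/{category}/files"


def build_upload_request(path, label, api_key, category="training"):
    """Build (but do not send) a multipart POST that uploads one .wav file."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as f:
        payload = f.read()
    filename = os.path.basename(path)
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="data"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode() + payload + f"\r\n--{boundary}--\r\n".encode()
    headers = {
        "x-api-key": api_key,   # project API key (assumed header name)
        "x-label": label,       # class label, e.g. "cough" (assumed header name)
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(
        INGESTION_URL.format(category=category),
        data=body, headers=headers, method="POST",
    )
```

Sending is then `urllib.request.urlopen(build_upload_request("cough_01.wav", "cough", API_KEY))`, looped over a folder of files; pass `category="testing"` for the test set.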




Thanks @louis for the guidance.


Hi @Nikhil, if it helps, here is the repo where I keep the Jupyter Notebook for curating and uploading .wav data. I usually run it in Colab.

Please note that it was designed specifically for keyword spotting: it downloads the Google Speech Commands dataset and mixes those samples with your own to create a dataset divided into keyword1, keyword2, etc., plus noise and unknown categories (where unknown is “all other words that are not one of the keywords”). Hope that helps!
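The category split described above can be sketched as a simple grouping step. This is a minimal sketch, not the notebook's actual code: the labels `_noise` and `_unknown`, the optional per-label cap, and the `_background_noise_` folder name (as used in Speech Commands) are assumptions, and `build_kws_dataset` is a hypothetical helper.

```python
import random
from collections import defaultdict


def build_kws_dataset(samples, keywords, max_per_label=None, seed=42,
                      noise_word="_background_noise_"):
    """Group (word, path) pairs into keyword, _noise and _unknown labels.

    samples: iterable of (word, path) tuples, e.g. Speech Commands folder
             names plus your own recordings.
    Returns {label: [paths]}, optionally capped at max_per_label per label so
    the large "unknown" pool does not swamp the keywords.
    """
    groups = defaultdict(list)
    for word, path in samples:
        if word in keywords:
            groups[word].append(path)       # each keyword keeps its own label
        elif word == noise_word:
            groups["_noise"].append(path)
        else:
            groups["_unknown"].append(path) # every other word is "unknown"
    if max_per_label is not None:
        rng = random.Random(seed)           # reproducible subsampling
        for label, paths in groups.items():
            if len(paths) > max_per_label:
                groups[label] = rng.sample(paths, max_per_label)
    return dict(groups)
```

For non-speech classes such as cough or laugh detection, the same idea applies with your own class names in place of the keywords.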


Hello @louis,
Ultimately my project will deploy a trained Edge Impulse model on an nRF52840 DK. So I think I should use 2 s of raw PCM data to train and test my model, since I won’t be able to generate .wav files on the development board during inference.

I just want to confirm: will raw PCM data work for audio classification tasks such as cough and laugh detection?

Hi @Nikhil, yes, you can just put the PCM values into the values array like this:

    {
        "protected": {
            "ver": "v1",
            "alg": "HS256",
            "iat": 1564128599
        },
        "signature": "xxx",
        "payload": {
            "device_type": "MY_DEVICE",
            "interval_ms": 0.0625,
            "sensors": [
                { "name": "audio", "units": "wav" }
            ],
            "values": [
                -1, 121, 381, -211
            ]
        }
    }
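Building that JSON from a buffer of int16 PCM samples could look like the sketch below. `interval_ms` is simply 1000 / sample rate, so 16 kHz gives 0.0625. The signing step (HMAC-SHA256 over the JSON with a 64-character zero placeholder in `signature`) is an assumption based on my reading of the Edge Impulse data acquisition format docs, and `make_acquisition_payload` and its parameters are hypothetical names.

```python
import hashlib
import hmac
import json
import time


def make_acquisition_payload(samples, hmac_key, sample_rate=16000,
                             device_type="MY_DEVICE"):
    """Wrap int16 PCM samples in the acquisition JSON and sign it (HMAC-SHA256)."""
    data = {
        "protected": {"ver": "v1", "alg": "HS256", "iat": int(time.time())},
        "signature": "0" * 64,                  # placeholder, replaced below
        "payload": {
            "device_type": device_type,
            "interval_ms": 1000 / sample_rate,  # 16 kHz -> 0.0625 ms
            "sensors": [{"name": "audio", "units": "wav"}],
            "values": [int(s) for s in samples],  # raw int16 PCM, unscaled
        },
    }
    encoded = json.dumps(data)                  # sign the placeholder version
    data["signature"] = hmac.new(hmac_key.encode(), encoded.encode(),
                                 hashlib.sha256).hexdigest()
    return json.dumps(data)
```

Because `values` holds the unscaled int16 samples, the model is trained on the same numbers the board produces, with no .wav step needed on the device.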

Thanks @janjongboom,
exactly what I was looking for.



I have successfully trained my model using int16 PCM values. But during deployment, I need to provide a float buffer as input for inference.

I have read the docs, which state that I can convert the int values to float using a numpy function:

    numpy::int16_to_float(features + offset, out_ptr, length);

Inside this function, it basically converts my values to the range -1.0 to 1.0.

So I don’t think my model will give correct results on these float values, since their range is very different from that of the int values the model was trained on.

So I am just converting my int values to float without dividing. Will that work?
Below, current_buff is an int16 pointer to my int buffer:

    features[i] = (float)*(current_buff + i);

Or should I retrain my model with values in this format (int value / 32768)?

@Nikhil yes, you can just cast to float as well. The exact format we expect is described here under ‘Raw features’: it is simply the int16 range (but stored as a float).
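The difference between the two options discussed above is easy to see on a few sample values. This Python sketch (helper names are hypothetical) contrasts a plain cast, which keeps the int16 range that the "Raw features" input expects, with dividing by 32768, which would only match a model trained on scaled data.

```python
def to_raw_features(pcm):
    """Plain cast: float values still span the int16 range -32768..32767."""
    return [float(s) for s in pcm]


def to_normalized(pcm):
    """Divide by 32768: values fall in -1.0..1.0 (only valid if training used this too)."""
    return [s / 32768 for s in pcm]


pcm = [-32768, 0, 16384]
raw = to_raw_features(pcm)      # same magnitudes as the int16 training data
scaled = to_normalized(pcm)     # 32768x smaller than what the model was trained on
```

Whichever scaling you pick, the rule is to use the same one for training and for on-device inference; since the model here was trained on int16 values, the plain cast is the one that matches.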


int16_to_float is a helper function you can use to save memory when calling it from a signal_t get_data function. You don’t need to convert the whole buffer beforehand, because data is paged in and converted only when needed, so your audio buffer can stay 16-bit int (2 bytes per sample) rather than 32-bit float (4 bytes per sample). That halves the memory used by the audio buffer.
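The paging idea can be illustrated in Python: keep the audio in a 16-bit buffer and convert only the slice the classifier requests. `PagedSignal` is a hypothetical stand-in that only mirrors the shape of a `get_data(offset, length)` callback plus a total length; the real `signal_t` is a C++ struct in the Edge Impulse SDK and writes into an output float pointer instead of returning a list.

```python
from array import array


class PagedSignal:
    """Python stand-in for a signal_t-style get_data(offset, length) callback.

    The audio stays in int16 storage (2 bytes/sample); only the requested
    slice is converted to float, without any scaling, when it is asked for.
    """

    def __init__(self, pcm):
        self.buffer = array("h", pcm)          # 16-bit storage for the whole window
        self.total_length = len(self.buffer)

    def get_data(self, offset, length):
        # Convert just this page to float (int16 range preserved, no division).
        return [float(s) for s in self.buffer[offset:offset + length]]


sig = PagedSignal(range(-4, 4))
page = sig.get_data(2, 3)
```

Only one page of floats exists at a time, while the full window stays at 2 bytes per sample instead of the 4 bytes per sample a pre-converted float buffer would need.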