How to define signal processing and neural networks for audio recognition?

Hi @rsiquijor,

That sounds like a great project, thanks for sharing your experience!
In my case, I made some sound recordings to end up with a dataset of 2000 samples across 5 classes (i.e. 400 samples per class).
Then I split this dataset into:

  • 70% for training
  • 30% for testing

I tried a lot of NN architectures, including the advice above (from Jan and Dan), but I still have a significant gap between the training accuracy (around 80%) and the test accuracy (around 60%).
I’m afraid that the only solution, in my case, is to significantly increase the size of the training dataset… which means making some additional sound recordings…
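(As an aside, for anyone doing the split outside the Studio on locally stored clips, a stratified 70/30 split that keeps the 400-per-class balance could be sketched like this; scikit-learn and the file layout are placeholders/assumptions, not something used in this thread:)

    from sklearn.model_selection import train_test_split

    # hypothetical layout: one list of clip names plus their class labels
    files = [f"class{c}_{i}.wav" for c in range(5) for i in range(400)]   # 2000 clips
    labels = [c for c in range(5) for _ in range(400)]

    # stratify keeps the 70/30 ratio within each of the 5 classes
    train_files, test_files, train_labels, test_labels = train_test_split(
        files, labels, test_size=0.3, stratify=labels, random_state=42)

    print(len(train_files), len(test_files))  # 1400 600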
And you, in your case, what was the size of your training set? What test accuracy do you reach?

Thanks,

Regards,

Lionel

Hi,

I see that the audio samples can be exported as .cbor files.
For my use case, however, I would like to export the audio samples in .wav format.
Is there an option to get the audio samples in .wav format from Edge Impulse?

Regards,

Lionel

@krukiou, hey, not for the whole dataset in one go at the moment, but you can do it via the API (getsampleasaudio):

https://studio.edgeimpulse.com/v1/api/YOUR_PROJECT_ID/raw-data/YOUR_SAMPLE_ID/wav?axisIx=0

You can get all the sample IDs quickly through the listsamples API call. Hope this helps; if you run into anything, I’m happy to hack something up to download everything quickly.
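If it helps, the whole flow can be scripted in Python along these lines. Treat it as a sketch: the x-api-key header, the raw-data path for listsamples and the response field names are assumptions to double-check against the API docs, and the project ID / API key are placeholders:

    import requests

    PROJECT_ID = 12345   # placeholder: your project ID
    API_KEY = "ei_..."   # placeholder: your project API key
    BASE = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}"
    HEADERS = {"x-api-key": API_KEY}

    # 1. list the samples (the 'listsamples' call mentioned above)
    res = requests.get(f"{BASE}/raw-data", headers=HEADERS,
                       params={"category": "training", "offset": 0, "limit": 10000})
    res.raise_for_status()
    samples = res.json().get("samples", [])

    # 2. download each sample as WAV (the 'getsampleasaudio' call)
    for s in samples:
        wav = requests.get(f"{BASE}/raw-data/{s['id']}/wav",
                           headers=HEADERS, params={"axisIx": 0})
        wav.raise_for_status()
        with open(f"{s['id']}.wav", "wb") as f:
            f.write(wav.content)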

Also, maybe useful for someone in the future: this is how we go from CBOR to WAV (in TypeScript):

    static buildWavFileBuffer(intervalMs: number, data: number[]) {
        // let's build a WAV file!
        let wavFreq = 1 / intervalMs * 1000; // sample rate in Hz, derived from the interval between samples
        let dataSize = (data.length * 2); // 16-bit mono PCM => 2 bytes per sample
        let riffChunkSize = 36 + dataSize; // RIFF chunk size = total file size minus the first 8 bytes
        let byteRate = (wavFreq * 16 * 1) / 8; // sample rate * bits per sample * channels / 8

        let headerArr = new Uint8Array(44);
        let h = [
            0x52, 0x49, 0x46, 0x46, // "RIFF"
            // tslint:disable-next-line: no-bitwise
            riffChunkSize & 0xff, (riffChunkSize >> 8) & 0xff, (riffChunkSize >> 16) & 0xff, (riffChunkSize >> 24) & 0xff,
            0x57, 0x41, 0x56, 0x45, // "WAVE"
            0x66, 0x6d, 0x74, 0x20, // "fmt "
            0x10, 0x00, 0x00, 0x00, // length of format data (16)
            0x01, 0x00, // type of format (1 = PCM)
            0x01, 0x00, // number of channels (1 = mono)
            // tslint:disable-next-line: no-bitwise
            wavFreq & 0xff, (wavFreq >> 8) & 0xff, (wavFreq >> 16) & 0xff, (wavFreq >> 24) & 0xff, // sample rate
            // tslint:disable-next-line: no-bitwise
            byteRate & 0xff, (byteRate >> 8) & 0xff, (byteRate >> 16) & 0xff, (byteRate >> 24) & 0xff, // byte rate
            0x02, 0x00, 0x10, 0x00, // block align (2 bytes), bits per sample (16)
            0x64, 0x61, 0x74, 0x61, // "data"
            // tslint:disable-next-line: no-bitwise
            dataSize & 0xff, (dataSize >> 8) & 0xff, (dataSize >> 16) & 0xff, (dataSize >> 24) & 0xff,
        ];
        for (let hx = 0; hx < 44; hx++) {
            headerArr[hx] = h[hx];
        }

        // raw 16-bit PCM samples
        let bodyArr = new Int16Array(data.length);
        let bx = 0;
        for (let value of data) {
            bodyArr[bx] = value;
            bx++;
        }

        // concatenate header + body into a single byte buffer
        let tmp = new Uint8Array(headerArr.byteLength + bodyArr.byteLength);
        tmp.set(headerArr, 0);
        tmp.set(new Uint8Array(bodyArr.buffer), headerArr.byteLength);

        return tmp;
    }

Save the tmp buffer somewhere and you’ll have a WAV file.

Hi Jan,

Thanks for your quick reply.
I’m trying to use the listsamples API call in Python, but I’m getting an error:

Is it linked to the jwt? If yes, where can I find this value? If not, what’s wrong in my Python code?

Thank you,

Regards,

Lionel

Hi @krukiou,

Change the querystring to {"category": "training", "offset": 0, "limit": 10000}. This is actually a bug in the API; offset / limit should not be required. I’ll be pushing a fix.
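In a Python requests call it would look roughly like this (same caveats as the sketch above: the header name and endpoint path are assumptions, and the project ID / API key are placeholders):

    import requests

    PROJECT_ID = 12345   # placeholder
    API_KEY = "ei_..."   # placeholder
    res = requests.get(
        f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}/raw-data",
        headers={"x-api-key": API_KEY},
        params={"category": "training", "offset": 0, "limit": 10000},
    )
    samples = res.json().get("samples", [])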

Jan,

Thank you very much: I’m now able to download the .wav files.

Have a nice day,

Regards,

Lionel

No problem! API fix has been deployed to production.

OK, Jan. Thank you!

Regards,

Lionel

@krukiou what is the window size that you set in your impulse model? Did you use MFCC feature sets?

Hi @rsiquijor,

I set the window size to 2 seconds (the length of my sound samples),
and yes, I’m using MFCC feature sets.

Regards,

Lionel

@krukiou,

And your training and test sets are also 2 seconds? What worked for me was to set the data acquisition samples to 10 seconds, but with the spectral window at 1.5 to 2 seconds so the ANN can pick out a pattern. If your samples are 2 seconds and your spectral window is also 2 seconds, then it will be hit or miss.

Yes, I have 2000 samples of 2 sec each.
I split this dataset into:

  • 70% for the training set (1400 samples)
  • 30% for the test set (600 samples)

Regards,

Lionel

@rsiquijor,

Regarding the “spectral window”: I don’t see a parameter called “spectral window”…
Which parameter are you referring to?

Regards,

Lionel

I am wondering a bit whether 2 sec is not too long for your use case.
Do the sounds you want to classify really last at least 2 sec?
If not, then your recorded samples start or end with unrelated sound; I think this might explain the poor training results.

Hi @janvda ,

Thank you for your message.
The sounds I want to classify have different lengths, but you’re right, they last less than 2 sec.
In this case, what should I do to increase the performance of my model?

Thank you,

Regards,

Lionel

I would suggest:

  1. Crop the recorded sounds so that they only contain the sound to be classified (no silence at the start or end of the recording). I think it is important that the complete audio fragment used for training contains the actual sound to classify.
  2. As the actual sounds take less than 2 sec, set the window size to the duration of the shortest sound instance you would like to be able to classify. In your case it will most likely be much less than 1 sec.
  3. To make sure the neural network has sufficient samples for training, you can reduce the “window step” (see the sketch below).
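To make the last point concrete, here is a rough back-of-the-envelope sketch (plain Python; the numbers are only illustrative and the Studio’s exact window count may differ slightly) of how window size and window step determine how many training windows you get out of each 2 sec sample:

    # rough estimate of sliding windows per sample:
    # floor((sample_ms - window_ms) / step_ms) + 1, as long as the window fits
    def windows_per_sample(sample_ms: int, window_ms: int, step_ms: int) -> int:
        if window_ms > sample_ms:
            return 0
        return (sample_ms - window_ms) // step_ms + 1

    print(windows_per_sample(2000, 1000, 1000))  # step = window size: 2 windows per sample
    print(windows_per_sample(2000, 1000, 250))   # smaller window step: 5 windows per sample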

Note that you can now crop samples straight from the Studio: just click on the three dots next to a sample and select Crop sample.