8 kHz audio - MFCC parameters & performance predictions

Robotastic · February 25, 2021, 10:01pm

I have been comparing an 8 kHz vs a 16 kHz sampling rate for audio. For sound classification, 8 kHz seems to give pretty good bang for the buck. I have the same audio in two different projects with identical configurations; the only difference is the sample rate of the audio.

The weird thing is that the On-Device Performance prediction for the MFCC block shows a higher latency for the 8 kHz audio than for the 16 kHz audio: 424 ms at 8 kHz versus 352 ms at 16 kHz.

Are there adjustments to the default MFCC parameters that I should make when using 8 kHz?

aurel · February 26, 2021, 10:43am

Hi @Robotastic,

The subframes created by the MFCC block may be cropped depending on your FFT length and subframe length, which could affect the latency. FYI, we are adding DSP documentation; you can read more about FFT length vs subframe length here: https://docs.edgeimpulse.com/docs/spectrogram
That page covers the spectrogram block, but it also applies to the MFE and MFCC blocks.
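To make the cropping concrete, here is a minimal NumPy sketch of the FFT step only (windowing and the mel/cepstral stages of MFCC are left out). It assumes a 20 ms subframe and an FFT length of 256, inferred from the 320-sample figure in this thread; `frame_spectrum` is a hypothetical helper for illustration, not Edge Impulse's implementation. It relies on the documented behavior of `numpy.fft.rfft` with `n`: longer input is cropped, shorter input is zero-padded.

```python
import numpy as np

def frame_spectrum(frame: np.ndarray, fft_length: int) -> np.ndarray:
    """Magnitude spectrum of one subframe (hypothetical helper).

    np.fft.rfft(frame, n=fft_length) keeps only the first fft_length
    samples if the frame is longer, and zero-pads it if it is shorter.
    """
    return np.abs(np.fft.rfft(frame, n=fft_length))

# Assumed values: 20 ms subframe, FFT length 256.
frame_length_s = 0.02
fft_length = 256

rng = np.random.default_rng(0)
frame_hi = rng.standard_normal(round(16000 * frame_length_s))  # 320 samples -> cropped
frame_lo = rng.standard_normal(round(8000 * frame_length_s))   # 160 samples -> zero-padded

spec_hi = frame_spectrum(frame_hi, fft_length)
spec_lo = frame_spectrum(frame_lo, fft_length)

# Both spectra have fft_length // 2 + 1 = 129 bins, but the 16 kHz
# spectrum ignores the last 64 samples of its frame.
print(len(frame_hi), len(frame_lo), spec_hi.shape, spec_lo.shape)
# prints: 320 160 (129,) (129,)
```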

Could you also share the MFCC parameters you set?

Aurelien

Robotastic · February 26, 2021, 2:06pm

That is really helpful! I didn’t realize the FFT size was related to the number of samples per frame, which depends on the frame length and the sampling rate. More DSP documentation would be great! I don’t know much about this space.

Here are the parameters I am using. The only thing I really changed was the Window Size. I should probably adjust my FFT size, since the number of samples in each frame differs between the two sampling rates.
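One rough way to pick an FFT size per project: compute the samples per frame as sampling rate × frame length, then take the smallest power of two that holds them, so no samples are cropped. This is a sketch assuming power-of-two FFT lengths and a 20 ms frame; `samples_per_frame` and `min_fft_length` are hypothetical helpers, not part of any Edge Impulse API.

```python
def samples_per_frame(sample_rate_hz: int, frame_length_s: float) -> int:
    """Number of audio samples in one frame (hypothetical helper)."""
    return round(sample_rate_hz * frame_length_s)

def min_fft_length(n_samples: int) -> int:
    """Smallest power of two >= n_samples, so the frame is never cropped."""
    n = 1
    while n < n_samples:
        n *= 2
    return n

for fs in (8000, 16000):
    n = samples_per_frame(fs, 0.02)  # assumed 20 ms frame
    print(fs, n, min_fft_length(n))
# prints:
# 8000 160 256
# 16000 320 512
```

A larger FFT means more computation per frame, so this is a trade-off against latency rather than a free improvement.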

aurel · February 26, 2021, 3:18pm

We will publish the rest of the documentation by next week.

You should try a higher FFT length (512) for your 16 kHz samples, even if it increases latency.

With your existing values, the 16 kHz subframes contain 320 samples, so they are cropped to 256 samples, whereas the 8 kHz subframes contain only 160 samples and are padded with zeros. This cropping can definitely affect the prediction results in this case.
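The arithmetic behind this, assuming a 20 ms subframe (which is what 320 samples at 16 kHz implies), an FFT length of 256, and writing N for the number of samples per subframe:

```latex
\begin{aligned}
N &= f_s \cdot T_{\text{frame}} \\
N_{16\,\text{kHz}} &= 16000 \times 0.02 = 320 > 256 &&\Rightarrow 320 - 256 = 64 \text{ samples cropped} \\
N_{8\,\text{kHz}} &= 8000 \times 0.02 = 160 < 256 &&\Rightarrow 256 - 160 = 96 \text{ zeros padded}
\end{aligned}
```

With an FFT length of 512, the 16 kHz subframe would instead be padded with 512 - 320 = 192 zeros and no samples would be dropped.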

Aurelien