Audio samples format

Hello everyone,

After successfully implementing the audio tutorial, it’s time to go one step further.

I’m looking to import an existing audio dataset, and I have a few questions about the format:

  • Looking at the example CBOR files, the samples seem to be stored as 16-bit values. Should all samples have the same bit depth, and if so, can we normalize them to [-1, 1] before the data goes into the MFCC block?

  • Are there any requirements on the sampling frequency, apart from it being the same for all samples?
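To make the normalization in the first question concrete, here is a minimal Python sketch using only the standard library. It assumes the dataset is stored as uncompressed 16-bit PCM WAV files (an assumption; the thread only mentions the example CBOR files). Dividing by 32768 maps the signed 16-bit range [-32768, 32767] onto [-1, 1). The helper names `pcm16_to_float`, `read_wav_format`, `check_consistent` and `load_normalized` are hypothetical and not part of any Edge Impulse API.

```python
import struct
import wave
from typing import Dict, Iterable, List, Tuple

INT16_FULL_SCALE = 32768.0  # 2**15: magnitude of the most negative 16-bit value

def pcm16_to_float(raw: bytes) -> List[float]:
    """Decode little-endian signed 16-bit PCM bytes and scale them to [-1, 1)."""
    count = len(raw) // 2  # ignore a trailing odd byte, if any
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    return [s / INT16_FULL_SCALE for s in samples]

def read_wav_format(path: str) -> Tuple[int, int, int]:
    """Return (sample width in bytes, sample rate in Hz, channel count) from a WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getsampwidth(), wf.getframerate(), wf.getnchannels()

def check_consistent(paths: Iterable[str]) -> Tuple[int, int, int]:
    """Return the format shared by all files, or raise ValueError if they disagree."""
    formats: Dict[str, Tuple[int, int, int]] = {p: read_wav_format(p) for p in paths}
    if not formats:
        raise ValueError("no files given")
    distinct = set(formats.values())
    if len(distinct) != 1:
        raise ValueError("mixed bit depth / sample rate / channels: %r" % formats)
    return distinct.pop()

def load_normalized(path: str) -> List[float]:
    """Read a 16-bit WAV file and return its samples as floats in [-1, 1).
    Multi-channel files come back interleaved (L, R, L, R, ...)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("%s is not 16-bit PCM" % path)
        return pcm16_to_float(wf.readframes(wf.getnframes()))
```

Running `check_consistent` over the whole dataset before importing returns a single `(sample_width, sample_rate, channels)` tuple only when every file matches, which covers both the equal bit depth and equal sampling frequency conditions raised in the questions.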


Hi @aureleq, good question. It might be that the filter bank multiplications in the MFCC block all result in near-zero values when the samples are scaled to [-1, 1]. If so, we can add a scaling parameter to the block (like we did for the spectral analysis and raw features blocks) so you can upscale the data before the calculations. Other than that, no: it should work with any sampling frequency.
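Normalized audio can produce near-zero filter bank outputs because the energies are computed from the power spectrum, so they scale with the square of the amplitude: dividing 16-bit samples by 32768 shrinks every band energy by a factor of 32768² = 1,073,741,824 (about 1.07 × 10⁹). The Python sketch below illustrates this with a naive DFT and a single rectangular band standing in for one triangular mel filter. This is a simplification, not how the MFCC block is actually implemented, and the `scale` argument is a hypothetical stand-in for the scaling parameter suggested above.

```python
import cmath
import math
from typing import List, Sequence

def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Return |DFT|^2 / N for bins 0..N/2 (naive O(N^2) DFT, fine for a short sketch)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def band_energy(frame: Sequence[float], lo_bin: int, hi_bin: int, scale: float = 1.0) -> float:
    """Multiply the signal by `scale` (hypothetical parameter), then sum the power
    in bins lo_bin..hi_bin. A rectangular band stands in for one triangular mel filter."""
    scaled = [x * scale for x in frame]
    return sum(power_spectrum(scaled)[lo_bin:hi_bin + 1])

# Usage: a 32-sample sine at bin 4, on the 16-bit scale and normalized to [-1, 1].
frame = [1000.0 * math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
normalized = [x / 32768.0 for x in frame]
print(band_energy(frame, 3, 5))                      # energy on the 16-bit scale
print(band_energy(normalized, 3, 5))                 # about 1.07e9 times smaller
print(band_energy(normalized, 3, 5, scale=32768.0))  # matches the first line
```

Because 32768 is a power of two, scaling down and back up is exact in floating point, so `scale=32768.0` recovers the 16-bit energies exactly. Any sufficiently large factor would do; the point is to keep the energies well above whatever small-value floor the log step may apply (assuming the implementation clamps tiny energies before the log, as many MFCC implementations do).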

Thanks for the quick feedback, Jan. I’ll keep the samples at 16-bit depth, then.