After successfully implementing the audio tutorial, it’s time to go one step further.
I’m looking to import an existing (audio) dataset and I have a few questions regarding the format:
Looking at the example CBOR files, the samples seem to be in a 16-bit format. Should all samples have the same bit depth, and if so, can we normalize them to [-1, 1] before the data goes into the MFCC block?
Are there any requirements regarding the sampling frequency, apart from it being the same for all samples?
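
To make the first question concrete, the normalization I have in mind would look roughly like the sketch below. It assumes the samples are signed 16-bit PCM and uses NumPy; the function name `normalize_int16` is just illustrative and not part of any existing tool.

```python
import numpy as np

def normalize_int16(samples):
    """Scale signed 16-bit PCM samples to float32 values in [-1, 1)."""
    # Assumption: input values fit in the signed 16-bit range [-32768, 32767].
    pcm = np.asarray(samples, dtype=np.int16)
    # Dividing by 32768 maps -32768 to exactly -1.0 and 32767 to just under 1.0.
    return pcm.astype(np.float32) / 32768.0

if __name__ == "__main__":
    print(normalize_int16([0, 16384, -32768, 32767]))
```

Dividing by 32768 rather than 32767 keeps the most negative sample at exactly -1, at the cost of the largest positive sample landing slightly below 1.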