Audio Classification and Sound Normalization

I am working on training an audio classification model, and for that I am using either the Spectrogram or MFE block in my impulse. I'd like to ask whether Edge Impulse normalizes the sound samples on the backend, either while training or while generating features for the impulse. For example, a sound sample's amplitude/frequency content may be very low; does Edge Impulse automatically adjust the amplitude/frequency, or normalize the sound to match the rest of the sound samples? In particular:

  • How to deal with sporadic "silence" outputs during inferencing
  • Is the data normalised for training, and if so, how (per file or over all files)?
  • Is normalisation also applied during inferencing?

Regards,

Hello @safi.dhillon,

You can normalize your values either in the pre-processing block or in the learning block.

  • For the DSP blocks:
    In the Spectrogram and MFE blocks, we have a way to do normalization in the default pre-processing blocks using the noise floor, but you can customize this using custom DSP blocks.
    In the MFCC block, you can use the Normalization window size parameter: the size of the sliding window for local cepstral mean normalization. The window size must be odd.
    However, I am not sure this is exactly what you're looking for. In that case, you can use custom DSP blocks to modify the behaviour of the blocks: see this thread.
    In all cases, in the DSP blocks the normalization is applied per window.

  • In the learning block, you can use an extra layer to do your normalization. This will be applied to the pre-processed features (the output of the DSP block), either at the feature level (https://keras.io/api/layers/preprocessing_layers/numerical/normalization/) or at the batch level (https://keras.io/api/layers/normalization_layers/), as sketched below.
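For reference, here is a minimal sketch (not taken from your project) of what the learning-block option could look like in Keras expert mode. The input length, layer sizes, and the stand-in training features are assumptions for illustration; the point is only to show where a Normalization layer (feature level) or a BatchNormalization layer (batch level) would sit.

```python
import numpy as np
from tensorflow.keras import layers, models

# Placeholder sizes: adjust to your project
input_length = 3200   # number of features produced by your DSP block
num_classes = 3

# Feature-level normalization: adapt() computes the mean/variance over the
# training features and stores them inside the layer, so the exact same
# scaling is baked into the model and applied again at inference time.
norm_layer = layers.Normalization(axis=-1)
train_features = np.random.rand(100, input_length).astype('float32')  # stand-in for your real DSP features
norm_layer.adapt(train_features)

model = models.Sequential([
    layers.Input(shape=(input_length,)),
    norm_layer,
    layers.Dense(64, activation='relu'),
    layers.BatchNormalization(),  # batch-level alternative between layers
    layers.Dense(32, activation='relu'),
    layers.Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```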

Note: if you use the custom DSP block option, you'll need to implement the modification in the C++ code as well. We have examples of how we implemented that for the raw and flatten DSP blocks. This can easily be done by adding your custom function in your main.cpp.
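To give a rough idea of the kind of per-window normalization you might prototype in a custom block and then mirror in your main.cpp, here is a hypothetical Python sketch; the function name and the zero-mean/unit-variance choice are my own assumptions, not an existing Edge Impulse API.

```python
import numpy as np

def standardize_window(window: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Hypothetical example: normalize one audio window to zero mean and
    unit variance before feature extraction. Whatever you implement here
    also has to be reproduced in C++ so on-device inference matches."""
    centered = window - np.mean(window)
    std = np.std(centered)
    if std < eps:
        # Nearly-constant (silent) window: return the centered signal
        # rather than amplifying noise by dividing by ~0.
        return centered
    return centered / std
```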

In both cases, the normalization will be applied during inference (either through the NN weights or through your DSP config).

Best,

Louis

Hi @louis ,

Thank you for the response. Could you please explain/highlight the following points a bit.

  • Is normalisation enabled by default (default==True) both when training the model and while inferencing on the edge device?

  • I still have the issue with silence: could it be getting amplified due to normalisation, leading to sporadic results? Is there a way around this?

Thank you for the help!

Normalization is a term that gets thrown around a lot and means different things to different people in different contexts. I don’t think we’re doing the kind of normalization you’re talking about (I think you’re talking about scaling windows so that the max power is always 1.0 or something like that, yes?)

We don't do this. The downside of this is that you'll want to train with at least a few samples across the different expected volumes. The upside is that you won't have the issue you're describing as long as you train for it: throw in a few samples with this sporadic silence you speak of. Additionally, you can play with the Noise floor setting to make sure anything that you consider silence gets set to zero. (The noise floor setting simply zeros out any energy below the threshold you set.)
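To make the noise floor behaviour concrete, here is a small illustrative sketch of the idea described above (not Edge Impulse's actual implementation): any spectrogram energy below the threshold you pick is dropped to zero.

```python
import numpy as np

def apply_noise_floor(power_spectrogram: np.ndarray, noise_floor: float) -> np.ndarray:
    """Illustrative only: zero out every bin whose energy is below the
    configured noise floor, so low-level background and silence no
    longer contribute to the features."""
    out = power_spectrogram.copy()
    out[out < noise_floor] = 0.0
    return out
```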

If you do want to normalize so that each window always has a max power of 1.0 (or similar), you would need to do it in a transformation block (or do it before ingestion into Studio / before calling run_classifier during inference).
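If you do go that route, a minimal sketch of the pre-scaling step could look like the following; the helper name is hypothetical, and the key point is that exactly the same scaling has to be applied both to the data you ingest for training and to the windows you classify on device.

```python
import numpy as np

def scale_to_unit_peak(raw_window: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Hypothetical pre-processing: scale the raw audio window so its
    maximum absolute sample is 1.0. Windows that are essentially silent
    (peak below eps) are left unchanged to avoid amplifying noise."""
    peak = np.max(np.abs(raw_window))
    return raw_window if peak < eps else raw_window / peak

# window = your raw audio window as a float array
# scaled = scale_to_unit_peak(window)
# ...then feed `scaled` into your normal ingestion / run_classifier path.
```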