I am working on training an audio classification model, and for that I am using either Spectrograms or MFE as an Impulse. I'd like to ask whether Edge Impulse on the backend normalizes the sound samples, either while training or while generating features on the Impulse. For example, we might have a sound sample whose amplitude is very low; does Edge Impulse automatically adjust the amplitude or normalize the sound to match the rest of the sound samples? In particular:
How to deal with sporadic silence in inferencing outputs
Whether the data is normalised for training, and how (per file or over all files?)
You can normalize your values either in the pre-processing block or in the learning block.
For the DSP blocks:
In Spectrograms and MFE, we have a way to do normalization in the default pre-processing blocks using the noise floor, but you can customize this using custom DSP blocks.
In the MFCC block, you can use the Normalization Window Size parameter: the size of the sliding window for local cepstral mean normalization. The window size must be odd.
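As a rough illustration of what local cepstral mean normalization does, here is a minimal sketch (this is not the actual MFCC block implementation; the array shapes and default window size are assumptions):

```python
import numpy as np

def local_cmn(cepstra, win_size=101):
    """Subtract a sliding-window mean from each cepstral frame.

    cepstra: (num_frames, num_coeffs) array of cepstral coefficients.
    win_size: odd sliding-window length, analogous to the
    Normalization Window Size parameter (must be odd).
    """
    assert win_size % 2 == 1, "window size must be odd"
    half = win_size // 2
    n = len(cepstra)
    out = np.empty_like(cepstra, dtype=float)
    for i in range(n):
        # Clip the window at the edges of the sample.
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out[i] = cepstra[i] - cepstra[lo:hi].mean(axis=0)
    return out
```

Because only the local mean is subtracted, a constant offset (e.g. from channel or volume differences) is removed while short-term variation is kept.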
However, I am not sure this is exactly what you're looking for. In that case you can use custom DSP blocks to modify the behaviour of the blocks: see this thread.
In all cases in the DSP blocks, the normalization is applied per window.
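To make the noise-floor idea concrete, here is a sketch of per-window normalization in that style (modelled on the description above, not the exact Edge Impulse implementation; the default threshold and dynamic-range assumption are mine): power values are converted to dB, everything below the noise floor is zeroed, and the remainder is scaled into [0, 1].

```python
import numpy as np

def noise_floor_normalize(power_spectrogram, noise_floor_db=-52.0):
    """Noise-floor normalization sketch for one window.

    power_spectrogram: (frames, bins) array of non-negative power values.
    noise_floor_db: threshold; energy below it is treated as silence.
    """
    eps = 1e-10  # avoid log10(0)
    db = 10.0 * np.log10(power_spectrogram + eps)
    # Shift so the noise floor sits at 0 and clamp silence to zero.
    shifted = np.clip(db - noise_floor_db, 0.0, None)
    # Scale into [0, 1], assuming 0 dB is the maximum power.
    dyn_range = -noise_floor_db
    return np.clip(shifted / dyn_range, 0.0, 1.0)
```

Anything at or below the floor comes out exactly 0, which is why raising the noise floor is an easy way to force quiet windows to look like silence.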
Note: if you use the custom DSP block option, you'll need to implement the modification in the C++ code as well. We have examples of how we implemented that for the raw and flatten DSP blocks. This can easily be done by adding your custom function in your main.cpp.
In both cases, during inference the normalization will be applied (either via the NN weights or through your DSP config).
Normalization is a term that gets thrown around a lot and means different things to different people in different contexts. I don't think we're doing the kind of normalization you're talking about (I think you mean scaling windows so that the max power is always 1.0, or something like that, yes?)
We don't do this. The downside is that you'll want to train with at least a few samples across the range of expected volumes. The upside is that you won't have the issue you're describing as long as you train with it: throw in a few samples with the sporadic silence you speak of. Additionally, you can play with the Noise Floor setting to make sure anything you consider silence gets set to zero. (The noise floor setting simply zeros out any energy below the threshold you set.)
If you do want to normalize so that each window always has a max power of 1.0 (or similar), you would need to do this in a transformation block (or do it before ingestion into Studio / before calling run_classifier during inference).
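If you go that route, the per-window scaling itself is simple. Here is a sketch of normalizing each raw audio window to a peak of 1.0 before classification; run_classifier is the real Edge Impulse SDK entry point, but this wrapper, its names, and the silence guard are hypothetical:

```python
import numpy as np

def peak_normalize(window, target_peak=1.0, eps=1e-9):
    """Scale an audio window so its maximum absolute sample is target_peak.

    Near-silent windows are returned unchanged, to avoid
    amplifying pure noise up to full scale.
    """
    window = np.asarray(window, dtype=float)
    peak = np.max(np.abs(window))
    if peak < eps:
        return window  # effectively silence; don't blow up the noise
    return window * (target_peak / peak)

# Hypothetical usage before inference:
# normalized = peak_normalize(raw_audio_window)
# result = run_classifier(normalized)  # Edge Impulse SDK call
```

Note the silence guard: without it, a window of background hiss would be scaled up to full volume, which is exactly the sporadic-silence failure mode described above.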