Reducing normalization to handle silence better

I am trying to build a model to recognize the sounds of helicopters. When I deploy the model, I seem to be getting a lot of false positives from silence. I think this may be because the silence gets normalized to be relatively loud, and once normalized, the randomness of the silence sort of sounds like a helicopter. Is there a way to turn normalization down so that the model is better able to separate quiet sounds from loud ones?

Is this the pre-emphasis setting in the MFCC? Are there settings in the Arduino inference script/lib I should adjust?

I was also going to try turning down the gain for the mic in the Arduino script; it is currently set to the max.

It is also very possible my problem lies elsewhere, so let me know if this seems unlikely.

That doesn’t sound too logical; silence will have low values for the features. But I bet @dansitu has a better idea!

Hey Luke, hope you are doing well! :slight_smile:

I’m not an audio processing expert, but here are my thoughts. The sound of a helicopter contains regular periodic variations that the sound of “silence” (i.e. background noise) does not. For example, the “chop” of a rotor blade may happen every n milliseconds. The challenge here is to make sure our MFCC output represents these periodic variations in a way that is discernible from background noise.

The first thing to think about is the low-frequency cutoff. By default, this is set to 300Hz. If your helicopter is making sounds below this frequency, they will be filtered out. I’d start by reducing this value and seeing if your results change.

The second is the MFCC output’s resolution. The default parameters for the MFCC block have a frame length and frame stride of 0.02 seconds (20ms). This means that each column of the MFCC represents 20ms of sound. If the “chop” of the rotor blades happens faster than every 20ms, it may not be distinguishable from a constant background “hum”.

So I would try increasing the resolution of your MFCC output by reducing the frame length and frame stride. This might result in an output that can be more easily distinguished from background noise. Of course, the larger MFCC output will require more memory and compute, but if that is a problem you could maybe get away with reducing the overall length of the window.
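To make the trade-off concrete, here is a rough sketch of the arithmetic. It is not the actual MFCC block implementation: the function names are made up for illustration, the framing assumes no padding at the end of the window, and the 13 coefficients per frame is an assumed value rather than something confirmed in this thread. The "resolved" check uses the usual sampling rule of thumb that you need at least two frames per chop period.

```python
def num_frames(window_ms, frame_length_ms, frame_stride_ms):
    """Full frames that fit in the window (assumes no padding)."""
    if window_ms < frame_length_ms:
        return 0
    return 1 + (window_ms - frame_length_ms) // frame_stride_ms


def feature_count(window_ms, frame_length_ms, frame_stride_ms, num_coefficients=13):
    """Size of the MFCC output; num_coefficients=13 is an assumed value."""
    return num_frames(window_ms, frame_length_ms, frame_stride_ms) * num_coefficients


def max_modulation_hz(frame_stride_ms):
    """Fastest periodic variation the frame sequence can represent.

    Frames arrive at 1000 / stride per second, so by the sampling
    theorem only variations up to half that rate are captured.
    """
    return 1000 / (2 * frame_stride_ms)


def chop_is_resolved(chop_period_ms, frame_stride_ms):
    """True if there are at least two frames per chop period."""
    return frame_stride_ms <= chop_period_ms / 2


if __name__ == "__main__":
    # Compare a 20ms stride with a 10ms stride over a 1 second window.
    for stride in (20, 10):
        print(
            f"stride={stride}ms",
            f"frames={num_frames(1000, stride, stride)}",
            f"features={feature_count(1000, stride, stride)}",
            f"max modulation={max_modulation_hz(stride)}Hz",
        )
```

Under these assumptions, halving the stride from 20ms to 10ms doubles both the fastest "chop" rate that can show up in the output and the number of features the model has to process.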

Give these a try and let me know how it goes!

Warmly,
Dan


Hi Dan - Likewise!!

You are much more of an audio expert than me. That makes a lot of sense, I am going to give that a try. I did try experimenting with sending in normalized audio vs. the audio I am capturing right off the Arduino board (which is very quiet), and the MFCC process does not seem to be impacted by the volume levels of the samples you are training on. I will post back on how it goes.
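One possible explanation for the volume not mattering much, sketched under assumptions: cepstral coefficients are a DCT of log band energies, so a constant gain adds the same offset to every log energy, and the DCT puts that offset entirely into coefficient 0. The band energies below are made-up numbers, and this is plain textbook math rather than the actual MFCC block code, which may add flooring or other steps not shown here.

```python
import math


def dct2(values):
    """Unnormalized DCT-II, written out directly for clarity."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstrum(band_energies, gain=1.0):
    """DCT of log band energies after scaling the signal amplitude by gain.

    Scaling amplitude by gain scales each band energy by gain ** 2.
    """
    log_energies = [math.log(e * gain ** 2) for e in band_energies]
    return dct2(log_energies)


# Made-up band energies for one frame, purely for illustration.
band_energies = [0.8, 0.5, 0.3, 0.2, 0.1, 0.05]

loud = cepstrum(band_energies, gain=1.0)
quiet = cepstrum(band_energies, gain=0.01)  # same sound, 100x quieter

if __name__ == "__main__":
    for k, (a, b) in enumerate(zip(loud, quiet)):
        print(f"c{k}: loud={a:.4f} quiet={b:.4f}")
```

Only the first coefficient changes between the loud and quiet versions; the rest match to within floating-point error. If the real pipeline behaves like this, volume would mostly show up in that one coefficient.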

UI Feedback - having the spectrograms update as you adjust the MFCC parameters has been super helpful. It only seems to work on Chrome though; it isn’t working for me on Safari. It would be super awesome to get some sort of memory/compute feedback too as you adjust parameters.

Thanks for the feedback, we’ll be adding this very soon!
