Feature selection for click sound

Wolf · June 8, 2021, 7:50am

Hi,
my goal is to classify percussive click sounds (about 0.1 seconds long and a broad frequency distribution from approx. 2 - 25 kHz) that show significant spectral differences in the second half of the signal.
Does it sound reasonable to look for features using MFE or spectrogram? Any initial recommendations?
Cheers!

louis · June 8, 2021, 8:19am

Hello @Wolf,

Maybe @AlexE would have a better opinion than mine regarding which DSP block to use.
Other than that, I guess it won’t take much time to try out both and see where you get a better cluster separation between your classes.

Regards,

Louis

janjongboom · June 8, 2021, 8:54am

Hi @Wolf, a normal spectrogram is probably best here. The most important thing for these types of short events is to be very strict and careful at the labeling stage. https://www.edgeimpulse.com/blog/crop-split-data might be able to help if these are distinct events.

Wolf · June 8, 2021, 5:25pm

Thanks for your helpful comments. I was not aware of the crop and split functions - awesome!

AlexE · June 9, 2021, 2:02pm

Hi Wolf,

Agree with Jan on not using MFE, since MFE buckets the spectral energy into bins based on the log of the frequency (to mimic human hearing). If you’re classifying based on the dominant tone (frequency) in the second half of your recording, and the first half is uninteresting, then maybe start with Spectral Analysis and move on to Spectrogram if you don’t get good results.

Your FFT size will also be important here. Can you list out the frequencies that your clicks will be at and I can give you some guidance on the parameters for your DSP block? Also, what’s your sample rate?
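To illustrate the point about MFE bucketing, here is a minimal sketch of how mel-spaced bands widen with frequency. It assumes the common HTK-style mel formula, 40 bands, and a 0 to 22.05 kHz range; the actual MFE block may use a different mel variant and filter count, so treat the numbers as illustrative only.

```python
import math

def hz_to_mel(freq_hz):
    """Convert Hz to mel using the HTK-style formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(num_bands, low_hz, high_hz):
    """Return num_bands + 1 band edges in Hz, equally spaced on the mel scale."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / num_bands
    return [mel_to_hz(low_mel + i * step) for i in range(num_bands + 1)]

# Hypothetical settings: 40 bands up to the Nyquist frequency of 44.1 kHz audio.
edges = mel_band_edges(40, 0.0, 22050.0)
widths = [upper - lower for lower, upper in zip(edges, edges[1:])]

print(f"lowest band width:  {widths[0]:.0f} Hz")
print(f"highest band width: {widths[-1]:.0f} Hz")
```

The highest bands come out many times wider than the lowest ones, so differences between clicks in the upper kHz range get averaged into only a few coarse bins. A linear-frequency spectrogram keeps the same bin width everywhere.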

Wolf · June 11, 2021, 12:58pm

Hi AlexE,

thanks for your suggestions.
The more interesting frequencies of the second half of the signal are:
7-10 kHz
15-18 kHz
20-21 kHz
Original sample rate is at 96 kHz, but I aim to downsample to 44.1 kHz.
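As a quick sanity check on the planned downsampling, the sketch below computes the integer up/down factors for going from 96 kHz to 44.1 kHz and confirms that the listed bands stay below the new Nyquist frequency. The helper names are hypothetical and this is only arithmetic; it does not resample any audio.

```python
from fractions import Fraction

ORIGINAL_RATE_HZ = 96_000
TARGET_RATE_HZ = 44_100

# Bands of interest listed in the post above, in Hz.
BANDS_HZ = [(7_000, 10_000), (15_000, 18_000), (20_000, 21_000)]

def resample_ratio(source_hz, target_hz):
    """Return (up, down) so that target_hz == source_hz * up / down."""
    ratio = Fraction(target_hz, source_hz)
    return ratio.numerator, ratio.denominator

def bands_below_nyquist(bands_hz, sample_rate_hz):
    """Return (low, high, fits) per band, where fits means high < Nyquist."""
    nyquist_hz = sample_rate_hz / 2
    return [(low, high, high < nyquist_hz) for low, high in bands_hz]

up, down = resample_ratio(ORIGINAL_RATE_HZ, TARGET_RATE_HZ)
print(f"resample by {up}/{down}, Nyquist = {TARGET_RATE_HZ / 2:.0f} Hz")
for low, high, fits in bands_below_nyquist(BANDS_HZ, TARGET_RATE_HZ):
    print(f"{low}-{high} Hz fits below Nyquist: {fits}")
```

Factors like these can be passed to a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)`. Note that the 20-21 kHz band sits only about 1 kHz below the 22.05 kHz Nyquist limit, so the roll-off of the anti-aliasing filter may attenuate it somewhat.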

AlexE · June 11, 2021, 2:02pm

Make sure you also downsample before you train the model in Studio.
0.1 s @ 44.1 kHz is about 4,400 samples, so for spectral analysis I would use a 2048 FFT length and the same for the window size: 2048 / 44.1e3 ≈ 46 ms, so maybe 0.045 s windows (the FFT will zero-pad out).
If that takes too much RAM, you could go smaller on the FFT (and also reduce the window size to stay under the FFT length), but you’ll lose frequency resolution, so it depends on how close your sounds are in frequency.
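The trade-off between FFT length, window duration, and frequency resolution can be worked out with a few lines of arithmetic. This is a minimal sketch assuming a 44.1 kHz sample rate and a 0.1 s clip; the function name and the list of candidate FFT lengths are illustrative, not Studio settings.

```python
SAMPLE_RATE_HZ = 44_100
CLIP_SECONDS = 0.1

def fft_summary(fft_length, sample_rate_hz=SAMPLE_RATE_HZ):
    """Summarize what a given FFT length implies at a given sample rate."""
    return {
        "fft_length": fft_length,
        # Spacing between adjacent FFT bins.
        "bin_width_hz": sample_rate_hz / fft_length,
        # Longest window that fits in the FFT without truncation.
        "window_ms": 1000.0 * fft_length / sample_rate_hz,
        # Number of non-redundant bins for a real-valued input.
        "num_bins": fft_length // 2 + 1,
    }

clip_samples = round(CLIP_SECONDS * SAMPLE_RATE_HZ)
print(f"samples per clip: {clip_samples}")

for fft_length in (2048, 1024, 512, 256):
    info = fft_summary(fft_length)
    print(
        f"FFT {info['fft_length']:>4}: "
        f"bin width {info['bin_width_hz']:.1f} Hz, "
        f"window {info['window_ms']:.1f} ms, "
        f"{info['num_bins']} bins"
    )
```

Halving the FFT length doubles the bin width and halves the window duration, so shorter FFTs give better time resolution within the 0.1 s click at the cost of frequency resolution.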

janjongboom · July 7, 2021, 3:38pm

Note that we’ve released new versions of our MFE and spectrogram blocks today. They’ll perform even better on these types of tasks: we reworked our DSP blocks for non-voice audio. We’ve seen a 7 percentage point increase in accuracy while using 33% less RAM. Perfect for classifying animal sounds and detecting security breaches in real time on any MCU or embedded Linux device. https://www.edgeimpulse.com/blog/even-better-audio-classification-with-our-new-dsp-blocks