My goal is to classify percussive click sounds (about 0.1 seconds long, with a broad frequency distribution from approx. 2 to 25 kHz) that show significant spectral differences in the second half of the signal.
Does it sound reasonable to look for features using MFE or spectrogram? Any initial recommendations?
Maybe @AlexE would have a better opinion than mine regarding which DSP block to use.
Other than that, I guess it won’t take much time to try out both and see where you get a better cluster separation between your classes.
Hi @Wolf, a normal spectrogram is probably best here. The most important thing for these types of short events is to be very strict and careful at the labeling stage. https://www.edgeimpulse.com/blog/crop-split-data might be able to help if these are distinct events.
Thanks for your helpful comments. I was not aware of the crop and split functions - awesome!
Agree with Jan on not using MFE, since MFE buckets the spectral energy into bins based on the log of the frequency (to mimic human hearing). If you’re classifying based on the dominant tone (frequency) in the second half of your recording, and the first half is uninteresting, then maybe start with Spectral Analysis and move on to Spectrogram if you don’t get good results.
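To make the point about MFE bins concrete, here is a rough sketch of how mel-spaced bands widen with frequency. It assumes the commonly used mel formula (2595 * log10(1 + f/700)), 40 filters, and a 0 to 22.05 kHz range; none of these are confirmed as what the Edge Impulse MFE block actually uses, and the function names are made up for illustration.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula (an assumption, not taken from the MFE block).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_low_hz, f_high_hz, n_filters):
    # Edges are evenly spaced on the mel axis, then mapped back to Hz.
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]

N_FILTERS = 40  # hypothetical filter count
edges = mel_band_edges(0.0, 22050.0, N_FILTERS)

# Each triangular filter spans from edge i to edge i + 2.
widths = [edges[i + 2] - edges[i] for i in range(N_FILTERS)]

print(f"lowest band width:  {widths[0]:.0f} Hz")
print(f"highest band width: {widths[-1]:.0f} Hz")
```

Under these assumptions the top bands are several kHz wide, so two clicks whose energy differs only within a high-frequency band could end up in the same bin, whereas a linear spectrogram keeps the same bin width everywhere.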
Your FFT size will also be important here. Can you list out the frequencies that your clicks will be at and I can give you some guidance on the parameters for your DSP block? Also, what’s your sample rate?
Thanks for your suggestions.
The more interesting frequencies of the second half of the signal are:
Original sample rate is at 96 kHz, but I aim to downsample to 44.1 kHz.
- Make sure you also downsample before you train the model in Studio
- 0.1 s @ 44.1 kHz is about 4,400 samples, so for spectral analysis I would use an FFT length of 2048 and the same for the window size (2048 / 44.1e3 ≈ 46 ms), so maybe 0.045 s windows (the FFT will zero-pad the rest)
If that takes too much RAM, you could go smaller on the FFT (also reduce the window size to stay under the FFT length), but you’ll lose frequency resolution, so it depends on how close your sounds are in frequency
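The arithmetic above, and the resolution trade-off for smaller FFTs, can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: `fft_plan` is a hypothetical name, not part of Edge Impulse, and it says nothing about how much RAM the DSP block really needs.

```python
def fft_plan(sample_rate_hz, clip_s, fft_len):
    """Basic numbers for one FFT size at a given sample rate."""
    return {
        "clip_samples": int(round(sample_rate_hz * clip_s)),  # samples in one click
        "window_s": fft_len / sample_rate_hz,   # longest window that fits the FFT
        "bin_hz": sample_rate_hz / fft_len,     # spacing between frequency bins
        "n_bins": fft_len // 2 + 1,             # bins of a real-input FFT
        "nyquist_hz": sample_rate_hz / 2,       # top of the spectrum
    }

SAMPLE_RATE_HZ = 44100  # after downsampling from 96 kHz
CLIP_S = 0.1            # approximate click length

for fft_len in (2048, 1024, 512, 256):
    p = fft_plan(SAMPLE_RATE_HZ, CLIP_S, fft_len)
    print(
        f"FFT {fft_len:4d}: window <= {p['window_s'] * 1e3:5.1f} ms, "
        f"bin width {p['bin_hz']:6.1f} Hz, {p['n_bins']} bins"
    )
```

Halving the FFT length halves the number of bins and doubles the bin width, which is the frequency resolution you give up in exchange for lower memory use.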