Linear Scale Spectrogram instead of MFCC

The MFCC building block uses the Mel-frequency cepstrum, which is based on how humans perceive sound.

Using this building block makes sense for human speech, but for other sounds like doorbells, faucets, and showers, I think a linear scale (or another scale) might make more sense. So is there a building block that implements the spectrogram on a linear scale?

I am especially asking this because the linear spectrogram of my doorbell in Audacity looks more “distinguishable” than the mel spectrogram, as you can see in the screenshot below for the same audio fragment:


Top = Mel
Bottom = Linear

Spectrogram settings (top): [screenshot]

Spectrogram settings (bottom): [screenshot]

Good suggestion! You can actually prototype this yourself already by building a custom processing block (https://docs.edgeimpulse.com/docs/custom-blocks) based on the MFCC block (https://github.com/edgeimpulse/processing-blocks/blob/master/mfcc/dsp.py).
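As a rough illustration of what such a custom block could compute (a sketch only; the parameter names are my own assumptions, not the actual Edge Impulse block API), a linear-scale log spectrogram boils down to framing, windowing, and an FFT power spectrum, with no mel filterbank at all:

```python
import numpy as np

def linear_spectrogram(signal, sampling_freq=16000, frame_length=0.02,
                       frame_stride=0.01, fft_length=512):
    """Log-magnitude power spectrogram on a linear frequency scale."""
    frame_size = int(round(frame_length * sampling_freq))
    frame_step = int(round(frame_stride * sampling_freq))
    num_frames = 1 + max(0, (len(signal) - frame_size) // frame_step)
    frames = np.stack([
        signal[i * frame_step : i * frame_step + frame_size]
        for i in range(num_frames)
    ])
    frames = frames * np.hamming(frame_size)   # reduce spectral leakage
    # fft_length >= frame_size, so rfft zero-pads rather than truncates
    mag = np.abs(np.fft.rfft(frames, fft_length))
    power = (mag ** 2) / fft_length            # power spectrum per frame
    return 10 * np.log10(power + 1e-10)        # dB; small floor avoids log(0)
```

Each row is one time frame and each column one linearly spaced frequency bin (bin width = sampling_freq / fft_length), so a 9.5 kHz doorbell band would land in its own bins instead of being squashed by mel warping.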

That said, I think increasing the number of cepstral coefficients would already give you some more information.


The audio fragment I have used:

Thanks to @aurel, I learned that MFCC is a more advanced/complex feature extraction than the “Mel” spectrogram scale in Audacity.

I have found an interesting article about MFCC:

I still think that MFCC is specifically designed for speech recognition, which also means it might not be the best tool for recognizing other sound patterns. For example, my doorbell shows a clear band around 9500 Hz, which is beyond the range of typical speech analysis.
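To make that concrete (the sampling rates below are just illustrative): even capturing a 9.5 kHz band requires a sampling rate above the Nyquist limit of 2 × 9500 Hz, which the 8 kHz and 16 kHz rates common in speech processing do not reach:

```python
target_hz = 9500            # the doorbell band visible in Audacity
min_rate = 2 * target_hz    # Nyquist: need at least 19000 Hz sampling

for rate in (8000, 16000, 44100):
    captured = rate / 2 > target_hz
    print(f"{rate} Hz sampling -> "
          f"{'captures' if captured else 'misses'} the {target_hz} Hz band")
```

So any pipeline tuned for speech-band audio could lose the single most distinctive feature of this doorbell before feature extraction even starts.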

For my case, I think a feature extraction based on the linear spectrogram scale (see the Audacity diagram above) would be a much better approach, as I can easily recognize when the doorbell is ringing in those spectrograms.

In terms of implementation, I am wondering whether the existing MFCC block could be reused as a starting point. In that case you would basically remove steps and ensure that the frequency scale is linear instead of mel/logarithmic.
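One way to picture that reuse (a sketch under my own assumptions, not the actual dsp.py code): both feature types share the same framing/windowing/FFT front end, and the linear variant simply stops before the mel-specific steps:

```python
import numpy as np

rng = np.random.default_rng(0)
frame = np.hamming(320) * rng.standard_normal(320)  # stand-in for one windowed frame

# MFCC pipeline, roughly:
power = np.abs(np.fft.rfft(frame, 512)) ** 2 / 512  # 1. FFT power spectrum
# 2. mel filterbank -> filterbank energies           (REMOVE for linear scale)
# 3. log of energies                                 (optional: keep for range)
# 4. DCT            -> cepstral coefficients         (REMOVE for linear scale)

# Linear-scale variant: stop after step 1, plus an optional log:
linear_feature = np.log(power + 1e-10)              # one value per FFT bin
```

In other words, the only parts to delete are the mel filterbank and the DCT; everything upstream of them is already linear-frequency.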