Separating audio that is very similar

I’m working on a “chicken translator” project to distinguish between 5 different chicken calls.
I’ve gotten my translator to distinguish between the most distinct calls, but the accuracy is poor for calls that are difficult for the human ear to distinguish.
Could choosing better MFC coefficients help to distinguish between features more easily, or do I need to do more pre-processing on the data?
I’ve experimented with the coefficients, but everything I’ve tried so far has made it worse, so guidance here would be awesome.
Sorry if this is a broad question - I am new to audio processing!

Hi @zebular13, welcome to the community!

If it’s difficult for the human ear to distinguish between two calls, there’s a fair chance it may be difficult for a deep learning network, too. Here are some things to consider:

  • More data is always helpful; if you can obtain more samples for each call, your network will have a better chance at learning to distinguish between them—as long as it’s actually possible.

  • To check whether the samples are actually distinguishable at all, try building a bigger neural network with more layers and neurons (and thus more capacity to “memorize” data) and see if you can train it to roughly 100% accuracy on the training dataset by deliberately overfitting. If you can’t get training accuracy to approximately 100%, there may not be enough difference between the samples for any network to separate them (see the first sketch after this list).

  • The MFCC algorithm is actually designed with human hearing in mind. Since chicken hearing may be different, it could be worth investigating whether a spectrogram with different frequency bands or EQ works better. This isn’t something we can do automatically in Edge Impulse right now, but if you can calculate these spectrograms yourself you can upload the data to Edge Impulse (see the second sketch after this list).

  • Similarly, the microphone you’ve used to collect your data may have a different frequency curve to a chicken’s hearing. Maybe there’s some information you are missing in the low end or high end of the signal, or in a currently quiet part of the frequency spectrum, and using a different microphone would reveal this.
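
If you want to try the overfitting check from the second bullet, here is a minimal sketch. It assumes you can export your MFCC features as a matrix `X_train` (one flattened window per row) with integer labels `y_train` for the 5 call classes; those names and the layer sizes are placeholders, not Edge Impulse code.

```python
# Capacity check: deliberately overfit a larger network on the training set only.
# X_train / y_train are assumed to hold your exported MFCC features and labels.
import tensorflow as tf

num_classes = 5

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# No validation split, no dropout, plenty of epochs: the only question is
# whether training accuracy can reach roughly 100%.
model.fit(X_train, y_train, epochs=200, batch_size=16)
```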

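For the third bullet, here is a minimal sketch of computing a log-mel spectrogram with hand-picked frequency limits using librosa (outside Edge Impulse). The filename, band count, and fmin/fmax values are assumptions to illustrate the idea; you would tune them to where the chicken calls actually carry energy.

```python
# Sketch: log-mel spectrogram with custom frequency bands, exported as a flat
# feature vector that could be uploaded as pre-computed features.
import librosa
import numpy as np

y, sr = librosa.load('chicken_call.wav', sr=16000)  # placeholder filename

S = librosa.feature.melspectrogram(
    y=y, sr=sr,
    n_fft=512, hop_length=256,
    n_mels=40,      # number of frequency bands
    fmin=300.0,     # drop energy below 300 Hz
    fmax=8000.0,    # ...and above 8 kHz
)
log_S = librosa.power_to_db(S, ref=np.max)

features = log_S.flatten()
print(features.shape)
```
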
Hopefully this gives some places to start!

Warmly,
Dan


Thank you! I’ll try what you suggested, and if I finish my project on Hackster I’ll post it here!

Excellent! I love chickens, so I’m excited to hear it 🙂

Out of interest, how large is your dataset? And are you attempting to distinguish between individual chickens or between different types of sound that a chicken makes?

Warmly,
Dan

Quoting the earlier suggestion: “The MFCC algorithm is actually designed with human hearing in mind. Since chicken hearing may be different, it could be worth investigating whether a spectrogram with different frequency bands or EQ might work better.”

If you want to experiment with this, the easiest way is to build a custom processing block based on our MFCC block (https://github.com/edgeimpulse/processing-blocks/tree/master/mfcc).
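
As a rough illustration of the DSP step such a block could perform, here is a sketch of an MFCC computation with adjustable coefficient count, filterbank size, and frequency range. It uses librosa rather than the block’s own DSP code, and the parameter defaults are placeholders, not values from the Edge Impulse implementation.

```python
# Sketch: MFCCs with tweakable parameters, as a starting point for a custom block.
import librosa
import numpy as np

def chicken_mfcc(signal, sampling_freq, num_cepstral=13, num_filters=32,
                 low_freq=300.0, high_freq=8000.0):
    """Return a (frames x num_cepstral) MFCC matrix for one audio window."""
    mfccs = librosa.feature.mfcc(
        y=np.asarray(signal, dtype=np.float32),
        sr=sampling_freq,
        n_mfcc=num_cepstral,
        n_mels=num_filters,   # forwarded to the mel filterbank
        fmin=low_freq,
        fmax=high_freq,
    )
    return mfccs.T  # one row per frame

# e.g. features = chicken_mfcc(raw_window, 16000).flatten()
```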

Thanks guys!
My dataset is about 9 minutes in total. I’m using the longer audio segments from this dataset: https://github.com/zebular13/ChickenLanguageDataset
Hopefully I can get something done over the long weekend!

Wow, this is an amazing dataset! I think 9 minutes is on the short side, so I’m confident you’ll see better performance with more data.

@zebular13 We have now released the MFE spectrogram block; it would be great to see what effect it has on your accuracy! See the announcement: Need an audio spectrogram? The new MFE block is here to help!

@zebular13 This is so interesting; have you had any better luck? I never thought of a chicken sounds dataset. Time for me to start looking for a dolphin sounds dataset.