I would like to use the K-means block for anomaly detection on audio data, but it isn't showing up as an option for me. When I go to add a new learning block, the only options are a classification block and a regression block, along with this message: “Some learning blocks have been hidden based on the data in your project.” There is no explanation of why I can't add a K-means block, so some help understanding why would be appreciated.
In case it helps with diagnosing the problem: my target board is the Arduino Nano 33 BLE Sense, I have two classes and an MFE processing block, with 6 seconds (32 samples) of training data for one class and 4 seconds (4 samples) for the other (noise).
Ok, good to know, thank you! I have a related question, then. What I am ultimately trying to do is one-class classification, where the model is trained to identify one sound and nothing else. I tried training with just one class, and everything got labeled as that sound (noise included). I then added a noise class, which resulted in every sound being labeled as the target sound and only quiet periods being labeled as noise. This is why I wanted to try anomaly detection with my model.
Any recommendations on how to go about this? I could add lots of other classes to my model, but I'm curious whether there is another way, or whether Edge Impulse simply isn't suited to this application.
Hi @emladina, this is a good question, thanks for your post!
You’re correct in identifying that you’ll need a “noise” class in order to distinguish the sound you care about from general background noise. Otherwise, since the output of a classification model is a probability distribution across all the known classes, a single-class model will just output “1.0” for that one class every time.
In your application, when your model predicts noise you can just ignore it.
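To make "ignore the noise prediction" concrete, here is a minimal sketch of the application-side logic. This is not the Edge Impulse SDK API; the class names and the threshold value are illustrative assumptions.

```python
# Minimal sketch of acting only on the class we care about.
# Label names and THRESHOLD are assumptions, not Edge Impulse API.

THRESHOLD = 0.8  # minimum confidence before we treat it as a detection

def handle_prediction(scores):
    """scores: dict mapping class label -> probability (sums to ~1.0)."""
    label = max(scores, key=scores.get)
    if label == "target_sound" and scores[label] >= THRESHOLD:
        return "detected"
    # Predictions of "noise", or low-confidence hits, are simply ignored.
    return "ignored"
```

In other words, the noise class exists so the probability mass has somewhere else to go; the application only ever reacts to the target label.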
Here are some tips on getting the noise class to work well:
Collect roughly the same amount of data for each class (e.g. a 50/50 split between the sound you care about and general background noise), and keep the same balance in your training and test datasets.
Include lots of different types of background noise in the background noise class—try to include all the types of background noise that might reasonably occur in the place where you’re deploying your model.
Make sure you have samples at various volumes so that your model doesn’t just learn that “noise” is the quiet class.
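One cheap way to cover the volume point above is to rescale copies of each clip before uploading them. Here is a minimal sketch; the gain values are illustrative assumptions, and real pipelines would also clip to the valid amplitude range.

```python
# Minimal sketch of volume augmentation: rescaled copies of each clip
# keep the model from learning that "noise" just means "quiet".
# The gain values are illustrative assumptions.

def augment_volumes(samples, gains=(0.25, 0.5, 1.0, 2.0)):
    """samples: list of audio clips (each a list of float amplitudes).
    Returns one rescaled copy of every clip per gain value."""
    return [[x * g for x in clip] for clip in samples for g in gains]
```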
To get you started, here are some of my go-to quick sources of background noise:
Thanks for the response! Good to know that my noise class shouldn’t just contain quiet. I am trying to listen specifically for asthmatic sounds (snorts) made by a dog, so I will likely need to add common household sounds as well as other dog sounds (barking, play sounds, etc.).
I’ll take a look at those datasets and try adding them to my model.
As a follow-up question, does having just two classes (as opposed to more) affect model performance? That is, if I have my target sound (asthmatic sounds) in one class and all other sounds in a noise class, will that perform differently from a model with those same two classes plus separate labels for sounds that are similar to asthmatic sounds? To be clear, I wouldn’t add extra data; I would just give similar sounds their own labels instead of lumping them into noise. Or would these two models perform the same? I apologize if this is more of a general ML question than a strictly Edge Impulse one.
No need to apologize, this is a great question! In theory there shouldn’t be a huge difference between merging the various noise types into one class and keeping them as distinct classes—the models should behave quite similarly.
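If you do go with distinct noise labels, the application can always collapse them back into a single noise score at inference time. Here is a minimal sketch of that idea; the label names are assumptions for illustration.

```python
# Minimal sketch: with separate noise labels, sum their probabilities
# back into one "noise" score before deciding. Label names are assumed.

NOISE_LABELS = {"household", "barking", "quiet"}

def collapse_noise(scores):
    """scores: dict of label -> probability. Returns a two-class dict."""
    noise = sum(v for k, v in scores.items() if k in NOISE_LABELS)
    target = scores.get("asthmatic_sound", 0.0)
    return {"asthmatic_sound": target, "noise": noise}
```

This way the choice of one merged noise class versus several distinct ones becomes a labeling decision rather than a change to the application logic.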