I would like to use the K-means block for anomaly detection on audio data, but it isn't showing up as an option for me. When I go to add a new learning block, the only options are a classification block and a regression block, along with this message: “Some learning blocks have been hidden based on the data in your project.” There is no explanation of why I can't add a K-means block, so some help understanding why would be appreciated.
In case it helps with diagnosing the problem: my target board is the Arduino Nano 33 BLE Sense, I have two classes and an MFE processing block, with 6 seconds (32 samples) of training data for one class and 4 seconds (4 samples) for the other (noise).
Ok, good to know, thank you! I have a related question, then. What I am ultimately trying to do is one-class classification, where the model is trained to identify one sound and nothing else. I tried training with just one class, and everything got labeled as that sound (noise included). I then added a noise class, which resulted in every sound being labeled as the target sound and only quiet periods being labeled as noise. This is why I wanted to try anomaly detection with my model.
Any recommendations on how to go about this? I could add lots of other classes to my model, but I'm curious whether there is another way, or whether Edge Impulse simply isn't suited to this application.
Hi @emladina, this is a good question, thanks for your post!
You’re correct in identifying that you’ll need a “noise” class in order to distinguish the sound you care about from general background noise. Otherwise, since the output of a classification model is a probability distribution across all the known classes, a single-class model will just output “1.0” for that one class every time.
In your application, when your model predicts noise you can just ignore it.
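To make "ignore the noise prediction" concrete, here is a minimal sketch of the application-side logic. This is not the Edge Impulse SDK API; the class names and the threshold value are illustrative assumptions.

```python
# Minimal sketch of acting only on the class we care about.
# Label names and THRESHOLD are assumptions, not Edge Impulse API.

THRESHOLD = 0.8  # minimum confidence before we treat it as a detection

def handle_prediction(scores):
    """scores: dict mapping class label -> probability (sums to ~1.0)."""
    label = max(scores, key=scores.get)
    if label == "target_sound" and scores[label] >= THRESHOLD:
        return "detected"
    # Predictions of "noise", or low-confidence hits, are simply ignored.
    return "ignored"
```

In other words, the noise class exists so the probability mass has somewhere else to go; the application only ever reacts to the target label.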
Here are some tips on getting the noise class to work well:
Collect roughly the same amount of data for each class (e.g. a 50/50 split between the sound you care about and general background noise), and keep the same balance in your training and test datasets.
Include lots of different types of background noise in the background noise class—try to include all the types of background noise that might reasonably occur in the place where you’re deploying your model.
Make sure you have samples at various volumes so that your model doesn’t just learn that “noise” is the quiet class.
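One cheap way to cover the volume point above is to rescale copies of each clip before uploading them. Here is a minimal sketch; the gain values are illustrative assumptions, and real pipelines would also clip to the valid amplitude range.

```python
# Minimal sketch of volume augmentation: rescaled copies of each clip
# keep the model from learning that "noise" just means "quiet".
# The gain values are illustrative assumptions.

def augment_volumes(samples, gains=(0.25, 0.5, 1.0, 2.0)):
    """samples: list of audio clips (each a list of float amplitudes).
    Returns one rescaled copy of every clip per gain value."""
    return [[x * g for x in clip] for clip in samples for g in gains]
```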
To get you started, here are some of my go-to quick sources of background noise:
Thanks for the response! Good to know that my noise class shouldn’t just contain quiet. I am trying to listen specifically for asthmatic sounds (snorts) made by a dog, so I will likely need to add common household sounds as well as other dog sounds (barking, play sounds, etc.).
I’ll take a look at those datasets and try adding them to my model.
As a follow-up question, does having just two classes (as opposed to more) affect model performance? That is, if I have my target sound (asthmatic sounds) in one class and all other sounds in a noise class, will that perform differently from a model with those same two classes plus separate labels for sounds that are similar to asthmatic sounds? To be clear, I wouldn’t add extra data; I would just give similar sounds their own labels instead of lumping them into noise. Or would these two models perform the same? I apologize if this is more of a general ML question than a strictly Edge Impulse one.
No need to apologize, this is a great question! In theory there shouldn’t be a huge difference between merging the various noise types into one class and keeping them as distinct classes—the models should behave quite similarly.
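If you do go with distinct noise labels, the application can always collapse them back into a single noise score at inference time. Here is a minimal sketch of that idea; the label names are assumptions for illustration.

```python
# Minimal sketch: with separate noise labels, sum their probabilities
# back into one "noise" score before deciding. Label names are assumed.

NOISE_LABELS = {"household", "barking", "quiet"}

def collapse_noise(scores):
    """scores: dict of label -> probability. Returns a two-class dict."""
    noise = sum(v for k, v in scores.items() if k in NOISE_LABELS)
    target = scores.get("asthmatic_sound", 0.0)
    return {"asthmatic_sound": target, "noise": noise}
```

This way the choice of one merged noise class versus several distinct ones becomes a labeling decision rather than a change to the application logic.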