Anomaly detection on audio data

I have a project where I am detecting error states of a fan. To begin with, the states are: normal, a zip tie caught in the fan, banging with a hammer at periodic intervals, and banging with a small screwdriver at periodic intervals.

I am running audio inference on a 1 second window at a time over a 10 second recording. The point of this is that new error states may present themselves temporally differently, and the DSP should be suitable for that possibility (some error states may be periodic bangs, others a consistent change in pitch, etc.). Basically, I am trying to cast the net wide because I don't know what errors will occur or how they will present themselves.
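For reference, the per-window inference loop itself is straightforward. Here is a minimal numpy sketch of splitting a recording into fixed-length windows (function and variable names are my own, purely for illustration):

```python
import numpy as np

def sliding_windows(samples, sample_rate, window_s=1.0, stride_s=1.0):
    """Split a recording into fixed-length windows for per-window inference."""
    win = int(window_s * sample_rate)
    hop = int(stride_s * sample_rate)
    return [samples[i:i + win] for i in range(0, len(samples) - win + 1, hop)]

# A 10 s recording at 16 kHz yields ten non-overlapping 1 s windows,
# each of which would be fed to the DSP + NN pipeline separately.
audio = np.zeros(10 * 16000)
windows = sliding_windows(audio, 16000)
```

A smaller `stride_s` would give overlapping windows, which can help catch a bang that falls on a window boundary.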

The idea is to have it automatically detect new possible error states and notify me so that I can retrain on various conditions (some potentially being new classifications), using the anomaly detection.

Since the window size is a little large, I have used the MFE DSP with default settings but an FFT length of 512 to pick up more discrete changes in pitch or fast bangs within the second. This seems to work surprisingly well with the spectrogram and with the NN. The issue is with the anomaly detection.

The output consists of a few thousand "audio features". When I select all features, the bubbles created exceed the size of the plotted points by a huge margin. The only settings I see are "cluster count" and "minimum score", neither of which fixes the issue. I have tried different DSP settings to no avail. It is difficult to determine what the issues may even be here or how to address them, and I haven't seen many forum posts or much documentation on this. Any info or suggestions would be much appreciated!
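In case it helps to reason about those two settings: as I understand it, the anomaly block clusters the training features with K-means and scores a new sample by its distance to the nearest cluster centre, so "cluster count" sets K and "minimum score" is the distance threshold. A rough, self-contained sketch of that idea (this is not Edge Impulse's actual implementation, just the general technique):

```python
import numpy as np

def kmeans_fit(X, k, iters=50):
    """Plain K-means with a simple deterministic init (illustrative only)."""
    centers = X[::max(1, len(X) // k)][:k].astype(float)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def anomaly_score(x, centers):
    """Distance to the nearest cluster centre; larger means more anomalous."""
    return float(np.min(np.linalg.norm(centers - x, axis=1)))
```

With a few thousand input dimensions, every point ends up far from every centre (the curse of dimensionality), which would explain the huge bubbles you are seeing.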

@Kerkenstuff, the anomaly detection block does not work well with high-dimensional data like spectrograms. My suggestion would be to train a classifier if you already have 'normal' vs. 'fault state' features.

I have some for my test case, and the classifier works well. But the real value to the application is to detect unclassified anomalies. The system will run autonomously and far from people, so it is important to be able to notice issues without human intervention and signal me remotely. For example, if a leaf were to get caught in the fan, the NN would just classify it randomly and not notice that it is a new state.

Has anyone considered an autoencoder for anomaly detection on high-dimensional data?

Unless you mean creating a NN that is trained to distinguish "normal" from "generally not normal" states? That might actually work with a large enough dataset, if there is enough variance in the normal and non-normal states to generalize. I have never heard of that method, but it would be great if it could work as a general approach to anomaly detection.
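To make the autoencoder idea concrete: the usual approach is to train it only on normal data and flag samples with high reconstruction error. Here is a minimal linear version (mathematically equivalent to PCA; a real deployment would use a deeper model, and all names here are illustrative):

```python
import numpy as np

def fit_linear_autoencoder(X, n_components):
    """Fit a linear autoencoder in closed form via SVD (equivalent to PCA).

    Trained on *normal* data only; anomalies should reconstruct poorly."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:n_components]              # shared encoder/decoder weights
    return mean, W

def reconstruction_error(x, mean, W):
    """Anomaly score: how badly the model reconstructs this sample."""
    z = (x - mean) @ W.T               # encode into low-dimensional space
    x_hat = z @ W + mean               # decode back to feature space
    return float(np.linalg.norm(x - x_hat))
```

The appeal is exactly what you describe: no "not normal" training data is needed, because anything that does not resemble the normal data reconstructs badly.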

@Kerkenstuff Check. Audio anomaly detection is really hard. We had someone work on autoencoders for audio over the summer, and unfortunately the results were not great.

@dansitu could you lend some advice on how to approach this best?

Hi @Kerkenstuff,

As @janjongboom mentions, this is a tricky problem! Using Edge Impulse, I would recommend trying the following approach:

  1. Use your existing DSP and NN configuration to distinguish between normal operation and known error states that you can collect training data for.
  2. Add another DSP block that extracts some simple high-level features. For example, you could use our “Spectral features” block. I was able to get decent results for audio by setting the filter type to “none”.
  3. Set this new DSP block as the input to the “Anomaly detection” block, but make sure your NN block only has the original DSP as its input.
  4. Play with the configuration of the DSP block until you find a combination that gives reasonable clusters. For example, here’s a screenshot of the results from one of my own projects that appears promising:

This should help detect some out-of-distribution anomalies, so in conjunction with your classifier handling the "known" anomaly types it may give good enough performance for your application.
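To illustrate the kind of low-dimensional summary that step 2 is after, here is a hand-rolled sketch of some simple spectral features (this is not the actual "Spectral features" block, just an approximation of the idea; the specific features chosen are my own):

```python
import numpy as np

def simple_spectral_features(window, sample_rate):
    """A few high-level features per window: RMS level, spectral centroid,
    and peak frequency. Low-dimensional, so clustering behaves sensibly."""
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    power = spectrum ** 2
    total = power.sum() + 1e-12        # avoid division by zero on silence
    rms = float(np.sqrt(np.mean(window ** 2)))
    centroid = float((freqs * power).sum() / total)
    peak_freq = float(freqs[np.argmax(power)])
    return np.array([rms, centroid, peak_freq])
```

A handful of features like these cluster far better than a few thousand spectrogram values, which is the point of keeping the anomaly block's DSP input separate from the NN's.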