I’m tinkering with anomaly detection on a raw signal: I generate random values in a limited range using the data forwarder, to find out whether the model will detect anomalies in a similar signal with a different value spread (e.g. instead of random values between 10 and 20, there might be some peaks going as far up as 25 and as far down as 5, etc.).
I would like to better understand what exactly the k-means anomaly detection algorithm is doing, explore the features that are generated from the signal, etc. Would it be possible to have an option similar to the one in the Neural Networks block, where you can see the Keras model and also download an IPython notebook?
Hi @nebelgrau77, this is what we do for anomaly detection:
- We take all features that you select, and normalize them with a StandardScaler.
- We then run k-means clustering over this feature space with the number of clusters provided.
- For every cluster found we determine the center of the cluster and its radius (the distance from the center to the farthest point that still falls in the cluster). You see these printed in the output window at the end of training.
During classification we normalize the data again, then find the cluster closest to the incoming data; the anomaly score is the distance to that closest cluster (a score < 0 means the point is inside a cluster).
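As a rough illustration of the steps above, here is a minimal standard-library Python sketch. It is not the actual Edge Impulse implementation: the function names (`fit_scaler`, `fit_kmeans`, `anomaly_score`), the plain Lloyd's-algorithm k-means, the two made-up features per sample, and especially the score formula (distance to the nearest center minus that cluster's radius, so points inside a cluster come out negative) are all assumptions made for the example.

```python
import math
import random

def fit_scaler(rows):
    """Per-feature mean and standard deviation (what a StandardScaler stores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Normalize one feature vector with the stored statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def nearest(point, centers):
    """Index of, and distance to, the nearest cluster center."""
    dists = [math.dist(point, c) for c in centers]
    idx = min(range(len(centers)), key=dists.__getitem__)
    return idx, dists[idx]

def fit_kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns cluster centers and radii."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)[0]].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    # Radius = distance from a center to its farthest member.
    radii = [0.0] * k
    for p in points:
        idx, d = nearest(p, centers)
        radii[idx] = max(radii[idx], d)
    return centers, radii

def anomaly_score(row, means, stds, centers, radii):
    """ASSUMPTION: score = distance to nearest center minus its radius."""
    idx, d = nearest(scale(row, means, stds), centers)
    return d - radii[idx]

# Hypothetical training data: two features per sample, values in 10..20.
rng = random.Random(1)
train = [[rng.uniform(10, 20), rng.uniform(10, 20)] for _ in range(200)]

means, stds = fit_scaler(train)
scaled = [scale(r, means, stds) for r in train]
centers, radii = fit_kmeans(scaled, k=3)

for c, r in zip(centers, radii):
    print("center", [round(v, 2) for v in c], "radius", round(r, 2))

# A sample from the training range versus one with a peak up to 25.
print("typical:", anomaly_score([15.0, 15.0], means, stds, centers, radii))
print("peak:   ", anomaly_score([25.0, 15.0], means, stds, centers, radii))
```

With this definition every training sample scores at or below zero by construction, because each radius is taken from the farthest member of its own cluster. How far a new sample has to drift before its score turns positive depends on the cluster count and on which features you selected.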
I agree that an expert mode for this would be useful, but at some point I’d like to replace this with a neural-network-based approach to anomaly detection (e.g. using autoencoders or something), which would give us that automatically.
Thanks for the details! The NN-based approach sounds good, but maybe keep k-means as an option, as it could be faster (maybe)?