Hello, I have been experimenting with edge impulse multi-labeling (Multi-label - Edge Impulse Documentation)
But I’m getting terrible results with an audio event classifier that works perfectly with exactly the same data, model, and window size when I use separate audio samples instead.
Multi-labeling:
Error: 27.3% (215 / 787) Actual label: sneeze. Predicted label: other
Separate samples:
Error: 1.1% (2 / 174) Actual label: sneeze. Predicted label: other
I analyzed the results and found that multi-labeling produces wrong statistics because it mislabels windows under every configuration.
I will explain the problem with each setting:
- Use label at the end of the window
  Docs: "works well for scenarios where the primary interest lies in the resulting state or activity of the window, such as recognizing sustained motions or transitions"
  This results in many misclassifications, because the window is labelled by how it ends. Even a small amount of the next class at the end of the window relabels the whole window to that event.
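To make the failure mode concrete, here is an illustrative sketch (my own simplification, not Edge Impulse internals) where a window is represented as a list of per-sample labels and the whole window takes the label of its final sample:

```python
# Sketch of "use label at the end of the window": the whole window is
# assigned whatever label its last sample carries.
def label_end_of_window(window_labels):
    """window_labels: per-sample labels across one window, in time order."""
    return window_labels[-1]

# A window that is 95% "sneeze" but ends with a few "other" samples
# is labelled "other", which then counts as a sneeze misclassification
# in the statistics even if the model predicted sneeze correctly.
window = ["sneeze"] * 95 + ["other"] * 5
print(label_end_of_window(window))  # -> other
```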
- Use label X if anywhere present in the window
  Docs: "useful for detecting short or sparse events that may not occupy the full window but are critical to capture when they occur"
  - When I select "sneeze": any window containing even a tiny amount of sneeze is labelled sneeze (5/95, 10/90, 20/80 …)
  - When I select "other": any window containing even a tiny amount of other is labelled other (5/95, 10/90, 20/80 …)
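The same kind of sketch (again my own simplification, with hypothetical names) shows why "anywhere present" is just as blunt: a single matching sample anywhere in the window is enough to claim the whole window.

```python
# Sketch of "use label X if anywhere present in the window": the window
# is labelled X if X appears at all, regardless of how little of it there is.
def label_if_anywhere(window_labels, target, fallback="other"):
    """Assign `target` if it appears anywhere in the window, else `fallback`."""
    return target if target in window_labels else fallback

# A window that is only 5% sneeze is still labelled "sneeze", so a model
# that (reasonably) predicts "other" is counted as wrong.
window = ["other"] * 95 + ["sneeze"] * 5
print(label_if_anywhere(window, "sneeze"))  # -> sneeze
```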
So no matter what I select, I get a devastating amount of misclassification. I understand that I can manually inspect whether the model still works, but I need the statistics to be reliable so I can evaluate models by the numbers and not by feeling.
My suggestion:
Instead of a coarse ON/OFF checkbox, allow the user to enter a percentage that a label must occupy within the window for the window to be classified as that label.
I would try setting 70% for sneeze: sneeze would have to occupy at least 70% of the window for the window to qualify as sneeze. I'm not sure it fixes the issue, but I think it would. The checkbox method essentially sets this value to either 0% or 100%, which leads to misclassification either way.
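The suggested behaviour can be sketched like this (a minimal illustration of the proposal, with hypothetical names; the 70% threshold is the example value from above):

```python
# Sketch of the proposed threshold-based labelling: assign `target` only
# if it occupies at least `threshold` of the window's samples.
def label_by_fraction(window_labels, target, threshold=0.70, fallback="other"):
    frac = window_labels.count(target) / len(window_labels)
    return target if frac >= threshold else fallback

# 80% sneeze clears the 70% threshold; 20% sneeze does not.
print(label_by_fraction(["sneeze"] * 80 + ["other"] * 20, "sneeze"))  # -> sneeze
print(label_by_fraction(["sneeze"] * 20 + ["other"] * 80, "sneeze"))  # -> other
```

The existing checkboxes then correspond to the two degenerate thresholds: "anywhere present" behaves like threshold just above 0%, and requiring the label at 100% of the window is the other extreme.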
Edit:
I noticed I can check both checkboxes simultaneously. I assume that means a 50/50 chance of classifying as either one. That improved the results, but I manually checked the errors and they are still mostly mislabeling rather than genuine model mistakes.
Error: 9.3% (38 / 407) Actual label: sneeze. Predicted label: other