6. (MFCC) Conclusion
Although the model reported an excellent training accuracy of 99.9%, when tested on other audio fragments it classified many “non-ring” sounds as “ring”. That makes it not really usable for detecting “rings”: there are too many false positives.
In response to this post, I will share the MFE results…
@janvda Really promising results! You’re still overfitting a bit, maybe a dropout layer with a rate of 0.1 after flatten would help? We’re also adding some data augmentation for audio data next week, which should help with smaller datasets.
Some other tips that would help on the inferencing side to make this robust:
If you collect a second of data, don’t classify once, but use the sliding window approach to classify a bunch of slices of the data.
If you see >70% ring windows this is a very strong indication that there was a ring.
Together that will probably get you a deployment with barely any false positives / negatives.
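To make the two tips above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `sliding_windows`, `ring_detected` and the `classify` callback are hypothetical names, not part of any SDK, and `classify` stands in for whatever model call returns a label for one slice of audio.

```python
from typing import Callable, List, Sequence


def sliding_windows(samples: Sequence[float], window_size: int, stride: int) -> List[Sequence[float]]:
    """Cut the buffer into overlapping slices, one starting every `stride` samples."""
    last_start = len(samples) - window_size
    return [samples[i:i + window_size] for i in range(0, last_start + 1, stride)]


def ring_detected(
    samples: Sequence[float],
    classify: Callable[[Sequence[float]], str],  # hypothetical: returns "ring" or another label
    window_size: int,
    stride: int,
    threshold: float = 0.7,  # the ">70% ring windows" rule of thumb from the post
) -> bool:
    """Classify every slice and report a ring only if enough slices agree."""
    windows = sliding_windows(samples, window_size, stride)
    if not windows:
        return False  # buffer shorter than one window: nothing to vote on
    ring_votes = sum(1 for w in windows if classify(w) == "ring")
    return ring_votes / len(windows) > threshold
```

With one second of audio at 16 kHz, something like `ring_detected(buffer, classify, window_size=4000, stride=2000)` would vote over 7 slices instead of trusting a single classification. The window and stride values here are only examples and would need tuning.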
Yes, I understand. So one “ring” classification is not sufficient to conclude that the doorbell rang. For that you need at least 2 “ring” classifications in a row (or a more complex condition, like at least 3 “ring” classifications in a series of 5 consecutive classifications).
Of course we must keep in mind that a ring can be short (a few hundred ms long), so we must ensure that multiple classifications are done covering any period X (where X is the minimal ring duration we want to be able to recognize).
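As a rough sketch of the “at least 3 out of 5” condition and the timing constraint, something like the following could work. The names `KOfNDetector` and `max_stride_ms` are made up for this example, and the stride formula is just one reading of “multiple classifications covering any period X”: if a new classification starts every `stride` ms, any span of X ms contains at least `X / stride` window starts (rounded down).

```python
from collections import deque


class KOfNDetector:
    """Report a ring once at least k of the last n classifications were "ring"."""

    def __init__(self, k: int = 3, n: int = 5) -> None:
        if not 0 < k <= n:
            raise ValueError("need 0 < k <= n")
        self.k = k
        self.history = deque(maxlen=n)  # oldest result drops off automatically

    def update(self, label: str) -> bool:
        """Feed one classification result; returns True when the rule fires."""
        self.history.append(label == "ring")
        return sum(self.history) >= self.k


def max_stride_ms(min_ring_ms: float, classifications_needed: int) -> float:
    """Largest gap between window starts that still puts the required
    number of window starts inside the shortest ring we care about."""
    return min_ring_ms / classifications_needed
```

For example, if the shortest ring to recognize is 300 ms and 3 classifications should fall inside it, `max_stride_ms(300, 3)` gives 100 ms between window starts. Setting `k == n` turns the detector into the simpler “k in a row” rule.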