Live classification irregularity


I am playing with an audio recognition model that is supposed to recognize blowing sounds from a human.
I have 3 classes: a “cold” blow, a “hot” blow, and a non-blow.

The model is trained using a dataset of different people performing these blowing commands. The dataset consists of around 3 minutes of audio for each command. The non-blow class has more audio, consisting of background noise picked up by the microphone and some random clips from the ESC-50 dataset.

I am using an NN classifier that gets MFE features. The window size for the audio is 300 ms with a 150 ms stride. The MFE block uses a 0.02 s frame length, a 0.01 s stride, 50 filters, and a 1024-point FFT.
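For reference, mel filterbank energy (MFE) features with these parameters can be sketched in plain NumPy. This is a minimal illustration, not Edge Impulse's implementation; the 16 kHz sample rate is an assumption, since the post doesn't state it.

```python
import numpy as np

sr = 16000                     # assumed sample rate
n_fft, n_mels = 1024, 50       # FFT length and filter count from the post
frame_len = int(0.02 * sr)     # 0.02 s frame -> 320 samples
hop = int(0.01 * sr)           # 0.01 s stride -> 160 samples

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters evenly spaced on the mel scale, 0 Hz to sr/2
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)
    return fb

def mfe(audio):
    # Frame the window into 0.02 s frames with 0.01 s stride,
    # take the power spectrum, and apply the mel filterbank.
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    fb = mel_filterbank(sr, n_fft, n_mels)
    return np.log(power @ fb.T + 1e-10)

window = np.random.randn(int(0.3 * sr))  # one 300 ms window of noise
features = mfe(window)
print(features.shape)  # (29 frames, 50 filters)
```

One 300 ms window at 16 kHz yields 29 frames of 50 filterbank energies each, which is the feature matrix the classifier sees.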
After training for 100 cycles with data augmentation enabled, the model reaches 99.8% accuracy.

Retesting the model with various other clips via the model testing tab gives 93% accuracy.
So far so good.
But here is where the weird things happen.

When I record an audio clip of blowing commands, upload it as a test fragment, and classify it via live classification of an existing test sample, the result is as expected: the classifier detects all the blowing commands.
But when I play the same clip through my computer, using Voicemeeter to output the audio via a virtual microphone, and classify it with real-time live classification running in Chrome, it is always classified as the nonBlow command.
I can see that it reacts slightly: the confidence of the nonBlow class changes, but only drops by about 0.1.
I have tried amplifying the signal in Voicemeeter, which improved things a bit, but it is still much less reliable than classifying an existing sample.
I have also tried multiple computers and a mobile phone, all with the same result.
Even when using a real microphone instead of Voicemeeter, the results are still not comparable.
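Since amplifying the signal in Voicemeeter helped, a level mismatch between the live stream and the recorded training samples is one plausible cause. A minimal sketch of RMS normalization that could be applied to a clip before feature extraction (the target level of 0.1 is illustrative, not a value from the project):

```python
import numpy as np

def normalize_rms(audio, target_rms=0.1):
    # Scale the clip so its RMS level matches an assumed training-data level.
    rms = np.sqrt(np.mean(audio ** 2))
    if rms < 1e-8:          # avoid dividing by near-silence
        return audio
    return audio * (target_rms / rms)

quiet = 0.01 * np.random.randn(4800)   # a quiet 300 ms clip at 16 kHz
leveled = normalize_rms(quiet)
print(round(float(np.sqrt(np.mean(leveled ** 2))), 3))  # 0.1
```

If the live audio path delivers samples much quieter than the uploaded files, the filterbank energies shift globally, which can be enough to push everything toward the background-noise class.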

Can someone maybe shed some light on why I get different results from live classification as opposed to classification of an existing test sample file? I can’t seem to figure out why the results differ.

Thanks in advance.

Hi @Impulz,

I’m not familiar with Voicemeeter, but there may be some audio transcoding involved when using a virtual mic.
To get a better idea of the issue, could you collect a new test sample using your virtual mic in Chrome, and then classify it in Live Classification?
If it’s misclassified, it would be good to have a look at the WAV file; feel free to share your project ID here or via PM.
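If transcoding is involved, a sample-rate mismatch is also worth ruling out: virtual audio devices often run at 44.1 or 48 kHz, while the impulse may expect a lower rate. A sketch of resampling with SciPy before classification (both rates here are assumptions for illustration):

```python
import numpy as np
from scipy.signal import resample_poly

src_sr, model_sr = 48000, 16000              # assumed device and model rates
clip_48k = np.random.randn(src_sr).astype(np.float32)  # 1 s of audio

# Polyphase resampling from 48 kHz down to 16 kHz
clip_16k = resample_poly(clip_48k, model_sr, src_sr)
print(len(clip_16k))  # 16000 samples for a 1 s clip
```

Feeding audio at the wrong rate effectively stretches or compresses the spectrum, so the mel energies no longer match what the model saw during training.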


I will PM you some files and the project ID.
During recording I noticed that the hot blow command is more easily classified than the cold blow, and the two commands are also often mixed up, mainly cold blows being recognised as hot blows.
This is just a problem with the commands and the dataset, not with Edge Impulse.

However, as you can see in the video I sent you, there is a difference between live classification and classification of the pre-recorded sample.