I'm having some trouble with my keyword detection model running on a micro:bit. It triggers fine when I speak the keyword, but it also sometimes triggers when people are making noises and laughing in the background. It generally happens when people are speaking loudly and animatedly or making silly noises (which my classmates do… a lot…).
Any tips on how I can improve this? Should I replace the “other” voice samples with Swedish recordings instead of the English prebuilt pack downloaded from the Edge Impulse tutorial?
Ah, yes. Unfortunately it is full, as this project is the same one I am getting OOM errors in. So I most likely need better data rather than more data…? Would it make sense to redistribute the data somewhat? I have about 1200 s of my keyword and 900 s each of unknown and noise.
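To see what "redistributing" would change, it helps to look at the current balance first. The sketch below uses only the durations quoted above and computes each class's share plus inverse-frequency class weights. The weighting formula is a common heuristic assumed here for illustration; it is not taken from the project or from Edge Impulse settings.

```python
# Class durations in seconds, as described in the post above.
durations = {"keyword": 1200, "unknown": 900, "noise": 900}

total = sum(durations.values())  # 3000 s in total

# Share of the dataset each class represents.
shares = {label: secs / total for label, secs in durations.items()}

# Inverse-frequency class weights (assumed heuristic):
# weight = total / (num_classes * class_duration), so under-represented
# classes get a weight above 1 and over-represented ones below 1.
num_classes = len(durations)
weights = {label: total / (num_classes * secs) for label, secs in durations.items()}

for label in durations:
    print(f"{label:8s} share={shares[label]:.0%} weight={weights[label]:.2f}")
```

With these numbers the keyword is 40% of the data and unknown and noise are 30% each, so the non-keyword classes together already make up 60%. The imbalance is mild, which is consistent with "better data rather than more data".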
Also, would you put a group of people talking at the same time in the unknown or the noise category?
Yes, adding some data similar to the samples that are badly predicted would help.
I would put that in the noise category.
You could also try merging your two classes, noise and unknown, into one category to see if that improves anything.
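However you relabel the samples in practice, the merge itself is just a label mapping. A minimal sketch on a plain list of labels; the merged class name "other" and the toy label list are hypothetical, chosen only for illustration:

```python
from collections import Counter

# Hypothetical mapping: fold "noise" and "unknown" into a single "other" class.
MERGE = {"noise": "other", "unknown": "other"}


def merge_labels(labels):
    """Return a new label list with merged classes; other labels pass through unchanged."""
    return [MERGE.get(label, label) for label in labels]


# Toy label list for illustration only (not the project's real data).
labels = ["keyword", "noise", "unknown", "keyword", "unknown"]
merged = merge_labels(labels)
print(Counter(merged))
```

Keeping the mapping in one dictionary makes it easy to undo the experiment and compare the two-class and three-class models on the same test set.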
You can also try adjusting the DSP parameters.
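DSP parameters such as frame length, frame stride and number of coefficients also determine how large the feature matrix is, which matters on a memory-constrained board like the micro:bit. A rough sketch of that arithmetic, assuming frames are taken without padding (the actual MFCC block may pad or round differently, so treat the counts as approximate). The parameter values below are illustrative assumptions, not read from the project:

```python
def mfcc_feature_count(window_ms, frame_length_ms, frame_stride_ms, num_coefficients):
    """Approximate number of MFCC features for one audio window.

    Assumes no padding: a new frame starts every frame_stride_ms and must
    fit entirely inside the window. Integer milliseconds avoid float rounding.
    """
    if frame_length_ms > window_ms:
        raise ValueError("frame length must not exceed the window length")
    num_frames = 1 + (window_ms - frame_length_ms) // frame_stride_ms
    return num_frames * num_coefficients


# Illustrative values (assumed, not taken from the project):
print(mfcc_feature_count(1000, 20, 20, 13))  # 50 frames x 13 coefficients = 650
print(mfcc_feature_count(1000, 25, 10, 13))  # 98 frames x 13 coefficients = 1274
```

Halving the stride roughly doubles the number of frames, so finer time resolution costs memory and inference time; that trade-off is worth keeping in mind alongside the OOM errors mentioned earlier.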
Can you share your project ID so I can have a quick look?
From what I see in the confusion matrix (both validation and testing), some of the unknown words are indeed recognized as your keyword: about 3% in the validation dataset and 6% in the test dataset.
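The percentages above are read from a single row of the confusion matrix: the fraction of true "unknown" samples that were predicted as the keyword. A small sketch of that calculation; the counts are made up for illustration and are not the project's numbers:

```python
# Rows = true label, columns = predicted label (sample counts).
# These counts are invented for illustration only.
LABELS = ["keyword", "noise", "unknown"]
confusion = [
    [190, 3, 7],  # true keyword
    [2, 195, 3],  # true noise
    [6, 4, 190],  # true unknown
]


def misclassified_as(confusion, labels, true_label, predicted_label):
    """Fraction of samples with true_label that were predicted as predicted_label."""
    row = confusion[labels.index(true_label)]
    return row[labels.index(predicted_label)] / sum(row)


rate = misclassified_as(confusion, LABELS, "unknown", "keyword")
print(f"unknown -> keyword: {rate:.1%}")  # 3.0%
```

Because the model runs continuously on the device, even a few percent of non-keyword windows being classified as the keyword can show up as regular false triggers in a noisy classroom.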
I also noticed that some of your words are not correctly labelled (I did not check your whole dataset, just a random selection of your keyword samples, and I found some mistakes).
Another way to make your model more robust is to keep the dataset clean, so your NN does not learn from mistakes.
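One way to find mislabelled samples faster than random spot checks is to review the samples where the model strongly disagrees with the label. A minimal sketch, assuming you have per-sample class probabilities, ideally from a model that did not train on those samples (for example a held-out test set), since a model can memorise wrong training labels. The sample IDs, scores and the 0.9 threshold are hypothetical:

```python
from typing import Dict, List, Tuple


def flag_suspect_labels(
    samples: List[Tuple[str, str, Dict[str, float]]],
    threshold: float = 0.9,
) -> List[Tuple[str, str, str, float]]:
    """Return samples whose top prediction disagrees with the label at >= threshold.

    Each sample is (sample_id, label, {class_name: probability}).
    Flagged samples are only candidates for manual review, not confirmed errors.
    """
    suspects = []
    for sample_id, label, probs in samples:
        predicted = max(probs, key=probs.get)
        confidence = probs[predicted]
        if predicted != label and confidence >= threshold:
            suspects.append((sample_id, label, predicted, confidence))
    # Most confident disagreements first: these are the likeliest label errors.
    suspects.sort(key=lambda s: s[3], reverse=True)
    return suspects


# Toy data for illustration (hypothetical sample IDs and scores).
samples = [
    ("kw_001", "keyword", {"keyword": 0.97, "noise": 0.01, "unknown": 0.02}),
    ("kw_002", "keyword", {"keyword": 0.03, "noise": 0.02, "unknown": 0.95}),
    ("un_001", "unknown", {"keyword": 0.55, "noise": 0.05, "unknown": 0.40}),
]
for suspect in flag_suspect_labels(samples):
    print(suspect)
```

Here only kw_002 is flagged; un_001 is misclassified, but not confidently enough to suggest a labelling error rather than a genuinely hard sample.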