Question/Issue:
Unable to support more than one keyword in audio classification impulse.
I have just started with Edge Impulse and am finding it difficult to use. The tutorial leads to a working impulse for one keyword, and I can run it in the browser and on my target hardware. I have written code on the target to parse the impulse output and take appropriate actions. In my simple POC project I need only two keywords for the trivial function of turning a light on and off.
I have not been able to add the second keyword to the tutorial-generated impulse. I have gone through the loop a few times including deleting everything and redoing the tutorial.
I made notes of the steps for making an impulse during my second run through the tutorial and then repeated them for a new keyword (label). Everything seemed to work, but the features for the two labels overlap and, regardless of which keyword I say, the new keyword is always detected and never the original one. The two keywords are:
“lights on parker”
“lights off parker”
I said it was trivial.
Is it really the case that these two keywords are too similar? If so, this is going to be very limiting. Any suggestions of what to do?
Those two commands are very similar; only the single word on / off differs, so about 80% of the phrase is the same:
“lights on parker”
“lights off parker”
If that overlap is obvious to us, it will be difficult for the model too, especially with a limited dataset. The model is being asked to learn a very small difference between on and off while most of the audio remains the same.
You should also include a background class so the model has an appropriate option when the input is not one of your keywords. For example:
on
off
noise / unknown
The noise / unknown class is important because otherwise the model is forced to choose between your keyword classes even when the audio does not really match either one.
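To make that concrete, here is a minimal Python sketch of how the per-label scores might be turned into an action once a background class exists. The label names (on, off, noise, unknown), the 0.8 threshold, and the pick_action helper are all assumptions for illustration and are not part of the Edge Impulse SDK; the scores dictionary stands in for whatever per-label values your target code already parses.

```python
from typing import Dict, Optional

# Hypothetical label names and threshold; adjust them to match your project.
KEYWORD_LABELS = {"on", "off"}
CONFIDENCE_THRESHOLD = 0.8


def pick_action(scores: Dict[str, float],
                threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return "on" or "off" only when a keyword clearly wins, else None."""
    if not scores:
        return None
    # Label with the highest score in this classification window.
    best_label = max(scores, key=lambda label: scores[label])
    # A background class (noise / unknown) winning means "do nothing".
    if best_label not in KEYWORD_LABELS:
        return None
    # Even a winning keyword is ignored if the model is not confident enough.
    if scores[best_label] < threshold:
        return None
    return best_label


if __name__ == "__main__":
    # Clear keyword: act on it.
    print(pick_action({"on": 0.91, "off": 0.05, "noise": 0.03, "unknown": 0.01}))
    # Background audio: the noise class absorbs it, so nothing happens.
    print(pick_action({"on": 0.10, "off": 0.15, "noise": 0.70, "unknown": 0.05}))
    # Ambiguous between on and off: below threshold, so nothing happens.
    print(pick_action({"on": 0.48, "off": 0.45, "noise": 0.05, "unknown": 0.02}))
```

Without the noise and unknown entries, the second case would have to resolve to on or off, which is the forced-choice problem described above.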
A better approach here may be to use “Parker” as the activation word, and then use a second stage to parse the action words from the following speech. Keyword spotting is generally better suited to detecting a short wake word than treating the entire command as a single keyword.
In your case, a more robust design would be:
wake word: Parker
action parsing stage: detect or transcribe on / off
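As a rough illustration of that two-stage flow, the Python sketch below arms a short listening window when the wake word is detected and only accepts an action word while the window is open. Everything in it is an assumption for illustration: the TwoStageListener class, the 3-second window, and the idea that each stage hands you a single label string. It does not use any Edge Impulse API; timestamps are passed in explicitly so the logic is easy to test.

```python
from typing import Optional

# Hypothetical labels and timing; adjust them to match your project.
WAKE_WORD = "parker"
ACTIONS = {"on", "off"}
LISTEN_WINDOW_S = 3.0


class TwoStageListener:
    """Accept an action word only shortly after the wake word was heard."""

    def __init__(self, window_s: float = LISTEN_WINDOW_S) -> None:
        self.window_s = window_s
        self._armed_until: Optional[float] = None  # None means not listening

    def on_wake_result(self, label: str, now: float) -> None:
        """Stage 1: feed the wake-word model's label; arm on a match."""
        if label == WAKE_WORD:
            self._armed_until = now + self.window_s

    def on_action_result(self, label: str, now: float) -> Optional[str]:
        """Stage 2: feed the action label; return "on"/"off" or None."""
        if self._armed_until is None or now > self._armed_until:
            # Not armed, or the window expired: ignore whatever was heard.
            self._armed_until = None
            return None
        if label in ACTIONS:
            # One action per wake word: disarm after accepting it.
            self._armed_until = None
            return label
        return None


if __name__ == "__main__":
    listener = TwoStageListener()
    print(listener.on_action_result("on", now=0.0))   # ignored, no wake word yet
    listener.on_wake_result("parker", now=1.0)
    print(listener.on_action_result("off", now=2.0))  # accepted inside the window
```

The second stage could be another small classifier trained only on on / off / noise, or a speech-to-text step; the gating logic stays the same either way.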
This guide also has some useful tips on improving model performance: