Compound keywords

Is there a recommended approach to doing compound keyword recognition? For example, if the keyword is “light” and then it could be followed by “on” or “off”, should those be distinct recognition events in a sort of state machine or should “light on” and “light off” be events? Or is it left as an exercise for the reader? :slight_smile:

@jefffhaynes, @dansitu can chip in here, but yeah, a state machine would work. We typically deploy this anyway: even for a single keyword you’re looking for a sequence of results akin to “noise”, “keyword”, “keyword”, “noise”.

See e.g. here for a production pipeline doing this: (not really meant to be reusable, but hopefully you get the idea).
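As a rough illustration of the state-machine idea, here is a minimal Python sketch. It is not taken from the production pipeline mentioned above. The label names, the 0.8 threshold, the timeout of four results, and all function and class names are assumptions made up for this example; on a device the same logic would consume each classifier result in the inference loop.

```python
from typing import Dict, List, Optional, Sequence

LABELS = ("light", "on", "off", "noise")  # hypothetical label set
THRESHOLD = 0.8       # assumed confidence cut-off, tune per model
TIMEOUT_RESULTS = 4   # assumed: results to wait for the second word


def top_label(scores: Dict[str, float], threshold: float = THRESHOLD) -> Optional[str]:
    """Return the highest-scoring label, or None if it is below the threshold."""
    label = max(scores, key=lambda name: scores[name])
    return label if scores[label] >= threshold else None


class CompoundKeywordDetector:
    """Two states: idle, or waiting for the second word after the first."""

    def __init__(self, first: str = "light",
                 second: Sequence[str] = ("on", "off"),
                 timeout: int = TIMEOUT_RESULTS) -> None:
        self.first = first
        self.second = tuple(second)
        self.timeout = timeout
        self.remaining = 0  # results left to wait; 0 means idle

    def feed(self, scores: Dict[str, float]) -> Optional[str]:
        """Consume one classifier result; return e.g. 'light on' when complete."""
        label = top_label(scores)
        if label == self.first:
            self.remaining = self.timeout  # (re)arm on every hit of the first word
            return None
        if self.remaining > 0:
            if label in self.second:
                self.remaining = 0
                return f"{self.first} {label}"
            self.remaining -= 1  # noise or low confidence: keep waiting
        return None


def run(results: List[Dict[str, float]]) -> List[str]:
    """Feed a list of results through a fresh detector and collect events."""
    detector = CompoundKeywordDetector()
    events = []
    for scores in results:
        event = detector.feed(scores)
        if event is not None:
            events.append(event)
    return events


def hit(label: str) -> Dict[str, float]:
    """Build a fake classifier result that is confident about one label."""
    return {name: (0.94 if name == label else 0.02) for name in LABELS}


events = run([hit("noise"), hit("light"), hit("light"), hit("noise"), hit("on")])
print(events)
```

Keeping “light”, “on” and “off” as separate classes and combining them afterwards means the second word only counts while the detector is armed, so a stray “on” by itself produces no event.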

Ok, I’m probably missing something obvious but shouldn’t the print statements in the run_nn_continuous loop reflect a similar thing? If I say “light on” during that loop I get good probability of “light” but virtually nothing for “on”. Is that loop not semantically equivalent to what you referenced? Thanks

Be aware that run_nn_continuous does not print every prediction. I think it prints once a second (even though it runs more often), and it applies a moving average filter (MAF) to the results to smooth things out. With two keywords in quick succession you probably want to inspect every result by skipping the if (++print_results ...) condition, and to disable the MAF. I’d expect to see “on” a little later (when it’s at the beginning of the sample, so once “light” is out of the spectrogram).
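To see why a moving average can hide a short keyword, here is a small Python sketch. It is a generic moving average written for illustration, not the SDK's actual filter; the filter length of 4 and the raw scores are made-up assumptions.

```python
from collections import deque
from typing import List


def moving_average(values: List[float], size: int) -> List[float]:
    """Average each value with up to size - 1 preceding values."""
    window = deque(maxlen=size)
    smoothed = []
    for value in values:
        window.append(value)
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical raw scores for "on": the word shows up in a single result.
raw_on = [0.0, 0.0, 0.9, 0.0, 0.0, 0.0]
smoothed_on = moving_average(raw_on, 4)

print(max(raw_on), max(smoothed_on))
```

In this made-up series the raw score reaches 0.9, while the smoothed score peaks at about 0.3, which is why inspecting every unfiltered result is useful when two words follow each other closely.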

Yes, I should have said, I disabled the condition. In that case you would expect them to be similar?

I just tried continuous classification on the site as well and I’m not getting reliable hits on the adjacent words so I suspect something is not right with my model.

@jefffhaynes You have separate keywords in Edge Impulse now for both “light” and “on”? If you use Live classification / Model testing with a one second clip with just “on”, does it classify correctly?

Sorry, completely swamped this week. Those aren’t the actual words but yes, it works independently with “on”. I can certainly point you to the project if you’re able to take a look. I just don’t want to waste your time before I’ve done due diligence but I haven’t had time yet. Thanks.

I’ve added more training data and it’s working better now. I think I just need to beef up the model. Thanks!

I’m now getting very good performance from the live classification but mediocre performance from my nRF52840. My belief is that the live classifier is essentially using a “slices per window” value of two and no running average. Is that assumption correct? Is there anything else I can look at?

EDIT: Pretty sure I’m missing something. I’m just not getting nearly the performance out of the board as with the online classifier, even for single words. The board is having no trouble keeping up, even at 8 slices per window, but nothing seems to make a difference. Occasionally the probability is great, but more typically it is 0.5 or less. Again, I am running with no averaging and printing out every slice, not just once per window.

EDIT2: Sorry for all the questions, but if I have four slices per window, does each slice in the loop contain the analysis for one second of data? Or does it contain the analysis for 0.25 seconds of data? I was under the impression that it contained one second, but it very much seems like 0.25 based on my testing. Whereas the live classification is obviously one whole second.

Hi @jefffhaynes, the “slices per window” value is only used on device. It refers to how we calculate parts of the spectrogram: with a value of 4, we calculate 250 ms slices of the spectrogram and stitch them back together, so we avoid having to do the full calculation each time. On live classification we don’t use this; we look at the sample and classify it. If your sample is longer than the window size (as defined in Create impulse) you can see multiple classifications, but that is unrelated to the slices per window setting.

“does each slice in the loop contain the analysis for one second of data?”

Yes, this is the full one second of data.
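To make the relation between slices and windows concrete, here is a Python sketch that lists which samples each classification covers. It assumes 16 kHz audio and a one-second window, and the function name is hypothetical; it only models the window arithmetic, not the SDK's spectrogram code.

```python
from typing import List, Tuple

WINDOW_SAMPLES = 16000  # assumed: one second of audio at 16 kHz


def window_ranges(total_samples: int, slices_per_window: int,
                  window: int = WINDOW_SAMPLES) -> List[Tuple[int, int]]:
    """List the (start, end) sample range seen by each classification.

    A new slice arrives every window / slices_per_window samples, and each
    classification covers the most recent full window.
    """
    slice_len = window // slices_per_window
    ranges = []
    end = window
    while end <= total_samples:
        ranges.append((end - window, end))
        end += slice_len
    return ranges


two = window_ranges(48000, 2)   # a new result every 500 ms
four = window_ranges(48000, 4)  # a new result every 250 ms

print(two)
print(len(four), four[:3])
```

More slices per window means more frequent results, but every result still spans a full second of audio, so a short word appears in several consecutive, overlapping windows.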

“I’m just not getting nearly the performance out of the board as with the online classifier, even for single words.”

What’s your project ID? I’ll take a look on my nRF52840 DK.

Thanks, I greatly appreciate it. 21792

You can use test.21rj9k3b to see the performance of the compound classification

I think what is really throwing me off is that the probabilities are very clearly lower with a greater number of slices per window. But if I’m understanding you correctly, the number of slices should not directly affect the probability in the sense that the probability is not being “divided” across the slices. However, the impression that I get is that for more slices the probability is watered down. Again, this is all with the moving average commented out.

Hi @jefffhaynes, sorry for the late reply but you’ve really pulled me into a rabbit hole the last few days :smiley: The underlying issue is that there is a discrepancy in the spectrogram calculation in continuous mode versus normal classification mode, and we have a proper reproduction path now.

E.g. this is the first 3 seconds of your test.21rj9k3b file classified in both modes (each row is a one-second window, given as a sample range):

one_sec_standalone     0 - 16000: [down: 0.00018, loupelight: 0.00000, noise: 0.08634, unknown: 0.91347, up: 0.00001]
one_sec_standalone  8000 - 24000: [down: 0.00000, loupelight: 0.99862, noise: 0.00003, unknown: 0.00135, up: 0.00000]
one_sec_standalone 16000 - 32000: [down: 0.00000, loupelight: 0.97226, noise: 0.01524, unknown: 0.01250, up: 0.00000]
one_sec_standalone 24000 - 40000: [down: 0.00551, loupelight: 0.00001, noise: 0.00003, unknown: 0.12931, up: 0.86514]
one_sec_standalone 32000 - 48000: [down: 0.00000, loupelight: 0.00001, noise: 0.00274, unknown: 0.99600, up: 0.00125]
one_sec_continuous     0 - 16000: [down: 0.00084, loupelight: 0.00000, noise: 0.23759, unknown: 0.76153, up: 0.00004]
one_sec_continuous  8000 - 24000: [down: 0.00000, loupelight: 0.99979, noise: 0.00002, unknown: 0.00019, up: 0.00000]
one_sec_continuous 16000 - 32000: [down: 0.00000, loupelight: 0.99143, noise: 0.00480, unknown: 0.00378, up: 0.00000]
one_sec_continuous 24000 - 40000: [down: 0.03030, loupelight: 0.00002, noise: 0.00017, unknown: 0.35133, up: 0.61818]
one_sec_continuous 32000 - 48000: [down: 0.00131, loupelight: 0.00033, noise: 0.05965, unknown: 0.78938, up: 0.14934]

The actual effect is a bit random, but in this case “up” is not picked up in continuous mode, even though it should have been.

I’ve filed a bug with the SDK team, and hopefully will get this resolved quickly.
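One way to read the rows above is to apply the same decision rule to both modes and list the windows where they disagree. The Python sketch below does that with the scores copied from the output. The 0.8 threshold and the choice to treat “noise” and “unknown” as background classes are assumptions for illustration; the thread does not state which threshold is used.

```python
from typing import Dict, List, Optional, Tuple

LABELS = ("down", "loupelight", "noise", "unknown", "up")
KEYWORDS = {"down", "loupelight", "up"}  # assumed: the rest is background
THRESHOLD = 0.8                          # assumed detection threshold

Row = Tuple[float, ...]

# Scores per window start (in samples), copied from the output above.
STANDALONE: Dict[int, Row] = {
    0: (0.00018, 0.00000, 0.08634, 0.91347, 0.00001),
    8000: (0.00000, 0.99862, 0.00003, 0.00135, 0.00000),
    16000: (0.00000, 0.97226, 0.01524, 0.01250, 0.00000),
    24000: (0.00551, 0.00001, 0.00003, 0.12931, 0.86514),
    32000: (0.00000, 0.00001, 0.00274, 0.99600, 0.00125),
}
CONTINUOUS: Dict[int, Row] = {
    0: (0.00084, 0.00000, 0.23759, 0.76153, 0.00004),
    8000: (0.00000, 0.99979, 0.00002, 0.00019, 0.00000),
    16000: (0.00000, 0.99143, 0.00480, 0.00378, 0.00000),
    24000: (0.03030, 0.00002, 0.00017, 0.35133, 0.61818),
    32000: (0.00131, 0.00033, 0.05965, 0.78938, 0.14934),
}


def detect(row: Row, threshold: float = THRESHOLD) -> Optional[str]:
    """Return the top label if it is a keyword scoring at or above the threshold."""
    best = max(range(len(LABELS)), key=lambda i: row[i])
    label = LABELS[best]
    if label in KEYWORDS and row[best] >= threshold:
        return label
    return None


def disagreements(a: Dict[int, Row], b: Dict[int, Row]) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """List (window start, detection in a, detection in b) where the two differ."""
    return [(start, detect(a[start]), detect(b[start]))
            for start in sorted(a)
            if detect(a[start]) != detect(b[start])]


diffs = disagreements(STANDALONE, CONTINUOUS)
print(diffs)
```

Under these assumptions the only disagreement is the window starting at sample 24000, where standalone mode reports “up” at 0.86514 and continuous mode scores it at 0.61818, below the assumed threshold.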

No worries, that’s great news! Well, you know what I mean…

Let me know if I can help at all. I was getting ready to dive into the sdk deeper myself :slight_smile:

If you can, please let me know if there’s something I can try out. Thanks

Not yet, we’re planning to have a fix sometime this week.


Any luck on this? We’re going down this path with our design but if the continuous recognition can’t work or if the processing required is prohibitive it would be good to know sooner rather than later. Sorry for bugging you. Thanks!

@jefffhaynes we’re still working on this (@Arjan); we expect a fix in the next few days.

No worries, I’m sure you’re busy. Apparently you’re famous now! :smile:

Hi, we have a patch ready now. It will go through review this week and will be included in the next SDK release.