Question about regular vs continuous classification


We are trying to implement continuous audio classification and are running into a bit of confusion about the timing relationships between the two processes (sampling and inferencing) and their necessary buffers.

Our case is audio sampling in one second windows, so it aligns nicely with the provided documentation.

Let’s say that with run_classifier(), our model can perform DSP + inference on one second of audio samples in just under one second, anywhere from 0.7 to 0.9 seconds for the complete DSP and inference process. Also, let’s say we can only store one second of audio (plus one secondary buffer holding a copy of that second) at any given time.

Considering this information from the continuous audio inferencing documentation:

Does this imply that the time it takes for our model to perform DSP and inference on a full second of audio will now be slowed down by a factor of four, since the audio needs to pass through the FIFO pipeline and the impulse has to run four times every second?

Or is the inference process itself also broken up into four periods or segments, retaining the timing performance from run_classifier()?

If the former is true, then (using 0.7 seconds to run inference on a one second window of audio with regular classification as an example), should we expect continuous inferencing to now take 2.8 seconds (0.7 seconds * 4 slices) for every one second of audio passed through the FIFO pipeline? Does this imply that if we cannot buffer more than three seconds of audio at a time before overwriting it with new samples every second, then continuous audio classification is simply not going to work?

Thank you

Hi @Markrubianes,

run_classifier() is a blocking call that performs DSP and inference. So, if run_classifier() takes 0.7 s, the best you can do is fill up a buffer (e.g. using interrupts and DMA) during that 0.7 s plus another 0.3 s before calling run_classifier() again. However, by doing this, you run into an issue if a keyword is spoken across two consecutive recordings, since each 1 second window is separate. It’s the same as using a 1 second window with a 1 second stride: no two windows overlap, so it’s easy to miss keywords using this method.
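To make the boundary problem concrete, here is a toy Python sketch (not SDK code; the timings and the keyword position are made-up illustration values). A keyword spoken across the 1 s boundary is never fully contained in either non-overlapping window:

```python
# Hypothetical illustration: back-to-back 1 s windows with a 1 s stride.
# A keyword occupying 0.7 s .. 1.3 s straddles the boundary at 1.0 s,
# so neither window contains the whole keyword.

def window_contains(window_start_s, window_len_s, kw_start_s, kw_end_s):
    """True if the keyword lies entirely inside the window."""
    return (window_start_s <= kw_start_s
            and kw_end_s <= window_start_s + window_len_s)

kw = (0.7, 1.3)                      # keyword spoken across the 1.0 s boundary
windows = [(0.0, 1.0), (1.0, 1.0)]   # two consecutive, non-overlapping windows
hits = [window_contains(start, length, *kw) for (start, length) in windows]
# hits -> [False, False]: the keyword is likely missed by both windows
```

This is exactly the failure mode that overlapping (continuous) classification avoids.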

To remedy this, you can use run_classifier_continuous(). To use it, you should set EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW. In most applications, this will be set to 3 or 4. If it’s set to 4, then you should call run_classifier_continuous() every 0.25 seconds (assuming you’re working with a 1 second window).

Each call takes your 0.25 seconds of raw audio data and computes the MFCCs (or MFEs, if you’re working with non-voice data). This slice of MFCCs is added to the front of a buffer, and the oldest 0.25 s slice of MFCCs is dropped from the buffer (hence the FIFO in the description). What you have is a rolling window holding 1 second’s worth of MFCCs. Each time run_classifier_continuous() is called, it updates this rolling window and also performs inference on the full 1-second window of MFCCs.
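The rolling FIFO of feature slices can be sketched with a toy Python model (the class name, slice contents, and shapes here are illustrative stand-ins, not the SDK's internal data structures):

```python
from collections import deque

# Toy model of the rolling feature window behind run_classifier_continuous():
# each new 0.25 s slice of MFCCs is appended, and the oldest slice is
# dropped automatically once the window holds SLICES_PER_WINDOW slices.
SLICES_PER_WINDOW = 4  # EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW analogue

class RollingFeatureWindow:
    def __init__(self, slices_per_window):
        # deque with maxlen gives FIFO behavior: append drops the oldest
        self.slices = deque(maxlen=slices_per_window)

    def push_slice(self, feature_slice):
        """Add the newest slice of features; the oldest falls out."""
        self.slices.append(feature_slice)

    def full_window(self):
        """Concatenated features for the full 1 s window (inference input)."""
        return [f for s in self.slices for f in s]

win = RollingFeatureWindow(SLICES_PER_WINDOW)
for t in range(6):        # six 0.25 s slices arrive over 1.5 s
    win.push_slice([t])   # stand-in for that slice's MFCC coefficients
# The window now holds only the most recent 4 slices (t = 2..5),
# i.e. the most recent 1 second's worth of features.
```

Inference always runs on full_window(), so every call sees a full second of context even though only 0.25 s of new audio was recorded.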

With this method, you can get inference results 4 times per second without needing to record a full 1 second’s worth of raw audio data each time (you just need to record 0.25 s audio slices, compute the MFCCs, and update the MFCC window).

So long as you can fill a 0.25 s audio buffer before calling the next run_classifier_continuous() (e.g. with interrupts and DMA) and run_classifier_continuous() takes less than 0.25 seconds, you can successfully run continuous keyword spotting on your system. If it takes longer than 0.25 seconds, you might need to change EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW to 2 or 3 to give the DSP and inference processing more time.

I hope that helps!


@shawn_edgeimpulse, yes this certainly helps improve my understanding of the general algorithm. Thank you, and thanks for all your entertaining tutorial videos scattered throughout the internet. I’ve learned a lot from them over the past few years.


Hi @Markrubianes,

Glad to hear it helps, and I’m happy that my videos have been useful!
