Now available: Sketch for continuous audio classification on Arduino

By popular demand: we now have a sketch for doing continuous audio classification. This should make it trivial to make your Arduino respond to audible events around you, like keywords, birds tweeting, alarms going off, etc. Because this runs continuously you’ll never miss an event. To grab the new example, go to Deployment, select Arduino library, and import the library in the Arduino IDE. The new sketch will be listed under ‘Examples’ and is named nano_ble33_sense_audio_continuous. As the name implies, this was built for the Nano 33 BLE Sense, but it will also work on other targets that are compatible with the PDM library.

To make your device do something when it hears, for example, a keyword, take a look at the loop function. For instance, you can add this:

bool heard_keyword = false;
for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (strcmp(result.classification[ix].label, "edgeimpulse") == 0 && 
        result.classification[ix].value > 0.7) {

        heard_keyword = true;
    }
}

if (heard_keyword) {
    digitalWrite(13, HIGH);
}
else {
    digitalWrite(13, LOW);
}

:rocket:

@timlester @Andrew923 @Robotastic, I know you’ve been wanting this!


@janjongboom indeed thanks. I had it working somewhat before, but it’ll be nice to see how to do it the right way with the new deployment.


This is great! It is a much cleaner implementation than what I did, and it flagged something interesting. The DSP & Classification processing time is much longer than the slice length. The weird part is that it takes almost as much time to process a slice as it does to process a full sample.

I am working with 2 second samples. I have left it at the default of 3 slices per model window.

When I run the standard audio inferencing example, which is not continuous, I get the following timings:
Predictions (DSP: 1461 ms., Classification: 417 ms., Anomaly: 0 ms.):

When I run the continuous example, I get the following timings:
Predictions (DSP: 1079 ms., Classification: 418 ms., Anomaly: 0 ms.):

I double checked the sizing of different definitions:
Inferencing settings:
Interval: 0.06 ms.
Frame size: 32000
EI_CLASSIFIER_SLICE_SIZE: 10666
EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW: 3
EI_CLASSIFIER_RAW_SAMPLE_COUNT: 32000
Sample length: 2000 ms.
No. of classes: 2

I would have expected the DSP/Classifying times to be about 1/3 since it is processing 1/3 of the samples. Is there some fixed overhead for both? I also timed the loop() to double check the timing numbers in the results structure and they seem to be measured correctly.

Am I just using too complex a DSP / NN to be able to do continuous processing? Or is there something in run_classifier_continuous() that is not adjusting to the slice size?
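To put rough numbers on the mismatch, here is a minimal sketch of the budget check using the figures above. The helper names `slice_length_ms` and `keeps_up` are made up for illustration and are not part of the Edge Impulse SDK.

```cpp
#include <cstdint>

// Hypothetical helper (not an SDK function): duration of one slice in ms.
constexpr uint32_t slice_length_ms(uint32_t window_ms, uint32_t slices_per_window) {
    return window_ms / slices_per_window;
}

// Hypothetical helper: true when DSP + classification finish before the
// next slice of audio has been recorded.
constexpr bool keeps_up(uint32_t dsp_ms, uint32_t classification_ms,
                        uint32_t window_ms, uint32_t slices_per_window) {
    return dsp_ms + classification_ms <= slice_length_ms(window_ms, slices_per_window);
}
```

With a 2000 ms window and 3 slices, each slice is about 666 ms long, while the continuous example spends 1079 + 418 = 1497 ms per slice, so `keeps_up(1079, 418, 2000, 3)` is false and the device falls behind the microphone.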

@Robotastic there’s significant time spent on normalising the feature array, which always needs to be done on the full 2 seconds of data, so slicing does not reduce the processing time by as much as you’d think. A quick tip would be to disable filter bank quantisation (see config.hpp), which will save some time.

At some point I want a fixed-point version of the MFE and MFCC blocks, which will speed up DSP processing on audio, but we’re not there yet.

Note that MFE uses a much simpler normalisation process so that could be a quick fix.

Actually, if you need the 2 seconds of data, it might be easier to downsample the data to 8 kHz. That typically gives good results with only half the processing time (it does require converting and reuploading your audio data though; that would be a cool feature to have in Studio).

Edit: based on the inference times, it seems that CMSIS-NN is not loaded. What target are you running on?

Interesting! I will try disabling quantization and also try 8 kHz. I will try just downsampling first in Audacity to see if everything is still clear and I can pick out the siren. I will also back the accuracy off from 95%+ down to the 80% range, so it will be quicker. Running a slightly worse model multiple times might be better than running a high-quality one once. MFE maxed out around 80% accuracy for me, so that might be a good first thing to try.

I did a little testing, and using 8 kHz could be promising. I was able to train a decent model using it. One snag though is that PDM on the nRF52840 only does 16 kHz. It doesn’t look like the chip supports 8 kHz: https://github.com/arduino/ArduinoCore-nRF528x-mbedos/blob/master/libraries/PDM/src/PDM.cpp#L58. I might look at how to do the conversion in code.

I was able to get continuous mode to work if I did a very simple DSP and set the sample size to 1 sec. I will give it a try in the real world to see how it compares. I also tried to use MFE instead, but it didn’t make a big difference in processing time.

@Robotastic here’s a basic downsample algorithm: https://github.com/edgeimpulse/mobile-client/blob/master/public/assets/recorder.js#L171
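For the 16 kHz to 8 kHz case specifically, a minimal sketch might look like the following. It is not a port of the linked recorder.js code: it simply averages adjacent pairs of samples, and the function name `downsample_by_two` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: halves the sample rate by averaging adjacent pairs.
// `out` must have room for in_count / 2 samples.
// Returns the number of samples written; a trailing unpaired sample is dropped.
size_t downsample_by_two(const int16_t *in, size_t in_count, int16_t *out) {
    size_t out_count = 0;
    for (size_t i = 0; i + 1 < in_count; i += 2) {
        // Sum in 32 bits so the addition cannot overflow int16_t.
        int32_t sum = (int32_t)in[i] + (int32_t)in[i + 1];
        out[out_count++] = (int16_t)(sum / 2);
    }
    return out_count;
}
```

Averaging each pair acts as a very crude low-pass filter before decimation, so some aliasing from content above 4 kHz will remain. Whether that matters depends on the sounds being classified.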

Excellent! I was going to make it more complicated and pass it through a low-pass filter first to remove aliasing… but probably a waste since it was just going into an algo.

Switching to 8 kHz, I was able to use 2 second samples and more complex DSP processing. An MFCC with 20 coefficients, 64 filters and an FFT length of 512 takes ~760 ms on a 666 ms slice. A little over, but not too bad.

I changed the sampleBuffer allocation so it is always sized for 16 kHz, since that is the native rate coming off the mic.

    float sample_length_sec = (float) n_samples / (float) EI_CLASSIFIER_FREQUENCY;
    int microphone_sample_count = int(16000 * sample_length_sec);
    sampleBuffer = (signed short *)malloc((microphone_sample_count >> 1) * sizeof(signed short));
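
As a worked example of that sizing, here is a small sketch that does the same conversion with integer arithmetic instead of floats, which avoids any rounding when the result is truncated to an int. This is a swapped-in variant for illustration, not the code I am running, and the names `MIC_FREQUENCY_HZ` and `microphone_sample_count_for` are made up.

```cpp
#include <cstdint>

// Assumption: the PDM mic always delivers 16 kHz, as described above.
constexpr uint32_t MIC_FREQUENCY_HZ = 16000;

// Hypothetical helper: number of 16 kHz mic samples that cover the same
// time span as n_samples taken at the model's sample rate.
constexpr uint32_t microphone_sample_count_for(uint32_t n_samples,
                                               uint32_t model_frequency_hz) {
    // Widen to 64 bits so the intermediate product cannot overflow.
    return (uint32_t)(((uint64_t)n_samples * MIC_FREQUENCY_HZ) / model_frequency_hz);
}
```

For an 8 kHz model with a 5333-sample slice, this gives 10666 mic samples per slice. For a 16 kHz model the count is unchanged.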

I also changed the PDM callback to downsample as it copies into the inference buffer:

static void pdm_data_ready_inference_callback(void)
{
    int bytesAvailable = PDM.available();

    if (EI_CLASSIFIER_FREQUENCY == 16000) {
        // read into the sample buffer
        int bytesRead = PDM.read((char *)&sampleBuffer[0], bytesAvailable);
    
        if (record_ready == true) {
            for (int i = 0; i < bytesRead >> 1; i++) {
                inference.buffers[inference.buf_select][inference.buf_count++] = sampleBuffer[i];
    
                if (inference.buf_count >= inference.n_samples) {
                    inference.buf_select ^= 1;
                    inference.buf_count = 0;
                    inference.buf_ready = 1;
                    break;
                }
            }
        }
    } else if (EI_CLASSIFIER_FREQUENCY == 8000) {
        // read into the sample buffer
        int bytesRead = PDM.read((char *)&sampleBuffer[0], bytesAvailable);
    
        if (record_ready == true) {
            for (int i = 0; i < bytesRead >> 1; i = i + 2) {
              
                inference.buffers[inference.buf_select][inference.buf_count++] = (sampleBuffer[i]/2) + (sampleBuffer[i+1]/2);
    
                if (inference.buf_count >= inference.n_samples) {
                    inference.buf_select ^= 1;
                    inference.buf_count = 0;
                    inference.buf_ready = 1;
                    break;
                }
            }
        }      
    } else {
      ei_printf(
            "ERROR - Classifier frequency not supported\n");
    }
}

I also hard-coded 16 kHz for the mic:

if (!PDM.begin(1, 16000)) {

I also added a break when the mic samples are copied into the inference buffer so that it doesn’t write outside the buffer.
