Edge Impulse inference much slower than estimated on Arduino Nano 33 BLE

Question/Issue:
Significantly increased latency on Arduino Nano 33 BLE when running audio classification.

Project ID:
717452

Context/Use case:
I am currently working on a real-time audio classification model on the Arduino Nano 33 BLE. The model uses 16 kHz audio input and a 1-second window. Inference is performed using the Arduino library deployment. On the device, the DSP block runtime (result.timing.dsp) is approximately 3478 ms, and classification takes about 324 ms (result.timing.classification).
However, Edge Impulse Studio estimates DSP latency at around 488 ms and classification latency at 225 ms (using Arduino Nano 33 BLE Sense as the target).
In previous projects, the actual runtime closely matched these estimates, with total inference time under 1 second.

Summary:
Unexpectedly high DSP latency (~3.5 seconds) and increased classification time on the Arduino Nano 33 BLE, compared to the estimated values for a similar board (Arduino Nano 33 BLE Sense). Since models in earlier projects did not show this behavior, the issue may be related to a newer version of the Edge Impulse SDK.

Steps to Reproduce:

  1. In Arduino IDE, include the Arduino library as a .ZIP file (Sketch > Include Library > Add .ZIP library).
  2. Create a sketch from the static buffer example (File > Examples > Project name - Edge Impulse > static_buffer > static_buffer).
  3. In the sketch, paste raw features (any 16000 float values) into the static const float features[] definition.
  4. Select the connected board in Arduino IDE (Tools > Port > Arduino board).
  5. Upload the sketch.
  6. Open the Serial Monitor in Arduino IDE to view the output.

Expected Results:
Total inference time of about 713 ms (including approximately 488 ms for DSP processing and 225 ms for classification).

Actual Results:
Total inference time of about 3802 ms (including approximately 3478 ms for DSP processing and 324 ms for classification).

Reproducibility:

  • Always

Environment:

  • Platform: Arduino Nano 33 BLE (nRF52840)
  • Build Environment Details: Arduino IDE 2.3.6
  • OS Version: Windows 10
  • Edge Impulse Version (Firmware): 1.72.14
  • Project Version: 1.0.6
  • Custom Blocks / Impulse Configuration:
    • MFE block
      • Frame length: 0.02 s
      • Frame stride: 0.01 s
      • Filter number: 40
      • FFT length: 512
      • Low frequency: 125 Hz
      • High frequency: 7500 Hz
      • Noise floor: -80 dB
    • Neural network classifier:
      • Keras model
      • Quantized (int8)

Logs/Attachments:

Edge Impulse standalone inferencing (Arduino)
run_classifier returned: 0
Timing: DSP 3478 ms, inference 324 ms, anomaly 0 ms

Just gotta check this… what optimization setting are you compiling with?

No optimizations are used. Just standard TensorFlow Lite deployment.

Looks like -Os is default but it’s worth checking.

Can you enable verbose compilation output:

  1. Go to File > Preferences.
  2. Check the box next to Show verbose output during: compile.
  3. Recompile and send us the log.

Also, you can try going into platform.txt and changing the optimization level: find the line that contains -Os and change it to -O2.

Thanks. Compilation logs are available here.
I also checked platform.txt at C:\Users\User\AppData\Local\Arduino15\packages\arduino\hardware\mbed_nano\4.3.1, but couldn’t find any line containing “-Os” or “-O2”.

Has this been resolved? I can confirm that when building for the same processor (but different board) I’m seeing about a 6x increase in DSP time going from EI studio version 1.73.0 to 1.74.31. This is with the exact same project, impulse, and impulse settings, no retraining or other changes. The only difference is the deployment with the slower DSP time was built and downloaded almost 3 months after the other.

Found a work-around for this. Basically if you’re using a processor that supports CMSIS DSP (like the nrf52 family), add the following before any Edge Impulse library includes:

#define EIDSP_USE_CMSIS_DSP 1

I believe the reason this is needed now and wasn’t before is due to the addition of the dsp/numpy_types.h include in the classifier/ei_classifier_types.h file in the edge-impulse-sdk. This inclusion causes porting/ei_classifier_porting.h to be included before dsp/config.hpp, resulting in EI_PORTING_STM32_CUBEAI being defined and EIDSP_USE_CMSIS_DSP being set to 0 in dsp/config.hpp. I think ultimately the logic for setting EIDSP_USE_CMSIS_DSP in dsp/config.hpp needs to be fixed, but I’ll leave that up to the EI folks.
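To make the placement concrete, the top of the sketch would look something like this (the header name is a placeholder for your project's generated library header; the define must precede it):

```cpp
// Must appear before ANY Edge Impulse include, so that dsp/config.hpp
// sees the override and keeps the CMSIS-DSP kernels enabled.
#define EIDSP_USE_CMSIS_DSP 1

#include <your_project_inferencing.h>  // placeholder: your generated header
```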

Hi @Mrbigmatt

We are close to merging a fix for the EI_PORTING_STM32_CUBEAI issue. Thank you for reporting the solution!

fv