Hi, I have been running into issues today with deployment and the run time of the classifier on STM32 hardware. Yesterday I created a CMSIS pack as I have done previously, and the execution time was around 150 ms, which is close to the EI estimate for my STM32 hardware. Today I have been trying different models and the execution time went up to 2.1 s, which was very strange as I hadn’t changed much. As a last check I went back to my project from yesterday, retrained it without changing any parameters, and produced a new CMSIS pack; the execution time is again over 2 seconds. If I install my CMSIS pack from yesterday, it is fine at 150 ms. I can’t make sense of why an identical setup, with the model only retrained, would have a radically different execution time for the classifier. The timing seems to correlate with issues posted yesterday, but I have since been told it is unrelated. Thanks
Hi @glowes1985
I’ll check, thank you for reporting.
Can you share the project ID?
Which project do you use to measure latencies? Is it one of our standalone examples?
Can you tell me the EI-SDK version of the current pack and of the previous one from 2 days ago?
regards,
fv
Hi,
The project id is #671842.
My project is an audio classifier using MFE, deployed as a Cube.MX CMSIS-PACK. I am measuring the computation time on the STM32 device using an oscilloscope. Where would I find the EI-SDK version number within the .pack file once it is integrated in STM32CubeIDE? I checked model_metadata.h for each pack, and the only difference I can see is that the working pack has EI_STUDIO_VERSION_PATCH 0 and the one with the extended computation time has EI_STUDIO_VERSION_PATCH 3, with major and minor set to 1 and 74 for both packs. Is this what you needed?
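For comparing packs without opening the header each time, here is a minimal sketch that formats the Studio version macros into a string that can be logged at startup. It assumes EI_STUDIO_VERSION_MAJOR and EI_STUDIO_VERSION_MINOR exist alongside EI_STUDIO_VERSION_PATCH in model_metadata.h; format_studio_version is a hypothetical helper name, and the fallback values are only placeholders taken from the slower pack so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// In a real project these macros come from the pack's model_metadata.h,
// which should be included before this point. The fallback values are
// placeholders (the 1.74.3 of the slower pack) so the sketch builds alone.
#ifndef EI_STUDIO_VERSION_MAJOR
#define EI_STUDIO_VERSION_MAJOR 1
#define EI_STUDIO_VERSION_MINOR 74
#define EI_STUDIO_VERSION_PATCH 3
#endif

// Hypothetical helper: writes "major.minor.patch" into buf and returns the
// number of characters snprintf wanted to write (excluding the terminator).
int format_studio_version(char *buf, std::size_t len) {
    assert(buf != nullptr && len > 0);
    return std::snprintf(buf, len, "%d.%d.%d",
                         EI_STUDIO_VERSION_MAJOR,
                         EI_STUDIO_VERSION_MINOR,
                         EI_STUDIO_VERSION_PATCH);
}
```

Calling format_studio_version once at boot and sending the buffer over the debug UART would make it obvious which pack is actually flashed.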
Thanks
Hi @glowes1985
Yes, those are the versions, thank you.
So you are using the Cube.MX CMSIS-PACK, not the Open CMSIS-pack. The naming is similar, but the content is slightly different.
I have seen problems with Cube IDE in the past when using DSP functions. Can you tell me which compiler you are using, and its version?
In Cube IDE you can find this info in Settings->STM32Cube->Toolchain manager.
thank you!
regards,
fv
Hi,
Yes, I’m using the Cube.MX CMSIS-PACK deployment. I am using GNU Tools for STM32 (11.3.rel1) with STM32CubeIDE 1.13.2. My code remains the same; just switching between the two packs gives radically different run times.
Thanks
I have tested your project on a Nucleo F4463RE board (Cortex-M4 with DSP, running at 84 MHz). The timings using Cube IDE and compiling with STM GCC 13.3 are pretty similar to yours; I think your MCU is similar.
If I instead use the Open CMSIS-pack (project: edgeimpulse/example-standalone-inferencing-stm32f4-csolution, the standalone example for STM32F4 Nucleo boards using the CMSIS toolbox), compiling with GCC 13.3, here are the timings:
Edge Impulse standalone inferencing (STM32)
Predictions (DSP: 857 ms., Classification: 47 ms., Anomaly: 0ms.):
#Classification results:
Noise: 0.99609
Whistles: 0.0039
That is certainly better, but still far from the estimate and from what you observed.
Can you upload the “fast” version of the model here?
thx!
fv
Hi,
Running the fast pack has the following results in Release mode.
Predictions (DSP: 131 ms., Classification: 15 ms., Anomaly: 0 ms.):
Predictions:
Noise: 0.92969
Whistles: 0.07031
I can send you the faster pack via email if that helps?
Thanks
Just for completeness, here are the results for the newer pack with the same model and the same code:
Predictions (DSP: 2248 ms., Classification: 15 ms., Anomaly: 0 ms.):
Predictions:
Noise: 0.89453
Whistles: 0.10547
Thanks
Hi @glowes1985
The issue is that the CMSIS DSP code is not being used. I still need to understand why it is used with the older version but not with the newer one.
As a temporary fix:
in config.hpp, change
#if (defined(MBED) || __ARM_ARCH_PROFILE == 'M' || defined(__TARGET_CPU_CORTEX_M0) || defined(__TARGET_CPU_CORTEX_M0PLUS) || defined(__TARGET_CPU_CORTEX_M3) || defined(__TARGET_CPU_CORTEX_M4) || defined(__TARGET_CPU_CORTEX_M7) || defined(USE_HAL_DRIVER) || defined(ARDUINO_NRF52_ADAFRUIT)) && !defined(EI_PORTING_STM32_CUBEAI)
to
#if (defined(MBED) || __ARM_ARCH_PROFILE == 'M' || defined(__TARGET_CPU_CORTEX_M0) || defined(__TARGET_CPU_CORTEX_M0PLUS) || defined(__TARGET_CPU_CORTEX_M3) || defined(__TARGET_CPU_CORTEX_M4) || defined(__TARGET_CPU_CORTEX_M7) || defined(USE_HAL_DRIVER) || defined(ARDUINO_NRF52_ADAFRUIT))
Basically, remove the trailing && !defined(EI_PORTING_STM32_CUBEAI).
You can check the value of the EIDSP_USE_CMSIS_DSP define: if it is 1, the CMSIS DSP code is used, which speeds up the DSP calculation.
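One way to make that check hard to miss is sketched below. It is only an illustration: it assumes EIDSP_USE_CMSIS_DSP is visible after including the SDK's config.hpp, and REQUIRE_CMSIS_DSP, cmsis_dsp_enabled, dsp_backend_name, print_dsp_backend and assert_cmsis_dsp_enabled are hypothetical names, not part of the SDK. The fallback define exists only so the sketch compiles outside a real project.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: in a real build, EIDSP_USE_CMSIS_DSP is set by the SDK's
// config.hpp, included before this point. This fallback is only here so
// the sketch compiles standalone.
#ifndef EIDSP_USE_CMSIS_DSP
#define EIDSP_USE_CMSIS_DSP 0
#endif

// REQUIRE_CMSIS_DSP is a hypothetical opt-in flag: define it in the project
// settings to turn the slow path into a build error instead of a surprise.
#if defined(REQUIRE_CMSIS_DSP) && (EIDSP_USE_CMSIS_DSP != 1)
#error "EIDSP_USE_CMSIS_DSP is not 1: CMSIS DSP code is not being used"
#endif

// True when the build selected the CMSIS DSP code path.
constexpr bool cmsis_dsp_enabled() {
    return EIDSP_USE_CMSIS_DSP == 1;
}

// Human-readable label for logging.
const char *dsp_backend_name() {
    return cmsis_dsp_enabled() ? "CMSIS-DSP" : "non-CMSIS fallback";
}

// Prints the selected path once, e.g. at startup next to the timing output.
void print_dsp_backend() {
    std::printf("DSP backend: %s (EIDSP_USE_CMSIS_DSP=%d)\n",
                dsp_backend_name(), EIDSP_USE_CMSIS_DSP);
}

// Debug-build runtime check; compiled out when NDEBUG is defined.
void assert_cmsis_dsp_enabled() {
    assert(cmsis_dsp_enabled() && "EIDSP_USE_CMSIS_DSP is not 1");
}
```

Printing the backend next to the "Predictions (DSP: ... ms.)" line makes it easy to tell from a log alone whether a slow DSP time comes from the pack configuration or from the model.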
thank you
fv