Need help with porting (STM32 Nucleo)

I’m working on making an audio classification (keyword spotting) system on an STM32 Nucleo-L476RG board. I’ve trained a classifier on the pre-made yes/no keyword dataset and downloaded the C++ library.

What I’d like to do is use the library in STM32CubeIDE without Arduino or mbed (I plan to use I2S and DMA with a double buffer to read in audio). I’ve read through the porting guide, and from what I gathered, I need to create a nucleo-l476rg directory in edge-impulse-sdk/porting with debug_log.cpp and ei_classifier_porting.cpp files. These files should contain the functions present in the other board .cpp files. The functions should define things like ei_printf() (using the Nucleo’s UART port) and ei_read_timer_ms() (i.e. reading from a timer that ticks once per millisecond).

Does this sound like I’m on the right path for porting?

If so, here’s my next question: once I’ve defined the “porting” functions for my particular board, how do I select those particular debug_log.cpp and ei_classifier_porting.cpp files to be included in the build process (and not compile the other board files)? I know it’s probably something simple, and I’m just missing it.
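
For readers following along, here is a minimal sketch of the two porting functions named in the question. The signatures are assumed from the description above and should be checked against ei_classifier_porting.h in your own export. board_uart_write() and board_get_tick_ms() are hypothetical hooks, not SDK or HAL functions: on the Nucleo they would wrap HAL_UART_Transmit() and HAL_GetTick(), while here they are host-side stand-ins so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical board hooks (assumptions for illustration only).
static std::string g_uart_log;   // captures the bytes that would go out over UART
static uint32_t g_tick_ms = 0;   // on target: advanced by a 1 ms timer interrupt

static void board_uart_write(const char *data, size_t len) {
    g_uart_log.append(data, len);
}

static uint32_t board_get_tick_ms() {
    return g_tick_ms;
}

// printf-style logging: format into a fixed buffer, then hand it to the UART hook.
void ei_printf(const char *format, ...) {
    assert(format != nullptr);
    char buffer[256];
    va_list args;
    va_start(args, format);
    int written = std::vsnprintf(buffer, sizeof(buffer), format, args);
    va_end(args);
    if (written <= 0) {
        return;  // nothing to send, or a formatting error
    }
    size_t len = static_cast<size_t>(written);
    if (len >= sizeof(buffer)) {
        len = sizeof(buffer) - 1;  // output was truncated to the buffer size
    }
    board_uart_write(buffer, len);
}

// Milliseconds since boot, read from the board's 1 ms tick.
uint64_t ei_read_timer_ms() {
    return board_get_tick_ms();
}
```

The fixed 256-byte buffer keeps the function free of heap allocation; longer messages are truncated rather than dropped.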

@ShawnHymel there is already an stm32-cubeai folder that uses the STM32 HAL libraries (despite the name, it only contains some utility functions for timing and printing on STM32), so that should be fine. Just set up the UART as described here: https://docs.edgeimpulse.com/docs/using-cubeai#configuring-printf. You can either exclude all the other folders in the porting layer, or just delete them (not sure how CubeIDE handles that).

That should be it. C++ files should automatically be picked up by the compiler.

@janjongboom Awesome, thank you! I removed all but the stm32-cubeai folder and that seems to help. However, I’m now running into an issue where the compiler does not like some of the assembly calls in the CMSIS folder (inside the downloaded library).

Here is one such error: error: impossible constraint in 'asm'

I thought that TFLite could be used without the CMSIS-NN library. I created the project with CubeIDE (so, CubeMX), which imports some CMSIS functions. Is there something I need to do to enable the CMSIS-NN framework, can I delete the CMSIS folder in the EI downloaded library, or did I miss something entirely with getting this to compile?

@ShawnHymel very interesting - I have never seen that error. I will test it later this week. You could disable or remove the NN folder, and then set this macro to 0:

That should build without CMSIS-NN.

@ShawnHymel my guess is the GCC7 version that ST ships with their IDE is an issue, perhaps GCC9 works better? But naturally there is no way to change that nor to just generate a @&* Makefile :slight_smile:

Anyway I’ve managed to compile by:

  • Create new C++ library in STM32CubeIDE (tested on the DISCO-L475VG)
  • Enable CRC and printf on the target (see here).
  • Create new SOURCE FOLDER called ‘gestures’ (It’s different from a normal folder)
  • Add the three folders from the Edge Impulse C++ export to the ‘gestures’ folder.
  • Delete all non-stm32 folders in edge-impulse-sdk/porting
  • Delete edge-impulse-sdk/utensor
  • Delete ei_run_classifier_c.h and ei_run_classifier_c.cpp
  • Add include paths (GNU C++ and GNU C):
    • ${workspace_loc:/${ProjName}/gestures}/
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/CMSIS/DSP/Include/
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/CMSIS/DSP/PrivateInclude/
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/CMSIS/NN/Include/
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/tensorflow
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/flatbuffers
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/flatbuffers/include
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/flatbuffers/include/flatbuffers
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/gemmlowp/
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/gemmlowp/fixedpoint
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/gemmlowp/internal
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/ruy
    • ${workspace_loc:/${ProjName}/gestures}/model-parameters
    • ${workspace_loc:/${ProjName}/gestures}/tflite-model
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/anomaly
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/classifier
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/dsp
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/dsp/kissfft
    • ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/porting
  • Set -DEI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN=0 as a flag.
  • Delete the CMSIS/Core and CMSIS/NN folders.

Tah dah :partying_face:

I’ve filed a bug with CMSIS5 here: https://github.com/ARM-software/CMSIS_5/issues/1008 and also emailed folks at Arm, hopefully they have an idea on what is going on here.

@janjongboom It works! Or, at least it’s now compiling :grinning: Thank you for helping out with this. I’m assuming that it’s going to be a bit slower without the CMSIS-NN calls, but it should work well enough for the demo (I hope). It looks like you are correct in that it’s a bug with the ARM gcc compiler. I found reference to it here: https://github.com/ARM-software/CMSIS_5/issues/996

I’m not familiar with ARM assembly, so the solutions/workarounds presented were a bit over my head :sweat_smile:

@ShawnHymel reading through the bug report, this issue does not occur on Linux. Interesting.

Let me see if we can provide a patch earlier than CMSIS can.

@ShawnHymel, to fix this, change in arm_nn_mat_mult_nt_t_s8.c the implementation of __patched_SXTB16_RORn to:

__STATIC_FORCEINLINE uint32_t __patched_SXTB16_RORn(uint32_t op1, uint32_t rotate) {
  uint32_t result;
  if (__builtin_constant_p (rotate) && ((rotate == 8U) || (rotate == 16U) || (rotate == 24U))) {
    asm volatile ("sxtb16 %0, %1, ROR %2" : "=r" (result) : "r" (op1), "i" (rotate) );
  } else {
    result = __SXTB16(__ROR(op1, rotate)) ;
  }
  return result;
}

Verified this on the ST IoT Discovery Kit!
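
As background on the patch: SXTB16 takes byte 0 and byte 2 of its (optionally rotated) operand and sign-extends each into a 16-bit lane, and the instruction only encodes rotations of 8, 16 or 24 bits. The patch uses the inline-asm form only when the rotation is a compile-time constant with one of those values, and otherwise falls back to __SXTB16(__ROR(...)). The sketch below is a portable reference model of that fallback, useful for checking expected values on a host machine. The names ror32, sign_extend_byte_to_lane and sxtb16_ror_ref are made up for this illustration and are not CMSIS functions.

```cpp
#include <cstdint>

// Rotate a 32-bit value right by n bits (n is taken modulo 32).
constexpr uint32_t ror32(uint32_t value, uint32_t n) {
    return ((n & 31U) == 0U)
        ? value
        : ((value >> (n & 31U)) | (value << (32U - (n & 31U))));
}

// Sign-extend an 8-bit value (held in the low byte) into a 16-bit lane.
constexpr uint32_t sign_extend_byte_to_lane(uint32_t byte) {
    return (byte & 0x80U) ? (byte | 0xFF00U) : byte;
}

// Model of SXTB16: bytes 0 and 2 become the low and high 16-bit lanes.
constexpr uint32_t sxtb16_ref(uint32_t value) {
    return (sign_extend_byte_to_lane((value >> 16) & 0xFFU) << 16)
         | sign_extend_byte_to_lane(value & 0xFFU);
}

// Model of the patched helper's fallback path: rotate first, then extend.
constexpr uint32_t sxtb16_ror_ref(uint32_t op1, uint32_t rotate) {
    return sxtb16_ref(ror32(op1, rotate));
}
```

For example, with op1 = 0x80FF7F01 and a rotation of 8, the rotated value is 0x0180FF7F, so the lanes are 0x7F (positive) and 0x80 (negative), giving 0xFF80007F.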

@ShawnHymel we’ve backported the fix to our SDK, and no regressions on our target platforms. Will be available in the next release (later today) in all new exports.

@janjongboom Thank you! I put the CMSIS/Core and CMSIS/NN folders back in, patched the code with your fix, and removed the -DEI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN=0 flag. It seems to compile and work.

It seems to be quite slow on my Nucleo board. I’m using the yes/no dataset from your tutorials, trained with the MFCC -> NN blocks (keeping all defaults). In my code, I copied in a raw 16-bit sound buffer from one of the known-good samples and fed it to the classifier. It looks like DSP is taking ~350 ms and classification ~280 ms. Do those seem reasonable on an 80 MHz ARM (I’m using a Nucleo-L476RG)? This is in the release configuration (using the -DDEBUG flag doubles the classification time).

@ShawnHymel, set the macro EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN=1. It’s only enabled by default when we can detect the target (which we can’t in STM32CubeIDE, as it doesn’t set any macros for the MCU family). Classification should go down to ~30 ms.

For DSP, set EIDSP_QUANTIZE_FILTERBANK=0. This takes 10K more RAM but should save you 100 ms.

Note that when switching to continuous audio mode, the DSP slices are smaller, so the DSP time will go down further; you can easily do 4-5 inferences a second that way.
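
For the continuous mode mentioned here, combined with the I2S + DMA double buffer from the original question, a ping-pong buffer could be sketched roughly as below. This is an illustration under assumptions, not SDK code: kSliceSamples, g_dma_buffer and the three functions are hypothetical names. On an STM32 with a circular DMA transfer, on_dma_half_complete() and on_dma_full_complete() would be called from HAL_I2S_RxHalfCpltCallback() and HAL_I2S_RxCpltCallback() respectively.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical slice length; tiny here so the sketch is easy to follow.
constexpr size_t kSliceSamples = 4;

// The DMA controller would fill this buffer in circular mode:
// first half, then second half, then the first half again.
static int16_t g_dma_buffer[2 * kSliceSamples];

// Index of the half that is ready for processing, or -1 if none is pending.
static volatile int g_ready_half = -1;

// Called when the DMA has filled the first half of the buffer.
void on_dma_half_complete() { g_ready_half = 0; }

// Called when the DMA has filled the second half of the buffer.
void on_dma_full_complete() { g_ready_half = 1; }

// Returns the completed slice, or nullptr if nothing is pending.
// On target, the read-and-clear below should be guarded against the DMA
// interrupt (for example by briefly masking it); that is omitted here.
const int16_t *take_ready_slice() {
    int half = g_ready_half;
    if (half < 0) {
        return nullptr;
    }
    assert(half == 0 || half == 1);
    g_ready_half = -1;
    return &g_dma_buffer[static_cast<size_t>(half) * kSliceSamples];
}
```

The main loop would poll take_ready_slice() and pass each slice to the classifier while the DMA keeps writing into the other half, so each slice must be processed in less time than it takes to record one.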

@janjongboom Like magic! Thank you :grinning:

Hi, I tried deploying an audio recognition example on an STM32F401RE using https://github.com/edgeimpulse/example-standalone-inferencing-mbed with Mbed OS, and then tried the same example with STM32CubeIDE.
I noticed that the DSP times were very different:
Mbed OS: DSP_TIME = 150 ms
STM32CubeIDE: DSP_TIME = 420 ms

I also set the macros that you suggested in STM32CubeIDE.
What could cause this increase in time?

@tiriotis

  1. Have you set the EIDSP_USE_CMSIS_DSP=1 macro?
  2. Could it be that you’re not running on full clock speed?
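
One way to sanity-check the second point is to work out the core clock that the PLL settings should produce and compare it with what the firmware reports (for example the CMSIS SystemCoreClock variable or HAL_RCC_GetSysClockFreq()). On the STM32F4 main PLL, VCO = (source / M) * N and SYSCLK = VCO / P. The helper name pll_sysclk_hz and the example values (16 MHz HSI, M=16, N=336, P=4) are assumptions for illustration; use the values from your own clock configuration.

```cpp
#include <cstdint>

// Expected SYSCLK for an STM32F4-style main PLL:
//   VCO    = (source / M) * N
//   SYSCLK = VCO / P
constexpr uint32_t pll_sysclk_hz(uint32_t source_hz, uint32_t m,
                                 uint32_t n, uint32_t p) {
    return (source_hz / m) * n / p;
}

// Example: 16 MHz HSI with M=16, N=336, P=4 gives 84 MHz,
// the maximum core clock of the STM32F401RE.
constexpr uint32_t kExpectedSysclkHz = pll_sysclk_hz(16000000U, 16U, 336U, 4U);
```

If the value reported on the board is well below the expected one (for example 16 MHz, the HSI frequency the chip runs at when the PLL is not selected), the DSP time would scale up accordingly.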