Need help with porting (STM32 Nucleo)

ShawnHymel · September 30, 2020, 1:11am

I’m working on making an audio classification (keyword spotting) system on an STM32 Nucleo-L476RG board. I’ve trained a classifier on the pre-made yes/no keyword dataset and downloaded the C++ library.

What I’d like to do is use the library in STM32CubeIDE without Arduino or Mbed (I plan to use I2S and DMA with a double buffer to read in audio). I’ve read through the porting guide, and from what I gather, I need to create a nucleo-l476rg directory in edge-impulse-sdk/porting containing debug_log.cpp and ei_classifier_porting.cpp files. These files should implement the same functions as the other boards’ .cpp files, such as ei_printf() (using the Nucleo’s UART port) and ei_read_timer_ms() (i.e. reading from a timer that ticks once per millisecond).
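As a rough illustration of what such a porting file provides, here is a minimal sketch of ei_printf() and ei_read_timer_ms() on top of a UART write function and a 1 ms tick counter. The board functions (board_uart_write(), board_tick_isr()) are hypothetical and stubbed so the sketch runs on a PC; on the Nucleo they would wrap HAL_UART_Transmit() and a SysTick or timer interrupt. The exact prototypes the SDK expects should be checked against edge-impulse-sdk/porting/ei_classifier_porting.h.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical board layer, stubbed so the sketch runs on a PC. On the Nucleo,
 * board_uart_write() would wrap HAL_UART_Transmit() and board_tick_isr() would
 * run from a 1 ms timer interrupt such as SysTick. */
static char uart_capture[256];   /* host stub: bytes "sent" over the UART */
static size_t uart_capture_len = 0;

static void board_uart_write(const char *data, size_t len) {
    size_t room = sizeof(uart_capture) - 1 - uart_capture_len;
    if (len > room) {
        len = room;                  /* drop what does not fit in the stub */
    }
    memcpy(uart_capture + uart_capture_len, data, len);
    uart_capture_len += len;
    uart_capture[uart_capture_len] = '\0';
}

static volatile uint32_t ms_ticks = 0; /* incremented once per millisecond */

static void board_tick_isr(void) { ms_ticks++; }

/* Porting functions modelled on ei_printf() and ei_read_timer_ms(); check the
 * exact prototypes in edge-impulse-sdk/porting/ei_classifier_porting.h. */
void ei_printf(const char *format, ...) {
    char buf[128];
    va_list args;
    assert(format != NULL);
    va_start(args, format);
    int n = vsnprintf(buf, sizeof(buf), format, args);
    va_end(args);
    if (n < 0) {
        return;                      /* formatting error: print nothing */
    }
    if ((size_t)n >= sizeof(buf)) {
        n = (int)sizeof(buf) - 1;    /* output was truncated to fit buf */
    }
    board_uart_write(buf, (size_t)n);
}

uint64_t ei_read_timer_ms(void) {
    return ms_ticks;                 /* 32-bit counter wraps after ~49.7 days */
}
```

Formatting into a local buffer first keeps the UART side simple: one blocking write per call, with long messages truncated rather than overflowing.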

Does this sound like I’m on the right path for porting?

If so, here’s my next question: once I’ve defined the “porting” functions for my particular board, how do I select those particular debug_log.cpp and ei_classifier_porting.cpp files to be included in the build process (and not compile the other board files)? I know it’s probably something simple, and I’m just missing it.

janjongboom · September 30, 2020, 6:14am

@ShawnHymel there is already an stm32-cubeai folder that uses the STM32 HAL libraries (despite the name, it only contains some timing and printing utility functions for STM32), so that should be fine. Just set up the UART as described here: https://docs.edgeimpulse.com/docs/using-cubeai#configuring-printf. You can either exclude all the other folders in the porting layer, or just delete them (not sure how CubeIDE handles excluding folders).

That should be it. C++ files should automatically be picked up by the compiler.
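Because the IDE compiles every source file it finds, leaving several boards' porting files in place would produce duplicate definitions of functions like ei_printf() at link time, which is why the other folders need to be excluded or deleted. A common alternative is to wrap each target's file in a preprocessor guard so that only one of them compiles. A minimal sketch, where PORTING_TARGET_STM32 and porting_target_name() are hypothetical names, not part of the SDK:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical selector; in a real build it would come from a compiler flag
 * such as -DPORTING_TARGET_STM32=1. Defaults to the STM32 port here. */
#ifndef PORTING_TARGET_STM32
#define PORTING_TARGET_STM32 1
#endif

/* Stands in for the contents of the STM32 porting file. */
#if PORTING_TARGET_STM32
const char *porting_target_name(void) { return "stm32"; }
#endif

/* Stands in for another board's porting file: compiled out unless selected,
 * so the two definitions never reach the linker together. */
#if !PORTING_TARGET_STM32
const char *porting_target_name(void) { return "other"; }
#endif
```

Keeping the guard inside each file means switching targets becomes a single compiler flag rather than a folder change.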

ShawnHymel · September 30, 2020, 6:36pm

@janjongboom Awesome, thank you! I removed all but the stm32-cubeai folder and that seems to help. However, I’m now running into an issue where the compiler does not like some of the assembly calls in the CMSIS folder (inside the downloaded library).

Here is one such error: error: impossible constraint in 'asm'

I thought that TFLite could be used without the CMSIS-NN library. I created the project with CubeIDE (so, CubeMX), which imports some CMSIS functions. Is there something I need to do to enable the CMSIS-NN framework, can I delete the CMSIS folder in the EI downloaded library, or did I miss something entirely with getting this to compile?

janjongboom · September 30, 2020, 6:52pm

@ShawnHymel very interesting, I have never seen that error. I'll test it later this week. You could disable/remove the NN folder and then set this macro to 0:

https://github.com/edgeimpulse/inferencing-sdk-cpp/blob/master/classifier/ei_classifier_config.h#L26

#ifndef _EI_CLASSIFIER_CONFIG_H_
#define _EI_CLASSIFIER_CONFIG_H_


#ifndef EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN
#if defined(__MBED__)
   #include "mbed.h"
   #if (MBED_VERSION < MBED_ENCODE_VERSION(5, 7, 0))
       #define EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN      0
   #else
       #define EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN      1
   #endif // Mbed OS 5.7 version check
#elif defined(__TARGET_CPU_CORTEX_M0) || defined(__TARGET_CPU_CORTEX_M0PLUS) || defined(__TARGET_CPU_CORTEX_M3) || defined(__TARGET_CPU_CORTEX_M4) || defined(__TARGET_CPU_CORTEX_M7)
   #define EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN      1
#else

That should build without CMSIS-NN.
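A command-line flag works because of the #ifndef guard in the excerpt above: a -D flag defines the macro before the header is read, so the header's default block is skipped. Below is a minimal sketch of the same pattern. The Cortex-M checks mirror the excerpt, but the real header's #else branch is cut off above, so the 0 fallback and the cmsis_nn_enabled() helper are assumptions for illustration.

```c
#include <assert.h>

/* Same default-unless-overridden pattern as ei_classifier_config.h.
 * Passing -DEI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN=0 (or =1) skips this block. */
#ifndef EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN
  #if defined(__TARGET_CPU_CORTEX_M0) || defined(__TARGET_CPU_CORTEX_M0PLUS) || \
      defined(__TARGET_CPU_CORTEX_M3) || defined(__TARGET_CPU_CORTEX_M4) || \
      defined(__TARGET_CPU_CORTEX_M7)
    #define EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN 1
  #else
    /* Assumed fallback for this sketch: target not detected, CMSIS-NN off. */
    #define EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN 0
  #endif
#endif

/* Hypothetical helper to report which value the build actually picked up. */
static int cmsis_nn_enabled(void) {
    return EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN;
}
```

Built on a PC without flags, this reports 0; adding -DEI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN=1 makes it report 1.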

janjongboom · September 30, 2020, 8:41pm

@ShawnHymel my guess is that the GCC 7 version ST ships with their IDE is the issue; perhaps GCC 9 works better? But naturally there is no way to change that, nor to just generate a @&* Makefile.

Anyway, I’ve managed to get it to compile by:

1. Create a new C++ library in STM32CubeIDE (tested on the DISCO-L475VG).
2. Enable CRC and printf on the target (see here).
3. Create a new Source Folder called ‘gestures’ (this is different from a normal folder).
4. Add the three folders from the Edge Impulse C++ export to the ‘gestures’ folder.
5. Delete all non-stm32 folders in edge-impulse-sdk/porting
6. Delete edge-impulse-sdk/utensor
7. Delete ei_run_classifier_c.h and ei_run_classifier_c.cpp
8. Add include paths (GNU C++ and GNU C):
   - ${workspace_loc:/${ProjName}/gestures}/
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/CMSIS/DSP/Include/
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/CMSIS/DSP/PrivateInclude/
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/CMSIS/NN/Include/
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/tensorflow
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/flatbuffers
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/flatbuffers/include
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/flatbuffers/include/flatbuffers
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/gemmlowp/
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/gemmlowp/fixedpoint
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/gemmlowp/internal
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/third_party/ruy
   - ${workspace_loc:/${ProjName}/gestures}/model-parameters
   - ${workspace_loc:/${ProjName}/gestures}/tflite-model
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/anomaly
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/classifier
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/dsp
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/dsp/kissfft
   - ${workspace_loc:/${ProjName}/gestures}/edge-impulse-sdk/porting
9. Set -DEI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN=0 as a compiler flag.
10. Delete the CMSIS/Core and CMSIS/NN folders.

Ta-da!

janjongboom · September 30, 2020, 9:01pm

I’ve filed a bug with CMSIS5 here: https://github.com/ARM-software/CMSIS_5/issues/1008 and also emailed folks at Arm, hopefully they have an idea on what is going on here.

ShawnHymel · September 30, 2020, 11:04pm

@janjongboom It works! Or at least it’s now compiling. Thank you for helping out with this. I’m assuming it’s going to be a bit slower without the CMSIS-NN calls, but it should work well enough for the demo (I hope). It looks like you’re right that it’s a bug in the Arm GCC compiler. I found a reference to it here: https://github.com/ARM-software/CMSIS_5/issues/996

I’m not familiar with Arm assembly, so the solutions/workarounds presented were a bit over my head.

janjongboom · October 1, 2020, 8:02am

@ShawnHymel reading through the bug report, it seems this issue does not occur on Linux. Interesting.

Let me see if we can provide a patch earlier than CMSIS can.

janjongboom · October 1, 2020, 10:12am

@ShawnHymel, to fix this, change the implementation of __patched_SXTB16_RORn in arm_nn_mat_mult_nt_t_s8.c to:

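__STATIC_FORCEINLINE uint32_t __patched_SXTB16_RORn(uint32_t op1, uint32_t rotate) {
  uint32_t result;
  // The "i" constraint needs an immediate known at compile time, and SXTB16 only
  // encodes fixed rotations, so use inline asm only for a constant 8, 16 or 24.
  if (__builtin_constant_p (rotate) && ((rotate == 8U) || (rotate == 16U) || (rotate == 24U))) {
    asm volatile ("sxtb16 %0, %1, ROR %2" : "=r" (result) : "r" (op1), "i" (rotate) );
  } else {
    // Otherwise rotate first, then sign-extend bytes 0 and 2 with the plain intrinsic.
    result = __SXTB16(__ROR(op1, rotate));
  }
  return result;
}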

Verified this on the ST IoT Discovery Kit!
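For context on the fix: the "i" constraint requires an operand that is an immediate known at compile time, and SXTB16 can only encode fixed rotations. The original code most likely triggers "impossible constraint in 'asm'" when the compiler cannot prove rotate is a constant. The patch takes the inline-asm path only when __builtin_constant_p() confirms a constant 8, 16 or 24, and otherwise falls back to __SXTB16(__ROR(...)). A portable reference of what that computes can help check results on a PC. This is an illustrative sketch with hypothetical helper names, not CMSIS code.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits (n taken modulo 32), like the CMSIS __ROR() intrinsic. */
static uint32_t ror32(uint32_t x, uint32_t n) {
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Sign-extend the low byte of b to 16 bits. */
static uint16_t sext8_to_16(uint32_t b) {
    b &= 0xFFu;
    return (uint16_t)((b & 0x80u) ? (b | 0xFF00u) : b);
}

/* Reference for SXTB16 with rotation: rotate op1, then sign-extend byte 0
 * into the low halfword and byte 2 into the high halfword. */
static uint32_t sxtb16_ror_ref(uint32_t op1, uint32_t rotate) {
    uint32_t x = ror32(op1, rotate);
    uint16_t lo = sext8_to_16(x);
    uint16_t hi = sext8_to_16(x >> 16);
    return ((uint32_t)hi << 16) | lo;
}
```

For example, 0x12345678 rotated right by 8 is 0x78123456; bytes 0 and 2 are 0x56 and 0x12, giving 0x00120056.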

janjongboom · October 1, 2020, 3:45pm

@ShawnHymel we’ve backported the fix to our SDK, with no regressions on our target platforms. It will be available in all new exports from the next release (later today).

ShawnHymel · October 2, 2020, 8:57pm

@janjongboom Thank you! I put the CMSIS/Core and CMSIS/NN folders back in, patched the code with your fix, and removed the -DEI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN=0 flag. It seems to compile and work.

It seems to be quite slow on my Nucleo board. I’m using the yes/no dataset from your tutorials, trained with the MFCC -> NN blocks (keeping all defaults). In my code, I copied in a raw 16-bit sound buffer from one of the known-good samples and fed it to the classifier. It looks like DSP is taking ~350 ms and classification ~280 ms. Do those numbers seem reasonable on an 80 MHz Arm core (I’m using a Nucleo-L476RG)? This is in the Release configuration (using the -DDEBUG flag doubles the classification time).
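To compare stage timings like these, each stage can be bracketed with reads of a millisecond counter. A hedged sketch with hypothetical helper names: elapsed_ms() relies on unsigned wraparound, so it stays correct across a single rollover of a 32-bit counter such as the one returned by HAL_GetTick(), and cycles_for_ms() converts a duration into a rough cycle count (at 80 MHz, 350 ms is about 28 million cycles).

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed milliseconds between two readings of a free-running 32-bit ms
 * counter. Unsigned subtraction is correct even if the counter wrapped once. */
static uint32_t elapsed_ms(uint32_t start, uint32_t end) {
    return end - start;
}

/* Rough number of CPU cycles spent in a stage: ms * (core_hz / 1000). */
static uint64_t cycles_for_ms(uint32_t ms, uint32_t core_hz) {
    return (uint64_t)ms * (core_hz / 1000u);
}
```

On the target, usage would look like `uint32_t t0 = HAL_GetTick(); /* run DSP */ uint32_t dsp_ms = elapsed_ms(t0, HAL_GetTick());`. Millisecond resolution is enough for stages in the hundreds of milliseconds.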

janjongboom · October 2, 2020, 9:47pm

@ShawnHymel, set EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN=1. It’s only enabled by default when we can detect the target, which we can’t in STM32CubeIDE because it doesn’t set any of the MCU-family macros we check for. That should bring the classification part down to ~30 ms.

For the DSP, set EIDSP_QUANTIZE_FILTERBANK=0. It takes 10 KB more RAM but should save you 100 ms.

Note that in continuous audio mode the DSP slices are smaller, so this time will go down; you can easily do 4-5 inferences a second that way.
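The slice arithmetic and the double (ping-pong) buffer mentioned at the start of the thread can be sketched together. This assumes 16 kHz audio, a 1 s model window and 4 slices per window (illustrative numbers, not read from the exported model), which gives 4000-sample slices and 4 inferences per second. The buffer and function names are hypothetical; on an STM32, the half and full copies would be triggered from HAL_I2S_RxHalfCpltCallback() and HAL_I2S_RxCpltCallback() while the DMA keeps filling the other half. The sketch is simplified: a real driver also has to make sure the main loop is done with a slice before the next callback overwrites it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed figures for illustration only. */
#define SAMPLE_RATE_HZ     16000u
#define WINDOW_MS          1000u
#define SLICES_PER_WINDOW  4u
#define SLICE_SAMPLES      (SAMPLE_RATE_HZ * WINDOW_MS / 1000u / SLICES_PER_WINDOW)
#define INFERENCES_PER_SEC (SLICES_PER_WINDOW * 1000u / WINDOW_MS)

/* Circular DMA target: two halves of one slice each ("ping" and "pong"). */
static int16_t dma_buf[2u * SLICE_SAMPLES];
static int16_t slice_buf[SLICE_SAMPLES];
static volatile int slice_ready = 0;

/* Called when the DMA has finished one half (0 = first, 1 = second); copies
 * that half out while the DMA continues writing into the other half. */
static void on_dma_half_done(int half) {
    assert(half == 0 || half == 1);
    memcpy(slice_buf, &dma_buf[(unsigned)half * SLICE_SAMPLES], sizeof(slice_buf));
    slice_ready = 1;
}

/* Main-loop side: copy out a ready slice. Returns 1 if one was taken. */
static int take_slice(int16_t *out) {
    if (!slice_ready) {
        return 0;
    }
    memcpy(out, slice_buf, sizeof(slice_buf));
    slice_ready = 0;
    return 1;
}
```

With 5 slices per window instead of 4, the same arithmetic gives 3200-sample slices and 5 inferences per second.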

ShawnHymel · October 2, 2020, 10:52pm

@janjongboom Like magic! Thank you

tiriotis · January 27, 2021, 10:28am

Hi, I tried deploying an audio recognition example on an STM32F401RE using https://github.com/edgeimpulse/example-standalone-inferencing-mbed with Mbed OS, and then I tried the same example in STM32CubeIDE.
I noticed that the DSP times were very different:
Mbed OS: DSP time = 150 ms
STM32CubeIDE: DSP time = 420 ms

I also used the macros you suggested above in STM32CubeIDE.
What could cause this increase in time?

janjongboom · January 27, 2021, 10:39am

@tiriotis

Have you set the EIDSP_USE_CMSIS_DSP=1 macro?
Could it be that you’re not running at full clock speed?
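As a rough check on the clock question: on the target, CMSIS exposes the current core frequency as SystemCoreClock (refreshed by SystemCoreClockUpdate()). The STM32F401RE can run at up to 84 MHz, so a project whose clock tree was left at the 16 MHz internal oscillator would run CPU-bound DSP code several times slower. The helpers below are a hedged sketch of that comparison; their names and the 90 % threshold are illustrative assumptions. Real timings also depend on flash wait states and caches, so the ratio is only approximate.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate factor by which a CPU-bound stage slows down when the core runs
 * at actual_hz instead of target_hz (ignores flash wait states and caches). */
static double expected_slowdown(uint32_t actual_hz, uint32_t target_hz) {
    return (double)target_hz / (double)actual_hz;
}

/* Returns 1 if the clock is below 90 % of the intended frequency. */
static int clock_looks_low(uint32_t actual_hz, uint32_t target_hz) {
    return (uint64_t)actual_hz * 10u < (uint64_t)target_hz * 9u;
}
```

On the board, one would call SystemCoreClockUpdate() and then print SystemCoreClock (for example, with ei_printf()) alongside the DSP time to rule the clock in or out.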