So, we’d like to have our cake and eat it too! As Jan pointed out, CMSIS provides some impressive performance enhancements by using the DSP hardware on ARM chips. Unfortunately, the latest CMSIS library opts for the fastest possible speed at the expense of ROM (more detail than is probably interesting here, but they’re using a mixed radix FFT).
However, they have a prior version that, with some patching, is almost as fast (and still uses HW acceleration), but has the added benefit of very little ROM cost. (TMI: radix 2 FFT with one table for all FFT sizes.)
We’re working on this patch and once it’s released, the EIDSP_USE_CMSIS_DSP flag will cost far less ROM.
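To illustrate the "one table for all FFT sizes" idea mentioned above, here is a minimal sketch in C. It is not the CMSIS code or the patch itself; the names `MAX_FFT_LEN`, `tw_re`, `tw_im`, and `twiddle_get` are made up for the example, and the maximum size of 8 is only there to keep the table readable. The point is that a smaller power-of-two FFT can read its twiddle factors from the largest table by stepping through it with a stride.

```c
#include <stddef.h>

/* Hypothetical largest supported FFT size (kept tiny for readability). */
#define MAX_FFT_LEN 8

/* Single shared table in ROM: W_8^k = exp(-2*pi*i*k/8) for k = 0..3. */
static const float tw_re[MAX_FFT_LEN / 2] = { 1.0f,  0.70710678f,  0.0f, -0.70710678f };
static const float tw_im[MAX_FFT_LEN / 2] = { 0.0f, -0.70710678f, -1.0f, -0.70710678f };

/* Look up W_n^k for any power-of-two n <= MAX_FFT_LEN and k < n/2.
 * W_n^k equals W_MAX^(k * MAX/n), so a smaller FFT reuses the big table
 * with a stride instead of needing its own table. */
void twiddle_get(size_t n, size_t k, float *re, float *im)
{
    size_t stride = MAX_FFT_LEN / n;
    *re = tw_re[k * stride];
    *im = tw_im[k * stride];
}
```

With this layout the ROM cost is set by the largest FFT size only, at the price of strided table reads for the smaller sizes.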
The interesting part is that we don’t see this happening on other targets, which is why I’d be very interested in seeing your full project. E.g. on an STM32L4 target I see ~10K for CMSIS-DSP with the latest SDK.
I went through this same painful process of finding out that CMSIS-DSP needs to be configured manually to only include the data/functionality that is needed. For my particular case (M4 platform) I found the culprit to be the FFT tables - turns out the entry function into the 32-bit float FFT has a case statement to switch through all sizes of FFT; the compiler sees this and decides it needs to include all of the FFT tables, which for me added something like 120kB overhead.
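A minimal sketch of why this happens, in C. The tables and function names here (`table_16`, `fft_init_any`, `fft_init_64`, and so on) are hypothetical stand-ins, not the real CMSIS-DSP symbols, and the real tables are far larger. An init function that switches over every size references every table, so none of them can be dropped at link time; a size-specific init references only the one it needs.

```c
#include <stdint.h>

/* Hypothetical stand-ins for per-size FFT tables. */
static const float table_16[16] = {0};
static const float table_32[32] = {0};
static const float table_64[64] = {0};

typedef struct {
    uint16_t     fft_len;
    const float *twiddle;
} fft_instance_t;

/* Generic init: every case names a table, so as long as this function is
 * kept and fft_len is not known at compile time, all three tables stay
 * reachable and end up in the image. */
int fft_init_any(fft_instance_t *s, uint16_t fft_len)
{
    switch (fft_len) {
    case 16: s->twiddle = table_16; break;
    case 32: s->twiddle = table_32; break;
    case 64: s->twiddle = table_64; break;
    default: return -1;              /* unsupported size */
    }
    s->fft_len = fft_len;
    return 0;
}

/* Size-specific init: only table_64 is referenced from here. */
int fft_init_64(fft_instance_t *s)
{
    s->twiddle = table_64;
    s->fft_len = 64;
    return 0;
}
```

If only `fft_init_64` is called and `fft_init_any` is never linked in, the other tables have no remaining references and can be discarded, which is roughly what the table-selection macros aim for.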
I’m very happy to hear that these flags will be added automatically in the future!
@tennies The super weird thing is that it seems to be linker dependent. On some targets the increase is 10K for CMSIS-DSP flash usage, and then on another target it doubles the flash usage - weird, but yes, should be fixed soon!
@janjongboom a great feature for this (or the next) update would be to export all macros that must be defined to a dedicated text file.
For example, all macros needed for the run_inference() function.
Why do I ask for it? Because I do not use the deployed pack in Stm32CubeIDE but in, let’s say, a bare IDE environment, and a list of all needed macros would make the implementation much easier.
@hwidvorakinfo In general everything is already included in model_metadata.h and dsp/config.hpp - no need to set anything else unless we can’t autodetect your MCU and you want to enable HW acceleration through CMSIS / ARC DSPs.
FWIW, these are the flags I’ve found I needed using the 32-bit float FFTs:
General flags needed: ARM_DSP_CONFIG_TABLES;ARM_FAST_ALLOW_TABLES;ARM_ALL_FAST_TABLES;ARM_FFT_ALLOW_TABLES
(In my experience, ARM_ALL_FAST_TABLES catches sin, cos and the like without increasing code size substantially.)
This one is needed for all 32-bit float RFFTs: ARM_TABLE_REALCOEF_F32
And the macros below must be defined for every FFT size you are using (where <NFFT> is the RFFT size): ARM_TABLE_TWIDDLECOEF_F32_<NFFT/2>;ARM_TABLE_BITREVIDX_FLT_<NFFT/2>;ARM_TABLE_TWIDDLECOEF_RFFT_F32_<NFFT>
(Note MFCC features use 2 FFT sizes, one for the FFT and one for the DCT)
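As a worked example of the macro list above, here is what the defines might look like for a single 256-point RFFT. The size 256 is an assumption picked for illustration; the macro names are the ones listed in this post, with <NFFT> = 256 and <NFFT/2> = 128 substituted in. The same effect can be had by passing each name with -D on the compiler command line, which is usually easier since the defines must be visible when the CMSIS-DSP sources themselves are compiled.

```c
/* General flags (from the list above). */
#define ARM_DSP_CONFIG_TABLES
#define ARM_FAST_ALLOW_TABLES
#define ARM_ALL_FAST_TABLES
#define ARM_FFT_ALLOW_TABLES

/* Needed for all 32-bit float RFFTs. */
#define ARM_TABLE_REALCOEF_F32

/* Per-size tables for an assumed 256-point RFFT: NFFT = 256, NFFT/2 = 128. */
#define ARM_TABLE_TWIDDLECOEF_F32_128
#define ARM_TABLE_BITREVIDX_FLT_128
#define ARM_TABLE_TWIDDLECOEF_RFFT_F32_256

/* Optional guard: stop the build early if a per-size macro was forgotten. */
#if !defined(ARM_TABLE_TWIDDLECOEF_F32_128) || \
    !defined(ARM_TABLE_BITREVIDX_FLT_128)   || \
    !defined(ARM_TABLE_TWIDDLECOEF_RFFT_F32_256)
#error "Missing CMSIS-DSP table macro for the 256-point RFFT"
#endif
```

A second FFT size (as MFCC needs) would add one more group of three per-size defines with its own <NFFT> and <NFFT/2> values.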
In addition, you need to exclude the following files from compilation or the compiler will complain about undefined variables:
I’m not sure if those are the exact same macros that we’re looking at, but they look correct. If you try to use an FFT size that you don’t have the table for, you’ll get a runtime error (or sometimes it won’t even build).
Also, I don’t remember having to remove those files, so that’s different too. @tennies are you using the older arm_cfft_f32.c? My understanding was that arm_cfft_radix4*.c and arm_cfft_radix2*.c are called under the hood by arm_rfft_fast* (which is the function that uses the fast tables).
But in the end, it’s safe to play with those macros to try to reduce ROM size, b/c it will fail in a very obvious way if you’re missing a key table.
What do you mean by the older arm_cfft_f32.c?
I found that only arm_cfft_radix8 was used by the rfft function, so the others could be excluded. Including them meant having to include a few other tables. In the end, enabling CMSIS-DSP with these options only seemed to add about 6-7kB to the flash size compared to CMSIS-DSP disabled (EIDSP_USE_CMSIS_DSP I think is the flag).
Can I ask what the status is of the CMSIS-PACK built with only selected tables linked in, or in other words the “enhanced CMSIS-PACK” version?
@hwidvorakinfo Still in progress. The PR is ready, but we found a bug with some audio models that needed to be ironed out (and then other things got in the way, you know how it goes). I’ll update this thread when released.