CMSIS-NN making inference slower

I was running some tests to learn how neural networks behave with CMSIS-DSP and CMSIS-NN across different projects, and I noticed that when defining EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN 0, all int8 inferences are much slower, except on dense layers, where inference time actually drops. On a network with 3 dense layers of 512 neurons each, the results were 598 ms with CMSIS-NN and 516 ms without it. (To be clear, I just wanted a large network so the inference time would be easy to measure; I wasn't looking for good results, although it ended up giving good results anyway. The CPU is a Cortex-M4.)
Is there any reason for that? Why are Conv1D, Conv2D, and both MobileNetV1 and MobileNetV2 inferences accelerated by CMSIS-NN, but fully connected layers slowed down?

Hi @JosuGaztelu,

Unfortunately, we do not have much control over how TFLite Micro is making exact decisions on the individual layer level. Your best bet would likely be to pose your question to the TFLite Micro repo (GitHub - tensorflow/tflite-micro: TensorFlow Lite for Microcontrollers) or the Arm AI/ML Discussion Forum (Arm Community).


They answered on the TFLite Micro repo, and I was told it may have something to do with the bundled CMSIS-NN version being old. I'm trying with the updated one but I'm having some issues. Maybe that's the reason for the slower inference; I'll keep testing.
Here is the link:
CMSIS-NN making tflite inference slower for fully connected layers · Issue #1447 · tensorflow/tflite-micro
