CMSIS-NN making inference slower

I was running some tests to learn how a NN behaves when using CMSIS-DSP and CMSIS-NN across different projects. I've noticed that setting EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN to 0 makes int8 inference much slower in general, but for networks made of dense layers the inference time actually drops. On a NN with 3 dense layers of 512 neurons each, inference took 598 ms with CMSIS-NN and 516 ms without it. (In this case I just wanted a large NN so the inference time would be easy to measure; I was not looking for good results, even though it ended up giving good results. The CPU being used is a Cortex-M4.)
Is there any reason for that? Why is inference accelerated by CMSIS-NN for Conv1D, Conv2D and both MobileNetV1 and MobileNetV2, but slowed down for fully connected layers?
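To make the comparison easier to reproduce, here is a minimal sketch of how the switch and a timing measurement could be set up. Only the macro name EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN comes from my project; `kernel_path`, `dense_int8_ref` and `time_dense_ms` are hypothetical helpers written for illustration, and the dense layer is a plain reference loop standing in for the real classifier call, not the SDK or CMSIS-NN implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Same switch as in the question. A -D compiler flag can override this default. */
#ifndef EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN
#define EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN 0
#endif

/* Hypothetical helper: reports which kernel path this build was compiled for. */
static const char *kernel_path(void) {
#if EI_CLASSIFIER_TFLITE_ENABLE_CMSIS_NN
    return "cmsis-nn";
#else
    return "reference";
#endif
}

/* Hypothetical stand-in for one int8 fully connected layer:
 * int32 accumulation, no requantization or activation. */
static void dense_int8_ref(const int8_t *in, const int8_t *w,
                           const int32_t *bias, int32_t *out,
                           size_t n_in, size_t n_out) {
    for (size_t o = 0; o < n_out; o++) {
        int32_t acc = bias[o];
        for (size_t i = 0; i < n_in; i++) {
            acc += (int32_t)in[i] * (int32_t)w[o * n_in + i];
        }
        out[o] = acc;
    }
}

/* Hypothetical timing helper: average milliseconds per 512x512 layer call. */
static double time_dense_ms(int runs) {
    enum { N = 512 };
    static int8_t in[N], w[N * N];
    static int32_t bias[N], out[N];

    /* Fill with small deterministic values so the loop does real work. */
    for (size_t i = 0; i < N; i++) {
        in[i] = (int8_t)(i % 7);
        bias[i] = 0;
    }
    for (size_t i = 0; i < (size_t)N * N; i++) {
        w[i] = (int8_t)((int)(i % 5) - 2);
    }

    clock_t t0 = clock();
    for (int r = 0; r < runs; r++) {
        dense_int8_ref(in, w, bias, out, N, N);
    }
    clock_t t1 = clock();
    return 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC / runs;
}
```

Calling `time_dense_ms(10)` once per build (flag set to 0 and to 1) and printing it next to `kernel_path()` gives one number per configuration. On the device the timed call would be the real classifier instead of `dense_int8_ref`, so host timings from this sketch say nothing about the Cortex-M4 numbers above.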
Project ID:

Context/Use case: