I just tested a weird compiler flag setting on the Portenta (in the variants/PORTENTA_H7_M7/cflags.txt file, where you switch the default setting on line 14 from -Os to -O3). Read about it here
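For reference, a minimal sketch of that one-line change. The path is relative to the installed Portenta core directory, whose location varies by platform (assumption), and it assumes cflags.txt lists one flag per line:

```shell
# Sketch: switch the Portenta M7 variant from size optimization (-Os)
# to speed optimization (-O3) by rewriting the flag in place.
# Run from inside the installed mbed_portenta core directory (assumption).
sed -i 's/^-Os$/-O3/' variants/PORTENTA_H7_M7/cflags.txt
```

Recompile your sketch afterwards so the build picks up the new flag.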
My vision ML program can just fit on the regular M7 core setting. With this change it uses more flash (RAM usage is unchanged), so I had to switch to the 1.5M M7 / 0.5M M4 flash split, but my classification time went from 121 ms to 101 ms, a 20 ms improvement.
So about a 20% speedup (121/101 ≈ 1.20) for a slightly bigger program.
Here is my data:
Using the -Os flag with the 1.0M M7 / 1.0M M4 flash split:
Sketch uses 776368 bytes (98%) of program storage space. Maximum is 786432 bytes.
Global variables use 89808 bytes (17%) of dynamic memory, leaving 433816 bytes for local variables. Maximum is 523624 bytes.
dfu-util 0.10-dev
run_classifier returned: 0
Predictions (DSP: 1 ms., Classification: 121 ms., Anomaly: 0 ms.):
[0.94531, 0.05078, 0.00391, 0.00000]
Using the -O3 flag with the 1.5M M7 / 0.5M M4 flash split:
Sketch uses 806184 bytes (55%) of program storage space. Maximum is 1441792 bytes.
Global variables use 89808 bytes (17%) of dynamic memory, leaving 433816 bytes for local variables. Maximum is 523624 bytes.
dfu-util 0.10-dev
run_classifier returned: 0
Predictions (DSP: 1 ms., Classification: 101 ms., Anomaly: 0 ms.):
[0.99609, 0.00000, 0.00000, 0.00000]