Hi
I’d like to know the differences between the float32 and int8 versions of a model that can be deployed. Besides the four metrics (Latency / RAM / Flash / Accuracy) displayed on the deployment interface, are there any other differences that matter for edge devices?
For example, when deploying with float32, will a Cortex-M4F core be more efficient than a Cortex-M4 core?
Also, does running the float32 version of a model on an edge device involve the FPU?
Will the float32 version be less efficient on devices without an FPU?
I only found the following description in the documentation:
Float32 vs int8 models
You can choose to test your model using either the float32 or int8 quantized version. The float32 version offers higher precision but may use more resources, while the int8 quantized version is optimized for memory and computational efficiency, making it suitable for edge devices with limited resources.
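For context, here is roughly how I understand the two versions are typically produced (a minimal sketch using the standard TensorFlow Lite post-training quantization flow; the model and data file names below are just placeholders, and the actual tooling may differ):

```python
import numpy as np
import tensorflow as tf

# Hypothetical model and calibration data, just for illustration.
model = tf.keras.models.load_model("my_model.h5")
calibration_data = np.load("calibration_samples.npy").astype(np.float32)

def representative_dataset():
    # Yield a handful of real input samples so the converter can
    # estimate activation ranges for int8 quantization.
    for sample in calibration_data[:100]:
        yield [np.expand_dims(sample, axis=0)]

# Plain float32 conversion: weights and activations stay floating point,
# so on-device inference runs float math (hardware FPU or soft-float).
converter = tf.lite.TFLiteConverter.from_keras_model(model)
float32_tflite = converter.convert()

# Full int8 quantization: weights and activations become 8-bit integers,
# so inference runs integer math and does not rely on an FPU.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
int8_tflite = converter.convert()

print(f"float32 model: {len(float32_tflite)} bytes")
print(f"int8 model:    {len(int8_tflite)} bytes")
```

If that’s what happens under the hood, it would explain the memory/compute difference, but it still doesn’t answer my questions about the FPU and Cortex-M4 vs M4F behavior.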
If you have any ideas, please let me know. Thanks!