Quantization option for BYOM

Question/Issue:
Is there something I’m missing about the quantization option for BYOM?

Project ID:
721531

Context/Use case:
Applying quantization to BYOM

Summary:
I’ve successfully loaded my model (BYOM) in .tflite format and generated the C++ library for use on my MCU. I’ve also confirmed that inference is working by evaluating the results.
Now I’ve tried to use quantization: I loaded the ONNX model and provided a representative dataset, but I’m getting these warnings/errors:

Is it possible that quantization would produce such a memory-heavy model, or am I doing something wrong?
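
For context, this is a minimal sketch of the equivalent post-training quantization done locally with the TensorFlow Lite converter; the SavedModel path and the calibration array below are placeholders, not taken from the project:

```python
import numpy as np
import tensorflow as tf

SAVED_MODEL_DIR = "my_model_savedmodel"   # placeholder path
INPUT_SHAPE = (50, 25)                    # model input shape without the batch dimension

# Placeholder calibration data -- in practice, use a few hundred real input
# windows from the training/validation set so the int8 ranges are realistic.
calibration_data = np.random.rand(200, *INPUT_SHAPE).astype(np.float32)

def representative_dataset():
    for sample in calibration_data:
        # One list of input tensors per yield, with a batch dimension of 1.
        yield [sample[np.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

tflite_quant = converter.convert()
print(f"Quantized model size: {len(tflite_quant) / 1024:.1f} KiB")
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_quant)
```

Note that with the default optimizations the converter keeps any op it cannot quantize in float, so the resulting model can still end up larger than expected.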

Steps to Reproduce:

  1. Load BYOM in SavedModel format
  2. Check quantization optimization option

Expected Results:
I expected to get a quantized model that fits on the MCU (ESP32-S3-N8R8).

Actual Results:
Warnings/errors saying that the quantized model won’t fit on the MCU.

Reproducibility:

  • [x] Always
  • [ ] Sometimes
  • [ ] Rarely

Environment:

  • Platform: ESP32-S3-N8R8
  • Build Environment Details: ESP-IDF
  • OS Version: Windows 11
  • Edge Impulse Version (Firmware): 1.72.4
  • Edge Impulse CLI Version: Not used
  • Project Version: 1.0.0
  • Custom Blocks / Impulse Configuration: None

Hi @Sreten_Avisto

There can be limitations on quantisation support when using BYOM, as we can only quantise when the ops and activation functions are supported in TFLite. Can you share the ONNX model architecture as a screenshot, or do you mind if I pull it from your project to check it out?

You won’t see the quantization option if:

  • Your model type isn’t supported by int8 or full-integer quantization (some custom layers aren’t quantizable).
  • Input data type or preprocessing block prevents quantization.

See the limitations for more details.
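
If it’s useful in the meantime, a rough way to check offline whether every op in the model can be quantised is to force the strict int8 ops set, which makes the converter fail loudly instead of silently keeping unsupported ops in float. A sketch with a placeholder path and random calibration data:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data -- use real samples for meaningful ranges.
    for _ in range(100):
        yield [np.random.rand(1, 50, 25).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("my_model_savedmodel")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Require that every op has an int8 kernel; any unsupported op/activation
# raises a converter error instead of being kept in float.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

try:
    with open("model_full_int8.tflite", "wb") as f:
        f.write(converter.convert())
    print("All ops quantized to int8.")
except Exception as err:
    print("Not fully quantizable:", err)
```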

Best

Eoin

Hi @Eoin

Sure, you can take a look.

The quantization option for BYOM is only supported for the ONNX and SavedModel formats; that’s why we converted our model to ONNX.

Regarding the listed limitations, my input shape is a batch of size 1, i.e. (1, 50, 25), and my output is a regression of shape (1, 1).
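
For reference, verifying the shapes and dtypes of the generated .tflite model locally looks something like this (the model path is a placeholder):

```python
import tensorflow as tf

# Inspect the quantized model to confirm the expected tensor shapes and types.
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")  # placeholder path
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print("input :", inp["shape"], inp["dtype"])   # expected: [ 1 50 25]
print("output:", out["shape"], out["dtype"])   # expected: [1 1]
```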