Quantization option for BYOM

Question/Issue:
Is there something I’m missing about the quantization option for BYOM?

Project ID:
721531

Context/Use case:
Applying quantization to BYOM

Summary:
I’ve successfully loaded my model (BYOM) in .tflite format and generated the C++ library for use on my MCU. I’ve also confirmed that inference is working by evaluating the results.
Now I’ve tried to use quantization: I loaded the ONNX model and provided a representative dataset, but I’m getting these warnings/errors:

Is it possible that quantization would produce such a memory-heavy model, or am I doing something wrong?
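
For context, this is a minimal sketch of the equivalent post-training quantization done locally with the TensorFlow Lite converter; the SavedModel path and the calibration array below are placeholders, not taken from the project:

```python
import numpy as np
import tensorflow as tf

SAVED_MODEL_DIR = "my_model_savedmodel"   # placeholder path
INPUT_SHAPE = (50, 25)                    # model input shape without the batch dimension

# Placeholder calibration data -- in practice, use a few hundred real input
# windows from the training/validation set so the int8 ranges are realistic.
calibration_data = np.random.rand(200, *INPUT_SHAPE).astype(np.float32)

def representative_dataset():
    for sample in calibration_data:
        # One list of input tensors per yield, with a batch dimension of 1.
        yield [sample[np.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

tflite_quant = converter.convert()
print(f"Quantized model size: {len(tflite_quant) / 1024:.1f} KiB")
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_quant)
```

Note that with the default optimizations the converter keeps any op it cannot quantize in float, so the resulting model can still end up larger than expected.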

Steps to Reproduce:

  1. Load BYOM in SavedModel format
  2. Check quantization optimization option

Expected Results:
I expected to get a quantized model that fits on the MCU (ESP32-S3-N8R8).

Actual Results:
Warnings/errors saying that the quantized model won’t fit on the MCU.

Reproducibility:

  • [x] Always
  • [ ] Sometimes
  • [ ] Rarely

Environment:

  • Platform: ESP32-S3-N8R8
  • Build Environment Details: ESP-IDF
  • OS Version: Windows 11
  • Edge Impulse Version (Firmware): 1.72.4
  • Edge Impulse CLI Version: Not used
  • Project Version: 1.0.0
  • Custom Blocks / Impulse Configuration: None

Hi @Sreten_Avisto

There can be limitations on quantisation support when using BYOM, as we can only quantise when the ops and activation functions are supported in TFLite. Can you share the ONNX model architecture as a screenshot, or do you mind if I pull it from your project to check it out?

You won’t see the quantization option if:

  • Your model type isn’t supported by int8 or full-integer quantization (some custom layers aren’t quantizable).
  • Input data type or preprocessing block prevents quantization.

See the limitations for more details.
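
If it’s useful in the meantime, a rough way to check offline whether every op in the model can be quantised is to force the strict int8 ops set, which makes the converter fail loudly instead of silently keeping unsupported ops in float. A sketch with a placeholder path and random calibration data:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data -- use real samples for meaningful ranges.
    for _ in range(100):
        yield [np.random.rand(1, 50, 25).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("my_model_savedmodel")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Require that every op has an int8 kernel; any unsupported op/activation
# raises a converter error instead of being kept in float.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

try:
    with open("model_full_int8.tflite", "wb") as f:
        f.write(converter.convert())
    print("All ops quantized to int8.")
except Exception as err:
    print("Not fully quantizable:", err)
```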

Best

Eoin

Hi @Eoin

Sure, you can take a look.

The quantization option for BYOM is only supported for the ONNX and SavedModel formats; that’s why we converted our model to ONNX.

Regarding the listed limitations, my input shape is a batch of size 1, i.e. (1, 50, 25), and my output is a regression of shape (1, 1).
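
For reference, verifying the shapes and dtypes of the generated .tflite model locally looks something like this (the model path is a placeholder):

```python
import tensorflow as tf

# Inspect the quantized model to confirm the expected tensor shapes and types.
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")  # placeholder path
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print("input :", inp["shape"], inp["dtype"])   # expected: [ 1 50 25]
print("output:", out["shape"], out["dtype"])   # expected: [1 1]
```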