RAW feature format for quantized model

I apologize if this has been asked before, but I’m having trouble finding it (probably just searching the wrong terms). When I use the unoptimized (float32) version of our model (via the CMSIS pack for STM32) and supply floats for the features, as in the tutorial here (about 2/3 of the way down the page, where the features are pasted into a static buffer), everything works fine. However, if I use the quantized (int8) version of the model and do the same but with ints instead of floats, I get a DSP error. My guess is that this is because the features in my buffer are int (int32_t) while it’s looking for int8_t data, but the RAW features from Live Classification are outside the range of int8_t.
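For reference, the float path that works for me looks roughly like this (the feature values and length below are made up for illustration, not from a real capture):

```c
#include <stddef.h>

/* Raw features pasted from Live Classification, as in the tutorial.
 * Note that several values fall well outside the int8_t range
 * [-128, 127], which is what made me suspicious. */
static const float features[] = {
    -12.0f, 348.0f, 1024.0f, -567.0f, 89.0f, 2301.0f
};
static const size_t features_len = sizeof(features) / sizeof(features[0]);
```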

So, what should the feature data look like to feed the quantized model? Again, sorry if this is a silly question, but I’m quite new to this.

Okay, so this was a dumb question, I think. I didn’t notice that there are methods to convert data to int8. But now another dumb question: I get my original data from a sensor as int32. Do I have to cast it to float32 and then use the int8 function? It feels like it would be faster in our case to just use the unoptimized model, since we don’t need very fast classification rates. Am I even starting to be on the right track?

One more time, sorry for the idiocy of my questions - I’m still working out pretty basic examples to get a feel for things.

Hello Jacob,

No question is dumb here :slight_smile:
Quantization works by reducing the precision of the model’s weights, not the input values you provide to the model.

So you can actually keep your input values as float :wink:
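Following that advice, getting int32 sensor samples into the float feature buffer really is just a cast per sample. A minimal sketch (the function and buffer names are made up, not from the SDK):

```c
#include <stddef.h>
#include <stdint.h>

/* Copy raw int32 sensor samples into the float feature buffer that
 * the classifier expects. No scaling is needed here, just a cast;
 * any quantization of weights is internal to the model. */
static void samples_to_features(const int32_t *samples, float *features, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        features[i] = (float)samples[i];
    }
}
```

So there is no need to fall back to the unoptimized model just because the sensor produces int32 data.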




Just a side note: for some models (e.g. images) we can quantize the input directly from the signal (for example, when mapping to a framebuffer). This happens automatically, so there is no need for the user to handle it; it is done to save RAM. See the run_classifier_image_quantized functions if people are interested.