Edge Impulse C++ Deployment: Ensuring Correct INT8 Input/Output Conversion

I am deploying a quantized int8 BYOM (Bring Your Own Model) on Edge Impulse. After generating the C++ library and deploying it on a Raspberry Pi 5, I followed the YouTube tutorial on creating a main.cpp and Makefile. However, when running inference, the output values don't match the expected values. I think the problem is with quantizing my input buffer values and de-quantizing my output.

Do I need to manually quantize my input before feeding it to the model (in the get_signal_data function)? I included ei_run_classifier.h in my main.cpp, and I found tflite_helper.h, which I think quantizes the input, but I don't know whether it is called implicitly or whether I have to call it myself from main.cpp.
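For context, my main.cpp roughly follows the structure from the tutorial. This is a simplified sketch, not my exact code; the feature buffer is a placeholder for my real float32 input:

```cpp
#include <cstring>
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

// Placeholder feature buffer: my real float32 input of size
// EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE is filled elsewhere.
static float features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE];

// Callback that copies raw float32 features into the SDK's buffer.
// Note: no int8 quantization is done here -- this is the question.
static int get_signal_data(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, features + offset, length * sizeof(float));
    return EIDSP_OK;
}

int main() {
    signal_t signal;
    signal.total_length = EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE;
    signal.get_data = &get_signal_data;

    ei_impulse_result_t result = { 0 };
    EI_IMPULSE_ERROR err = run_classifier(&signal, &result, false);
    if (err != EI_IMPULSE_OK) {
        ei_printf("run_classifier failed (%d)\n", err);
        return 1;
    }

    // Print the results as reported by the SDK.
    for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
        ei_printf("%s: %.5f\n", result.classification[i].label,
                  result.classification[i].value);
    }
    return 0;
}
```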

Essentially, for a quantized model deployment: how do I ensure the model is fed correctly converted int8 input (my data is originally float32), and how do I convert the int8 output back to float32 for readability? Do the library files handle this? If so, which ones do I need to include in my main.cpp?

Any insight is appreciated

I used Netron to get the scale and zero-point values of my model and used them to manually quantize my float32 input and de-quantize the model's output, but the result still doesn't make sense. To verify the model's predictions, I also wrote a Python script in Jupyter that runs inference on the same model (similar to code I saw in a previous post). The C++ output differs a lot from the Python output.
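For reference, this is roughly the manual conversion I'm doing with the values read from Netron. The scale and zero-point numbers below are placeholders, not my model's actual parameters:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Placeholder quantization parameters read from Netron;
// my model's actual input/output scale and zero-point differ.
static const float input_scale       = 0.0039f;
static const int   input_zero_point  = -128;
static const float output_scale      = 0.0039f;
static const int   output_zero_point = -128;

// float32 -> int8 (standard TFLite affine quantization)
static int8_t quantize_input(float x) {
    int q = static_cast<int>(std::round(x / input_scale)) + input_zero_point;
    return static_cast<int8_t>(std::min(127, std::max(-128, q)));
}

// int8 -> float32 (de-quantize the model output)
static float dequantize_output(int8_t q) {
    return (static_cast<int>(q) - output_zero_point) * output_scale;
}
```

If the SDK already quantizes the float input internally, then applying this conversion on top of it would quantize twice, which is what I'm trying to confirm.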