@janjongboom @dansitu I have trained a model. The unoptimized float32 model reaches 94.9% accuracy, but the quantized int8 version only reaches 13.6%. I don't think quantization should cause this large a drop. Please see my project EE2 for more details and let me know if I am doing something wrong.
I’m not sure how well quantization performs with batch normalization, @dansitu will probably have more insights.
The post-training quantization process introduces some error, and some models are more affected by this than others. This can be especially true in deep models like the one you’ve defined. You can potentially improve resiliency by adding more regularization—perhaps try adding some dropout between layers?
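To give a feel for where that error comes from, here's a minimal sketch of post-training quantization, assuming a simple symmetric int8 scheme (scale = max|w| / 127). The real TensorFlow Lite converter uses per-tensor/per-channel affine quantization, so treat this purely as an illustration of the rounding error involved:

```python
# Illustrative sketch only: symmetric int8 quantization of a weight vector.
# The actual converter uses a more sophisticated affine scheme.

def quantize_int8(weights):
    """Map float weights to int8 values and return (ints, scale)."""
    scale = max(abs(w) for w in weights) / 127.0  # assumes not all zeros
    ints = [max(-128, min(127, round(w / scale))) for w in weights]
    return ints, scale

def dequantize(ints, scale):
    """Recover approximate float weights from the int8 values."""
    return [i * scale for i in ints]

weights = [0.021, -0.387, 0.914, -0.005, 0.250]
ints, scale = quantize_int8(weights)
recovered = dequantize(ints, scale)

# Each weight is now off by up to half a quantization step (scale / 2),
# and those small per-weight errors compound layer by layer in deep models.
errors = [abs(w - r) for w, r in zip(weights, recovered)]
print(f"max error {max(errors):.5f}, quantization step {scale:.5f}")
```

The per-weight error looks tiny, but in a deep model each layer's error feeds into the next, which is why deeper architectures can lose more accuracy after quantization.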
One thing worth considering is that your model is estimated to be quite efficient even as float32—the latency is low, and the memory usage is not that high. There’s no problem with using float32 if it fits the limitations of your deployment target.
In the future we’ll be moving towards quantization-aware training, which should reduce the accuracy drop in cases like this.
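For the curious, here's a toy sketch of the idea behind quantization-aware training, assuming a fixed quantization step and a straight-through gradient. Real QAT (e.g. TensorFlow's model optimization toolkit) is considerably more involved; the point here is just that the forward pass sees quantized values, so training can compensate for the rounding:

```python
# Toy single-weight illustration of quantization-aware training.
# Hypothetical fixed scale; real QAT learns/calibrates scales per tensor.

def fake_quant(w, scale):
    """Quantize-dequantize: the weight as it will look after rounding."""
    return round(w / scale) * scale

scale, target, lr = 0.05, 0.937, 0.1
w = 0.0
for _ in range(200):
    q = fake_quant(w, scale)   # loss is computed on the quantized view
    grad = 2 * (q - target)    # straight-through estimator: d(loss)/dw ~ d(loss)/dq
    w -= lr * grad

# The float weight settles where its quantized version is close to the
# target, so the rounding error is already accounted for during training.
print(w, fake_quant(w, scale))
```

Because the loss is computed on the quantized weights, the optimizer learns values that survive rounding, instead of rounding being applied only after training is finished.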