Int8 model: representative dataset

Question:

Currently, post-training quantisation is performed, with full integer quantisation for the int8 model. Correct?

For full integer quantisation, you need a representative dataset. How is this dataset created behind the scenes? Is it the complete validation set?

Looking into the code (Exporting block to edit locally…), more specifically in conversion.py, there is representative_dataset_generator(validation_dataset). Is it correct that this takes the complete validation dataset?

I ask because I recently saw a drop in accuracy for the int8 model after changing the size of the validation set. My hypothesis is that the new validation set was no longer large enough to serve as the representative dataset.

Regards,
Joeri

Hi @Joeri,

Yes, post-training quantization is performed to produce a fully int8-quantized model. The validation set used during training serves as the representative dataset. So your validation set (randomly chosen from the training set at the start of training) needs to be large enough to be representative of your data.
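
For illustration, here is a rough sketch of how a representative dataset is typically wired into the TensorFlow Lite converter for full int8 quantization. This is not the actual code from conversion.py: `iter_batches`, `convert_to_int8`, and the argument names are hypothetical, and the sketch assumes a Keras model with a single float input and a validation set that fits in memory as a list of samples.

```python
from typing import Iterable, Iterator, List, Sequence


def iter_batches(
    validation_samples: Iterable[Sequence[float]],
) -> Iterator[List[Sequence[float]]]:
    """Yield every validation sample as a batch of size 1.

    Hypothetical helper: nothing is subsampled, so the calibration data
    is exactly as large as the validation set that is passed in.
    """
    for sample in validation_samples:
        yield [sample]


def convert_to_int8(keras_model, validation_samples) -> bytes:
    """Convert a Keras model to a full-integer (int8) TFLite model.

    Assumption: the model has a single float32 input.
    """
    # Imported here so the helper above can be used without these packages.
    import numpy as np
    import tensorflow as tf

    def representative_dataset():
        # The converter expects a list with one array per model input.
        for batch in iter_batches(validation_samples):
            yield [np.asarray(batch, dtype=np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Restrict to int8 ops and int8 I/O for full integer quantization.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()
```

The converter only sees the samples that the generator yields, so the value ranges it calibrates depend directly on which samples, and how many, end up in the validation set.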