Model memory - profiling API - experiment tracking (wandb)

Joeri · October 23, 2022, 5:42pm

Hello,

Currently, I am working on a regression problem. I perform hyperparameter tuning to find the ‘best’ model, taking the given requirements into account. I use wandb for experiment tracking.

To estimate the memory footprint of a model, I perform the following steps:

create a TFLite model: float and post-training dynamic range quantization (8-bit fixed-point, weights only)
save the model: float and 8-bit fixed-point
calculate the saved model size

The code:

import os

import tensorflow as tf

# `model` is the trained Keras model from the current experiment.
# Create TFLite models: float and post-training dynamic range quantization (8-bit weights)
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_tflite_model = float_converter.convert()

quant_converter = tf.lite.TFLiteConverter.from_keras_model(model)
quant_converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = quant_converter.convert()

# Save the TFLite models (create the output directory if it does not exist)
os.makedirs("artifacts", exist_ok=True)
float_path = os.path.join("artifacts", "float_file.tflite")
quant_path = os.path.join("artifacts", "quant_file.tflite")

with open(float_path, "wb") as f:
    f.write(float_tflite_model)
with open(quant_path, "wb") as f:
    f.write(tflite_quant_model)

# Model size on disk in MiB, rounded to one decimal
float_model_size = round(os.path.getsize(float_path) / (1024 * 1024), 1)
quant_model_size = round(os.path.getsize(quant_path) / (1024 * 1024), 1)
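
Since `convert()` returns the serialized model as a `bytes` object, the same sizes can probably also be read with `len(...)` without going through the file system. Below is a minimal sketch of that idea. `size_report` and its dictionary keys are hypothetical names of my own, not part of TensorFlow or wandb, and the byte strings at the bottom are stand-ins for the real models. Sizes are reported in KiB so that small models do not round to 0.0.

```python
def size_report(float_model: bytes, quant_model: bytes) -> dict:
    """Summarize serialized model sizes in KiB (hypothetical helper)."""
    float_kib = len(float_model) / 1024
    quant_kib = len(quant_model) / 1024
    return {
        "float_model_size_kib": round(float_kib, 1),
        "quant_model_size_kib": round(quant_kib, 1),
        # Ratio of float size to quantized size; None if the quantized model is empty.
        "compression_ratio": round(float_kib / quant_kib, 2) if quant_kib else None,
    }


# Stand-in byte strings; in practice, pass float_tflite_model and tflite_quant_model.
report = size_report(b"\x00" * 4096, b"\x00" * 1024)
print(report)
```

The resulting dictionary could then be passed to `wandb.log(report)` inside an active run, so that each experiment records both sizes next to its metrics.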

Question: “Is this approach correct to obtain a good first approximation?”
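
One sanity check I can think of is to compare the file sizes against the raw weight storage implied by the parameter count: 4 bytes per weight for float32 and 1 byte per weight for 8-bit. The sketch below shows only that arithmetic. `expected_weight_bytes` is a hypothetical helper and the parameter count is a made-up value. The result is a rough figure that ignores the file format overhead and any tensors that remain in float after quantization.

```python
def expected_weight_bytes(num_params: int, bytes_per_weight: int) -> int:
    """Raw storage needed for the weights alone (hypothetical helper)."""
    return num_params * bytes_per_weight


# Made-up value; for a Keras model this would come from model.count_params().
num_params = 50_000

float_bytes = expected_weight_bytes(num_params, 4)  # float32: 4 bytes per weight
int8_bytes = expected_weight_bytes(num_params, 1)   # 8-bit: 1 byte per weight

print(f"float32 weights: {float_bytes / 1024:.1f} KiB")
print(f"int8 weights:    {int8_bytes / 1024:.1f} KiB")
```

If the measured file sizes are far from these figures, that would suggest the size is dominated by something other than the weights.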

The idea is to have an estimate. Of course, this will not give the memory consumption on a specific device. Therefore, I would like to expand the experiment tracking code (running on my local computer) using the profiling API. If I understand the profiling API documentation correctly, you need to provide the reference model that is closest to the model architecture you have built.

How can I learn about the reference model architecture?

For mobilenet-ssd, you can find the architecture in the literature. But it is not clear how I can find information about the architecture of, for example, gestures-large and the other models. Can we download the reference model and obtain this information using Netron? What if the experiments yield a model architecture that differs a lot from the reference model?

Do we obtain a good estimate for models that solve a regression problem?

Thanks for the feedback.

Regards,
Joeri