Memory - Conv1D vs Dense Layer

Hello EI Team,

I have a question about the model memory usage.

I am currently running some experiments (hyper-parameter tuning plus metrics/artifact tracking with wandb).

I have two models:
model 1: two Conv1D layers and one Dense layer
model 2: three Conv1D layers and one Dense layer

During each run I calculate some metrics, MSE, … , including the size of the quantized model (post-training quantization). What I notice so far (the experiments are still running) is that in some runs the model size for model 2 is smaller than for model 1, even though both models have a similar MSE.
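To illustrate the kind of setup I mean, here is a minimal sketch in Keras (all layer sizes and strides below are made up for illustration, not my actual hyper-parameters):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-ins for my two models; all sizes are illustrative.
def model_1():
    return tf.keras.Sequential([
        layers.Input(shape=(128, 8)),             # (timesteps, channels)
        layers.Conv1D(16, 3, activation="relu"),
        layers.Conv1D(32, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(1),                          # regression output
    ])

def model_2():
    return tf.keras.Sequential([
        layers.Input(shape=(128, 8)),
        layers.Conv1D(16, 3, strides=2, activation="relu"),
        layers.Conv1D(32, 3, strides=2, activation="relu"),
        layers.Conv1D(32, 3, strides=2, activation="relu"),  # extra conv shrinks the feature map
        layers.Flatten(),
        layers.Dense(1),
    ])

for name, m in [("model 1", model_1()), ("model 2", model_2())]:
    print(name, m.count_params(), "parameters")
```

In this toy version the extra (strided) Conv1D shrinks the flattened feature map that feeds the Dense layer, so model 2 can actually end up with fewer parameters despite having more layers.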

I would like to get a better understanding of why. Are there any resources that could give me some insight into the memory usage of Conv1D versus Dense layers?

Regards,
Joeri

Hello @Joeri

I am not sure I am the best person to fully understand your question :smiley:

Are you varying the hyperparameters of each layer? If so, the model size will vary.

During training, a deep learning model needs to store information:

  • information necessary to backpropagate the error
  • information necessary to compute the gradient of the model parameters

Conv1D and Dense layers involve different computations, and thus have different memory impacts.
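As a rough back-of-envelope sketch (all numbers purely illustrative, not your models):

```python
# Training a layer needs (at least) its parameters, one gradient buffer per
# parameter, and the input activations saved for backpropagation.
batch, timesteps, ch_in = 32, 128, 8
kernel, ch_out = 3, 16

# Conv1D: the same small kernel is reused at every time step.
conv1d_params = kernel * ch_in * ch_out + ch_out   # weights + biases
conv1d_saved_acts = batch * timesteps * ch_in      # inputs kept for backprop

# Dense on the flattened sequence: one weight per (input, output) pair.
n_in, n_out = timesteps * ch_in, 64
dense_params = n_in * n_out + n_out
dense_saved_acts = batch * n_in

print(f"Conv1D: {conv1d_params} params (+ as many gradients), {conv1d_saved_acts} saved activations")
print(f"Dense : {dense_params} params (+ as many gradients), {dense_saved_acts} saved activations")
```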

@louis

Thanks for the response. Correct, changing the hyperparameters has an impact on the memory.

However, what I am currently looking for is a better (theoretical) understanding of the memory impact of each type of layer (Conv1D, Conv2D, …, Dense) inside a network.

What surprises me is that a (“larger”) 3-layer Conv1D + Dense model needs less memory than a 2-layer Conv1D + Dense model (at the same MSE; it is a regression model). If you tune only the parameters of the Dense layer, it has a large impact on memory, probably because the Dense layer is fully connected and therefore has more weights to store. Of course, a comparison between the two models remains difficult because there are so many parameters you can tune…
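To make that intuition concrete, these are the parameter-count formulas I am working from (a sketch; the example sizes are illustrative):

```python
# Weights + biases per layer type.
def conv1d_params(kernel, ch_in, ch_out):
    # Independent of the sequence length: the kernel is reused at every step.
    return kernel * ch_in * ch_out + ch_out

def dense_params(n_in, n_out):
    # Fully connected: grows with the (flattened) input size.
    return n_in * n_out + n_out

# Example: a Dense layer on a flattened 124x32 feature map dwarfs the conv.
print(conv1d_params(3, 16, 32))    # 1568
print(dense_params(124 * 32, 64))  # 254016
```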

Regards,
Joeri

We generally expect convolutions to have far fewer model parameters than a Dense layer, since their inductive bias is spatial invariance; i.e. they are not fully connected. So in that sense they have a smaller memory footprint. But that’s not the same as the working memory used during the actual forward pass, where, depending on the sizing of the convolution and the input size, a conv network might use much more memory.
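As a toy illustration of that distinction (illustrative sizes only):

```python
# Parameter memory vs working memory for one Conv1D layer, float32.
timesteps, ch_in, ch_out, kernel = 1024, 8, 64, 3

param_bytes = (kernel * ch_in * ch_out + ch_out) * 4   # weights + biases
activation_bytes = timesteps * ch_out * 4              # one output feature map

print(param_bytes, "bytes of parameters")        # 6400
print(activation_bytes, "bytes of activations")  # 262144
```

The parameters are tiny, but the intermediate feature map the layer has to hold during the forward pass can be much larger.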

So, it can really depend… do you have the specific architecture of the two models?

e.g. from expert mode, print(model.summary()), or by reviewing the TFLite model in https://netron.app/

From the architecture, and some specific input sizing, we can describe things in more detail.
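For example, something along these lines (a sketch, assuming you have a Keras model bound to a variable called model and are using TensorFlow's TFLite converter):

```python
import tensorflow as tf

# Inspect the architecture and per-layer parameter counts.
model.summary()

# One way to measure the post-training-quantized size.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables post-training quantization
tflite_bytes = converter.convert()
print(len(tflite_bytes), "bytes after post-training quantization")
```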

Cheers, Mat


@matkelcey thanks for the reply. I track the model, including the model structure. However, I need to double-check before sharing the final model. I will come back to you as soon as I have the two final models. Maybe we can then have a more in-depth discussion? (@janjongboom knows the details of the project I am working on.)

Regards,
Joeri