A few days ago, Pete Warden, whose work inspired me to get into tinyML, published a blog post titled “One weird trick to shrink convolutional networks for tinyML.” In it, he explains how a convolutional layer followed by a pooling layer can be replaced with a single convolutional layer with a stride of 2. The advantage is twofold. First, the output size is the same in both cases, but you no longer need to store the full-resolution output of the stride-1 convolution, which saves a lot of memory: with 2×2 pooling, that intermediate buffer is four times the size of the final output. Second, you perform fewer computations, so inference gets faster as well. However, Pete also points out that this method might result in a drop in accuracy, but with the resources you free up, you can regain that accuracy by changing some other hyperparameters of your model.
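To make the memory and compute argument concrete, here is a minimal sketch in plain Python that compares the two options for a single layer. The layer dimensions (a 32×32 input with 8 channels, 16 filters, a 3×3 kernel) are hypothetical values chosen for illustration and do not come from Pete's post. The sketch also assumes "same" padding and 2×2 non-overlapping pooling, and it counts only the multiply-accumulates of the convolution, ignoring the pooling operation and biases.

```python
import math

def conv_out(size, kernel, stride, padding="same"):
    """Output size of a convolution along one spatial dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid" padding

def pool_out(size, pool):
    """Output size of non-overlapping pooling (stride equal to pool size)."""
    return size // pool

# Hypothetical layer dimensions, chosen only for illustration.
H = W = 32            # input height and width
C_IN, C_OUT = 8, 16   # input channels and number of filters
K = 3                 # kernel size (3x3)

# Option A: stride-1 convolution followed by 2x2 pooling.
conv_h, conv_w = conv_out(H, K, 1), conv_out(W, K, 1)
a_h, a_w = pool_out(conv_h, 2), pool_out(conv_w, 2)
a_intermediate = conv_h * conv_w * C_OUT         # buffer that must be stored before pooling
a_macs = conv_h * conv_w * C_OUT * K * K * C_IN  # multiply-accumulates of the convolution

# Option B: a single stride-2 convolution, no pooling layer.
b_h, b_w = conv_out(H, K, 2), conv_out(W, K, 2)
b_output = b_h * b_w * C_OUT
b_macs = b_h * b_w * C_OUT * K * K * C_IN

print(f"conv + pool : output {a_h}x{a_w}x{C_OUT}, "
      f"intermediate buffer {a_intermediate} values, {a_macs} MACs")
print(f"stride-2    : output {b_h}x{b_w}x{C_OUT}, "
      f"largest buffer {b_output} values, {b_macs} MACs")
print(f"buffer ratio: {a_intermediate / b_output:.0f}x, "
      f"MAC ratio: {a_macs / b_macs:.0f}x")
```

Under these assumptions both options produce a 16×16×16 output, while the stride-2 version needs a quarter of the activation buffer and a quarter of the convolution's multiply-accumulates. Real savings depend on the rest of the model and on how the runtime allocates its buffers, so treat these numbers as a per-layer estimate rather than a measurement.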
This is a companion discussion topic for the original entry at https://www.edgeimpulse.com/blog/some-more-weird-tricks-to-shrink-convolutional-networks-for-tinyml