Hello there,
I have been working on deploying a TinyML model for a project that uses a microcontroller with limited resources. The model performs quite well in the Edge Impulse platform, but when I deploy it on the device I run into performance issues with both latency and memory consumption.
The model is designed for real-time sensor data processing (e.g., accelerometer data, audio classification).
I have used the quantization feature to reduce the model size, but I am still seeing high latency during inference.
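For context, here is roughly what I understand int8 post-training quantization to be doing under the hood. This is a simplified pure-Python sketch of affine (scale + zero-point) quantization, not the actual Edge Impulse/TFLite implementation, but it is how I have been reasoning about the size reduction and the accuracy loss:

```python
# Simplified sketch of affine (asymmetric) int8 quantization.
# Not Edge Impulse's actual code; for intuition only.

def quantize_int8(weights):
    """Map float weights to int8 with a per-tensor scale and zero point."""
    lo, hi = min(weights), max(weights)
    lo, hi = min(lo, 0.0), max(hi, 0.0)       # range must include zero
    scale = (hi - lo) / 255.0 or 1.0          # 255 steps across -128..127
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats; the gap to the originals is quantization error."""
    return [(qi - zero_point) * scale for qi in q]

w = [-0.9, -0.1, 0.0, 0.4, 1.2]
q, s, z = quantize_int8(w)
w_hat = dequantize(q, s, z)
```

Each weight shrinks from 4 bytes to 1, and exact zero stays exactly representable, which is why quantization composes well with pruning.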
The device often hits its memory limits, especially when processing longer streams of data.
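On the memory side, one thing that helped me reason about the problem is that the inference window can stay fixed-size even when the stream is unbounded. A minimal sketch of the sliding-window scheme I have in mind (the window and stride values are assumptions, not Edge Impulse defaults):

```python
from collections import deque

WINDOW = 125   # e.g. 1 s of accelerometer data at an assumed 125 Hz
STRIDE = 62    # hop between inferences; overlapping windows smooth predictions

def stream_windows(samples, window=WINDOW, stride=STRIDE):
    """Yield fixed-size windows from an unbounded stream in O(window) memory."""
    buf = deque(maxlen=window)          # ring buffer: old samples drop automatically
    since_last = 0
    for s in samples:
        buf.append(s)
        since_last += 1
        if len(buf) == window and since_last >= stride:
            yield list(buf)             # hand a fixed-size copy to the classifier
            since_last = 0

# Usage: every window has the same length regardless of stream length.
windows = list(stream_windows(range(1000)))
```

The point is that memory usage depends only on the window size, never on how long the stream runs, so longer recordings should not change the peak footprint.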
What additional optimizations can I apply in Edge Impulse to further reduce the model's size and inference time, especially for microcontrollers with constrained resources?
Has anyone here experimented with techniques like model pruning or knowledge distillation in conjunction with Edge Impulse? If so, did you notice significant improvements, and are there any pitfalls I should be aware of?
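For reference, the magnitude pruning I have been reading about amounts to zeroing the smallest-magnitude weights so the tensor compresses well. A toy pure-Python sketch of my understanding (not what TFLite or Edge Impulse actually runs):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    The zeros compress well (and some runtimes can skip them), which is
    where the size win comes from; accuracy usually needs fine-tuning
    after pruning to recover.
    """
    k = int(len(weights) * sparsity)    # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.5, -0.05, 1.2, 0.01, -0.7, 0.02]
pruned = magnitude_prune(w, sparsity=0.5)   # drop the 3 smallest magnitudes
```

My main uncertainty is whether the pruned zeros actually translate into lower latency on a microcontroller, or only into a smaller compressed model, which is part of why I am asking.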
I have also gone through this post, which helped me a lot: https://www.edgeimpulse.com/blog/how-to-optimize-ml-model-accuracy-in-resource-constrained-embedded-applications-blue-prism/
Are there specific libraries or deployment strategies for edge devices that you would recommend for handling memory issues more effectively?
Thanks in advance for your help.