I’m in the process of working through the Responding to your voice tutorial for an ESP32, in combination with the TensorFlow Lite micro-speech example. The version of the example converted to use the ESP-NN library, to be precise; it can be found on GitHub.
I know I can’t get support for the ESP-NN part (I’ve already had great success with image classification), but hopefully I can regarding the model settings.
I can’t seem to figure out the slice size and count. If I’m correct, the stride and duration are both equal to 20 milliseconds, but I’m not too sure about that part either. I noticed that the application expects the slice size multiplied by the slice count to equal 650. Could that be right? Or is that too short for recognizing ‘look pete’?
That’s great! About time too. I’m sure a lot of people (including myself) will be very happy about this.
Unfortunately for my current project, we already have a functioning implementation for concurrently running image classification on two cores without the entire Edge Impulse SDK behind it. We’re also pretty close to finishing a similar implementation for speech recognition, apart from actually training a model ourselves with Edge Impulse.
Now that you provide official support for the ESP32, surely you can help me determine the values for these variables? Any help is greatly appreciated.
Hi, @JVKran !
We might need a bit more context here. If you use Edge Impulse for model training, then you can use the C++ library deployment option to see the model parameters. If you’re not using Edge Impulse for training the model, then I’m not sure which part of your work is related to EI.
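To illustrate the suggestion above: a C++ library export from Edge Impulse includes a generated model_metadata.h where the windowing parameters live as macros. The excerpt below is only a sketch of the kind of definitions found there; the numeric values are placeholders and the exact macro set depends on your project, so verify against your own export:

```cpp
// Illustrative excerpt in the style of an Edge Impulse C++ export's
// model_metadata.h. Numeric values here are placeholders, not real output.
#define EI_CLASSIFIER_FREQUENCY               16000  // sampling rate in Hz
#define EI_CLASSIFIER_RAW_SAMPLE_COUNT        16000  // samples per full window

// For continuous (sliced) inference, the window is split into slices:
#define EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW 4
#define EI_CLASSIFIER_SLICE_SIZE \
  (EI_CLASSIFIER_RAW_SAMPLE_COUNT / EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)
```

Reading these macros from the export is usually the quickest way to answer slice size/count questions for a model trained in Edge Impulse.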