Using a MobileNet v2 in an existing ESP32-S3 project

Eudald · August 21, 2023, 7:42am

Question/Issue: I have an existing project that uses CNNs on an ESP32-S3 for keyword spotting. Up to this point, I’ve been using the TensorFlow libraries to compute the mel spectrogram. Since all the TensorFlow libraries are already integrated into the project, I just swapped the neural networks, hoping it would work.

While the MobileNets work, their accuracy is much lower than the training value. Some of the keywords get mixed up, and the results are much worse than I expected. From testing, I found that the mel spectrogram I get from TensorFlow on the ESP32 is not exactly like the one I get from the Edge Impulse libraries.

Seeing this, I tried to train the MobileNet locally with the spectrograms coming from TensorFlow Lite, but accuracy went from 99% to 74%. Is this because the MobileNet is already pre-trained with features coming from TensorFlow? Why is there such a difference between TensorFlow and Edge Impulse? Are there standalone DSP libraries that I could integrate into my project?

Project ID: 265677

Context/Use case: Keyword spotting algorithm on the edge.
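One way to put a number on "not exactly like" is to dump the same audio window through both pipelines and compare the two spectrograms element by element. The sketch below is only an illustration: `SpectrogramDiff` and `compare_spectrograms` are hypothetical names, not part of TensorFlow or the Edge Impulse SDK, and it assumes both spectrograms have been flattened to the same length and the same frame/bin ordering.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of how two equally sized spectrograms differ.
struct SpectrogramDiff {
    double mean_abs_diff;  // average of |a[i] - b[i]|
    double correlation;    // Pearson correlation; 1.0 means identical shape
};

// Compares two flattened spectrograms (frames * mel bins, same layout).
// Returns false if the inputs are empty, differ in size, or one is constant.
bool compare_spectrograms(const std::vector<float>& a,
                          const std::vector<float>& b,
                          SpectrogramDiff& out) {
    if (a.empty() || a.size() != b.size()) {
        return false;
    }
    const double n = static_cast<double>(a.size());

    // First pass: means and mean absolute difference.
    double sum_a = 0.0, sum_b = 0.0, sum_abs = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum_a += a[i];
        sum_b += b[i];
        sum_abs += std::fabs(static_cast<double>(a[i]) - b[i]);
    }
    const double mean_a = sum_a / n;
    const double mean_b = sum_b / n;

    // Second pass: covariance and variances for the correlation.
    double cov = 0.0, var_a = 0.0, var_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    if (var_a == 0.0 || var_b == 0.0) {
        return false;  // correlation is undefined for a constant input
    }

    out.mean_abs_diff = sum_abs / n;
    out.correlation = cov / std::sqrt(var_a * var_b);
    return true;
}
```

As a rough reading of the result: a correlation close to 1 with a large mean absolute difference would suggest the two pipelines mainly differ in scaling, offset, or normalization, while a low correlation would point to a different framing, filterbank, or memory layout.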

matkelcey · August 21, 2023, 9:48pm

Yes, you are correct: we do use a pretrained MobileNet trunk for keyword spotting, and it was pretrained with the default spectrogram on a large dataset. So changing either the architecture or the type of spectrogram will result in a pretty big drop in performance.

Cheers, Mat

Eudald · August 22, 2023, 7:02am

Hello Mat,

Thank you for your reply. Since I suspected this was the issue, I’ve begun to look into porting some libraries from Edge Impulse to my project. To do so, I’ve cloned the ESP32 standalone example (edgeimpulse/example-standalone-inferencing-espressif-esp32 on GitHub, which builds and runs an exported impulse locally with ESP-IDF) and begun testing it.

My problem is that my model does not seem to recognize any samples, even ones used during training. When I copy the raw data from my project, no matter which sample I choose, the result is always 99.6% unknown. Since I trained using 16-bit integers, I changed the type of the feature array, but I got the same results before the change.

Thank you for your help.
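One thing that may be worth ruling out after changing the type of the feature array: if the callback that feeds the classifier still copies bytes as if the array held floats, the 16-bit samples are reinterpreted rather than converted. The sketch below is an assumption-laden illustration, not the example's actual code. It assumes the data callback has the shape `int(size_t offset, size_t length, float *out_ptr)`, and `raw_samples` and `copy_int16_as_float` are hypothetical names. It converts each sample explicitly instead of copying raw memory.

```cpp
#include <cstddef>
#include <cstdint>

// Copies `length` samples starting at `offset` from an int16 buffer into a
// float buffer, converting each value. Returns 0 on success, -1 if the
// requested range falls outside the source buffer.
int copy_int16_as_float(const int16_t* src, std::size_t src_len,
                        std::size_t offset, std::size_t length,
                        float* out_ptr) {
    if (offset > src_len || length > src_len - offset) {
        return -1;
    }
    for (std::size_t i = 0; i < length; ++i) {
        // Value conversion: -32768..32767 stays in the same numeric range.
        out_ptr[i] = static_cast<float>(src[offset + i]);
    }
    return 0;
}

// Hypothetical placeholder for one window of raw 16-bit audio samples.
static const int16_t raw_samples[] = {-32768, -1, 0, 1, 32767};
static const std::size_t raw_sample_count =
    sizeof(raw_samples) / sizeof(raw_samples[0]);

// Callback in the shape assumed above; it would be handed to the classifier
// in place of a memcpy-based version.
int raw_feature_get_data(std::size_t offset, std::size_t length,
                         float* out_ptr) {
    return copy_int16_as_float(raw_samples, raw_sample_count,
                               offset, length, out_ptr);
}
```

This would not explain why the results were the same before the type change, so it is also worth checking that the number of pasted raw values matches the input frame size the exported model expects, and that the values are in the same numeric range as the raw features shown in the Studio.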