I am trying to run a 29-label speech model (28 words and 1 noise profile) on an ESP32 dev board. I deployed the Arduino library for the ESP-EYE and uploaded the code, but the ESP just keeps resetting over and over with this serial output:
rst: 0xc (SW_CPU_RESET), boot: 0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP: 0xee
clk_drv: 0x00, q_drv: 0x00, d_drv: 0x00, cs0_drv: 0x00, hd_drv: 0x00, wp_drv: 0x00
mode: DIO, clock div:1
The initial model I tested, with just 4 labels (and no noise label), worked perfectly, so I know it's not a hardware issue. The only thing connected to the ESP is an INMP441 MEMS I2S microphone, and it is working fine. The ESP is powered by a 5V 2A power supply, so power is not an issue either. Can anyone please help me make this work? Thanks in advance for your precious time.
I flashed the ESP with fresh firmware using esptool, and the code now runs without the SW_CPU_RESET error. However, it is still not recognizing the words reliably. The serial output is displayed properly, but the probabilities for the trained words look random. It catches some words with a probability above 0.5, and even then only after the word has been said 5-6 times. Please help me fix this.
There is a lot of data here to troubleshoot. From what I can see, the Data Explorer is showing that it cannot figure out how to separate the data. For example, the fan data is spread throughout the dataset.
When looking at individual samples, some of them contain the keyword plus a noise at the start or end. I would crop these samples so that only the keyword remains.
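If you want to pre-crop recordings offline before uploading them, here is a minimal Python sketch of the idea. It is only an illustration, not how the Studio crops samples: the function names, the 20 ms frame size, the 0.2 threshold ratio, and the 40 ms padding are all my own assumptions. It keeps the longest continuous loud stretch, on the assumption that the keyword lasts longer than any stray click or bump in the same recording.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS energy of each consecutive, non-overlapping frame."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms

def crop_to_keyword(samples, sample_rate=16000, frame_ms=20,
                    threshold_ratio=0.2, pad_ms=40):
    """Keep the longest run of loud frames, plus a little padding.

    Assumption: the keyword is the longest loud burst in the recording.
    All default values are illustrative and need tuning on real data.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    rms = frame_rms(samples, frame_len)
    if not rms or max(rms) == 0:
        return samples  # silent or too short: nothing to crop

    threshold = threshold_ratio * max(rms)
    best_start, best_len = 0, 0
    run_start = None
    # The trailing -1.0 is a sentinel that closes a run reaching the end.
    for i, value in enumerate(rms + [-1.0]):
        if value >= threshold and run_start is None:
            run_start = i
        elif value < threshold and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None

    pad = sample_rate * pad_ms // 1000
    begin = max(0, best_start * frame_len - pad)
    end = min(len(samples), (best_start + best_len) * frame_len + pad)
    return samples[begin:end]
```

Pass it the raw PCM values of one recording as a list of numbers. Listen to the result before uploading, because a loud, long background noise would fool this heuristic.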
I suggest training on 3 labels first. Then use the Test page and delete the outliers. After that, add one more label at a time: train, test, and delete outliers. Repeat until all labels are in the model. As you go through this, put on your data scientist cap and work out what is creating these outliers. This will help with future data gathering missions.
Regarding keywords:
Try to use keywords with at least 3 syllables.
Use keyphrases, or multiple keywords that share one label.
Example: the user says "open door", not just "open".
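To make the "multiple keywords, one label" idea concrete, here is a small Python sketch of how you might keep track of it while organizing recordings. The phrases, labels, and function names are made up for illustration and are not taken from your project.

```python
from collections import Counter

# Hypothetical mapping: several spoken phrases share one training label.
PHRASE_TO_LABEL = {
    "open door": "open",
    "open the door": "open",
    "close door": "close",
    "close the door": "close",
}

def label_for(phrase):
    """Return the training label for a recorded phrase.

    Matching ignores case and extra whitespace.
    """
    key = " ".join(phrase.lower().split())
    if key not in PHRASE_TO_LABEL:
        raise KeyError(f"no label defined for phrase: {phrase!r}")
    return PHRASE_TO_LABEL[key]

def samples_per_label(recorded_phrases):
    """Count how many recordings end up under each label."""
    return Counter(label_for(p) for p in recorded_phrases)
```

Counting samples per label this way also makes it easy to spot a label that has far fewer recordings than the others.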
I am currently looking at optimizing the MFCC to see whether the robust dataset you collected can be used to train a repeatable and accurate model, and I will report back.