Audio samples: use the Split feature or not? A student got good results without splitting

I finished my week-long, 3-hours-a-day summer course teaching Edge Impulse to grade 8-10 students over Zoom. One student made an interesting observation. The students knew how to record multiple audio samples in one take and then use the split feature to split them (many students just recorded individual 1 second samples, as that was an easier process for them). This student just left multiple samples in a 4 second burst, without splitting, and his results were MUCH better than the rest of the class.

We used the default continuous live classification with our cell phones. I assumed his window size was 4 seconds, but when I looked at his impulse I noticed the window size was the default 1 second and the window increase was 0.5 seconds. His results should not have been very good: I would expect Edge Impulse to only analyze the first second of each data sample. Yet the results were very good. Any idea why?

Hi @Rocksetta, when the window size is set to 1 second we still look at the full sample, but we slice it up into 1 second windows (with window increase as the step). So out of the one 4 second sample we create:

0…1000 ms
500…1500 ms
1000…2000 ms
… and so on, every 500 ms, to the end of the sample.
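
The slicing described above can be sketched in a few lines of Python. This is only an illustration, not Edge Impulse's actual code: the function name `window_bounds` is made up, and it assumes that only full windows are kept and that nothing is zero-padded.

```python
def window_bounds(sample_ms, window_ms=1000, step_ms=500):
    """Return (start, end) pairs in ms for every full window that fits in the sample.

    Assumption: windows that would run past the end of the sample are dropped.
    """
    if sample_ms < window_ms:
        return []
    last_start = sample_ms - window_ms
    return [(start, start + window_ms)
            for start in range(0, last_start + 1, step_ms)]


windows = window_bounds(4000)
for start, end in windows:
    print(f"{start}…{end} ms")
print(len(windows), "windows")
```

Under these assumptions a 4000 ms sample gives 7 windows. The count shown in the Studio can differ if the tail of the sample is padded or handled differently.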

Now why would this have an effect on accuracy? More data, most likely. With a 1 second window and a 0.5 second step, every 4 s sample now yields 7 windows rather than (at most) 4 separate 1 s samples. So you nearly double your training set, which makes a lot of difference on a small dataset with 6 minutes 50 seconds of data as above. However… you have zero control over how the windows are created, so if the spacing between keywords is larger than one second you create noise windows that are labeled as the keyword, which will have a bad effect on accuracy.
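
One rough way to spot such noise windows before training is to measure how loud each window is. The sketch below is an assumption-laden illustration, not something Edge Impulse provides: `quiet_windows` and `rms` are hypothetical helpers, the 0.01 threshold is arbitrary, and the clip is synthetic (a tone standing in for the keyword).

```python
import math


def rms(values):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(v * v for v in values) / len(values)) if values else 0.0


def quiet_windows(samples, sample_rate, window_s=1.0, step_s=0.5, threshold=0.01):
    """Return start times (seconds) of windows whose RMS is below threshold.

    Assumption: a window this quiet contains no keyword and would be a
    mislabeled noise window if it inherited the keyword label.
    """
    window = int(window_s * sample_rate)
    step = int(step_s * sample_rate)
    quiet = []
    for start in range(0, len(samples) - window + 1, step):
        if rms(samples[start:start + window]) < threshold:
            quiet.append(start / sample_rate)
    return quiet


# Synthetic 4 s clip at 1 kHz: a tone in the first and last second, silence between.
rate = 1000
tone = [0.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(rate)]
clip = tone + [0.0] * (2 * rate) + tone
print(quiet_windows(clip, rate))  # [1.0, 1.5, 2.0]
```

Real recordings have background noise, so the threshold would need tuning per microphone and room.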


Thanks @janjongboom, the bigger, easier-to-enter dataset explains the better results. A few half samples probably also have a positive effect, but you're right, we should watch for empty samples.

Hi @janjongboom, if I collect 1 s samples, what window size would you recommend, and how many samples should be enough for an accurate result with, let's say, 3 labels?

Keep window size at 1 second in that case. Dataset size depends a bit, but for keyword spotting ~8 minutes per class should give you a good idea.
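
To turn that guideline into a number of recordings, here is a quick back-of-the-envelope calculation. It assumes separate, non-overlapping 1 second samples and treats ~8 minutes per class as a rough target; `samples_needed` is just an illustrative helper.

```python
def samples_needed(minutes_per_class, labels, sample_s=1.0):
    """Return (samples per class, total samples) for a target duration per class."""
    per_class = int(minutes_per_class * 60 / sample_s)
    return per_class, per_class * labels


per_class, total = samples_needed(8, 3)
print(per_class, total)  # 480 1440
```

So for 3 labels that is about 480 one-second samples per label, or 24 minutes of audio in total.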