SMOTE in Edge Impulse?

Ruslan · March 3, 2022, 3:51pm

Hi, I am working with a very small audio dataset (6 minutes), trying to recognise a specific type of bird. I also have two other classes: _unknown for all kinds of birds and _noise for background noise. For both of these classes I have far more data than for the specific bird class.

To avoid an imbalanced dataset, I was wondering whether there is a way to do oversampling with SMOTE (Synthetic Minority Oversampling Technique - https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/) and expand my specific bird dataset from 6 minutes to 20 minutes. I would imagine it would belong within feature extraction, but I cannot find it. Unfortunately, there is no way for me to record more samples of the bird.

Cheers

shawn_edgeimpulse · March 7, 2022, 5:28pm

Hi @Ruslan,

You can select “Data Augmentation” in the training block for audio projects. However, I do not believe it uses SMOTE. You can create a custom augmentation script that uses SMOTE and automatically splits your datasets before uploading them to your Edge Impulse project. It’s similar to the augmentation script I put together here (https://github.com/ShawnHymel/ei-keyword-spotting) that mixes in background noise to the various audio samples.
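
For reference, here is a rough sketch of the core SMOTE idea in plain Python. It is not Edge Impulse code and not taken from the script linked above. It assumes each audio window has already been turned into a fixed-length feature vector (for example by your own feature extraction step), because SMOTE interpolates between feature vectors and interpolating raw waveforms is unlikely to give meaningful audio. The function name `smote_oversample`, its parameters, and the toy data are all made up for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic feature vectors by interpolating between
    minority-class samples and their k nearest minority neighbours.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to it.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Take a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumption): four 2-D vectors standing in for real audio features.
bird_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_features = smote_oversample(bird_features, n_new=10, k=2)
print(len(new_features))
```

If you go this route, it is probably safest to generate synthetic samples only for the training split, so the test set still contains real recordings only.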