Using a subset of collected data

athommandram · January 20, 2022, 2:35am

I’m working on an audio classification project and used the web data uploader to upload my entire dataset. This turned out to be a huge amount of data, which caused issues during MFCC feature generation: the error message said I should raise the ‘window increase’ setting to reduce the number of windows. I played around with that setting until the error went away, but I was wondering whether there is a way to use only a subset of the uploaded data. I am not sure yet how much data is actually required, and instead of uploading a bit, testing, then uploading more, and so on, it would be convenient if I could upload everything but choose to use only part of it.
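For a rough sense of why raising the window increase reduces the number of windows, here is a minimal sketch of the usual sliding-window count. It assumes a sample shorter than one window yields no windows and that trailing partial windows are dropped (no zero-padding); the function name `count_windows` and the example durations are made up for illustration and are not taken from Edge Impulse.

```python
def count_windows(sample_ms: int, window_ms: int, increase_ms: int) -> int:
    """Count full sliding windows that fit in one sample.

    Assumption: partial windows at the end are discarded (no zero-padding).
    """
    if window_ms <= 0 or increase_ms <= 0:
        raise ValueError("window size and window increase must be positive")
    if sample_ms < window_ms:
        return 0
    # The first window starts at 0; each further window starts increase_ms later.
    return (sample_ms - window_ms) // increase_ms + 1


# A hypothetical 10 s clip with a 1 s window:
for increase in (250, 500, 1000):
    print(increase, count_windows(10_000, 1_000, increase))
```

Under these assumptions, doubling the window increase roughly halves the number of windows per sample, which is why the feature generation job gets smaller.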

I noticed there is a feature to enable/disable a sample, and I can use the filters and the multi-select feature on the Data acquisition page to select several samples and disable them. But is there a way to randomly select n samples and disable them?

athommandram · January 20, 2022, 2:42am

Well this is embarrassing, but I think I may have answered my own question in a few minutes just by searching the API reference.

I discovered the batch disable endpoint:
https://docs.edgeimpulse.com/reference/batchdisable

It takes a list of IDs and disables them. So if I combine that with the list samples endpoint:
https://docs.edgeimpulse.com/reference/listsamples

I can get my full list of sample IDs, do whatever random selection I want on them, and batch disable the result. I think this is pretty much what I imagined (although now I’m not sure it is really needed, since I guess increasing the ‘window increase’ essentially uses a subset of the data in an evenly distributed manner anyway).
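The random selection step could look something like the sketch below. It only covers choosing which IDs to disable; the IDs themselves would come from the list samples endpoint, and the result would be sent to the batch disable endpoint. The request format for both is left out here and should be taken from the API reference. The function name `pick_ids_to_disable` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def pick_ids_to_disable(
    sample_ids: Iterable[int], keep_n: int, seed: Optional[int] = None
) -> List[int]:
    """Randomly keep `keep_n` samples and return the IDs of all the others.

    Hypothetical helper: `sample_ids` is assumed to be the full list of IDs
    already fetched from the list samples endpoint.
    """
    ids = list(sample_ids)
    if keep_n < 0:
        raise ValueError("keep_n must be non-negative")
    if keep_n >= len(ids):
        return []  # nothing to disable, the whole dataset is kept
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    keep = set(rng.sample(ids, keep_n))
    return [sample_id for sample_id in ids if sample_id not in keep]


# Example with made-up IDs: keep 30 of 100 samples, disable the other 70.
to_disable = pick_ids_to_disable(range(1, 101), keep_n=30, seed=42)
print(len(to_disable))
```

Passing a seed means the same subset can be reproduced later, which helps when comparing training runs on different subset sizes. This picks uniformly across the whole dataset; if the classes are unbalanced, running it once per label would keep the class proportions under control.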