So far I'm impressed by the Studio: it's user friendly, gives access to the Python code for optimization, and provides command-line tools for uploading training and test data.
These suggestions reflect a small subset of features I usually have in my ML pipeline…
Data Augmentation During Training
This could be adding random noise, random time shifts, etc. We are dealing with impulse signals which can be as short as 10-30 ms (see the sketch below).
Audio mixing: mixing different background sounds into the training data (that would require a background-noise dataset).
Etc…
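To illustrate the sort of thing I mean, here's a minimal sketch in plain NumPy (the noise level and shift range are arbitrary placeholder values, not recommendations):

```python
import numpy as np

def augment(signal, sr, rng=np.random.default_rng()):
    """Apply random additive noise and a random time shift to a 1-D audio signal."""
    # Additive Gaussian noise, scaled relative to the signal's RMS level
    noise_level = rng.uniform(0.0, 0.1) * np.sqrt(np.mean(signal ** 2))
    noisy = signal + rng.normal(0.0, noise_level, size=signal.shape)

    # Random circular time shift of up to +/- 10 ms -- this matters for
    # impulse signals that are only 10-30 ms long
    max_shift = int(0.010 * sr)
    shift = rng.integers(-max_shift, max_shift + 1)
    return np.roll(noisy, shift)
```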
Data Preprocessing
Filters: low-pass, high-pass, and band-pass filters (band-pass example below)
Etc…
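For example, a band-pass stage with SciPy might look like this (the 300 Hz-4 kHz band is just a placeholder):

```python
from scipy.signal import butter, sosfiltfilt

def bandpass(signal, sr, low_hz=300.0, high_hz=4000.0, order=4):
    """Zero-phase Butterworth band-pass filter (high_hz must stay below sr/2)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, signal)
```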
Audio Data
Sampling frequencies of up to 48 kHz. Frequencies of 8-16 kHz carry quite important feature information for the applications we are currently looking at.
The suggestion below is probably asking too much.
I guess you are familiar with Audacity and wavesurfer-based tools. They are very useful for splitting up audio data for labelling. You have a great tool for data capture, display, and playback; combine that with a limited editing capability, such as audio-file splitting and cutting, and I think some companies might abandon developing their own in-house tools if they saw that capability.
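Even basic fixed-length splitting would go a long way; a rough sketch with the soundfile package (the one-second window is arbitrary):

```python
from pathlib import Path
import soundfile as sf

def split_wav(path, window_s=1.0):
    """Split a WAV file into consecutive fixed-length segments."""
    data, sr = sf.read(path)
    step = int(window_s * sr)
    stem = Path(path).stem
    for i, start in enumerate(range(0, len(data) - step + 1, step)):
        sf.write(f"{stem}.{i:03d}.wav", data[start:start + step], sr)
```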
Hi @PaulCreaser, thanks a lot for that, super helpful! We'll be adding data augmentation in the very near future, including a noise dataset. Additionally, @dansitu was looking at using the data augmentation settings as hyperparameters when training networks, so we can vary e.g. random time shifts automatically and create the best-performing audio model.
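As a rough illustration of that idea, the augmentation strength just becomes another value in the search space. This is a sketch only: train_and_evaluate is a hypothetical stand-in for the real training loop, and the shift values are arbitrary.

```python
# Sketch: sweep the maximum random time shift (in ms) as a hyperparameter.
# train_and_evaluate is a hypothetical stand-in for the real train/validate cycle.
best = None
for max_shift_ms in [0, 5, 10, 20, 50]:
    accuracy = train_and_evaluate(max_shift_ms=max_shift_ms)
    if best is None or accuracy > best[1]:
        best = (max_shift_ms, accuracy)
print(f"Best time-shift setting: {best[0]} ms (accuracy {best[1]:.3f})")
```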
Sampling frequencies of up to 48 kHz. Frequencies of 8-16 kHz carry quite important feature information for the applications we are currently looking at.
Yep, there's actually nothing holding you back from uploading 48 kHz audio, but we downsample to 16 kHz in the uploader (if you patch out https://github.com/edgeimpulse/edge-impulse-cli/blob/master/cli/uploader.ts#L362 then this no longer happens). This is there because WAV files can encapsulate multiple encodings and we had to convert to PCM. We should not do this for higher sample rates in the uploader. I'll file a bug for this.
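If you'd rather do the conversion yourself before uploading, the downsampling step is roughly equivalent to this SciPy sketch (the filenames are placeholders; assumes a 48 kHz input):

```python
import soundfile as sf
from scipy.signal import resample_poly

data, sr = sf.read("sample_48k.wav")      # e.g. sr == 48000
down = resample_poly(data, up=1, down=3)  # polyphase resampling: 48 kHz -> 16 kHz
sf.write("sample_16k.wav", down, 16000)
```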
Data Preprocessing
Filters: low-pass, high-pass, and band-pass filters
Etc…
In a previous iteration we added the low-frequency and high-frequency parameters for the MFCC block (defaulting to 300 Hz…f/2).
Does this not suffice? It’s not exactly a bandpass filter as these are parameters to the MFE algorithm, but I believe it does something similar.
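For comparison, in librosa terms these map to the fmin/fmax of the mel filterbank rather than a filter applied to the waveform (sketch only; the filename is a placeholder):

```python
import librosa

y, sr = librosa.load("sample.wav", sr=16000)
# Restrict the mel filterbank to 300 Hz .. sr/2, analogous to the
# low/high frequency parameters on the block
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, fmin=300, fmax=sr / 2)
```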
I guess you are familiar with Audacity and wavesurfer-based tools. They are very useful for splitting up audio data for labelling. You have a great tool for data capture, display, and playback; combine that with a limited editing capability, such as audio-file splitting and cutting, and I think some companies might abandon developing their own in-house tools if they saw that capability.
We’ve been thinking about this as well, mostly to cut parts of the audio off (it’s sometimes hard to get the timing exactly right), and we have a feature request in our bug tracker dating back to February; but hard to pinpoint when we’d have the engineering bandwidth for this. Great suggestion though, I’ve wanted this for other signals too.
In terms of data augmentation, we’ll soon be adding some mechanisms specifically for audio, including an implementation of SpecAugment and mixing in a noise dataset. I’d love to hear if you have any other favorite techniques!
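For reference, the masking half of SpecAugment is just zeroing out random bands of the spectrogram; a minimal sketch (one mask per axis, arbitrary mask sizes):

```python
import numpy as np

def spec_augment(spec, max_t=10, max_f=8, rng=np.random.default_rng()):
    """Apply one random frequency mask and one random time mask to a
    (freq_bins, time_frames) spectrogram, as in SpecAugment."""
    out = spec.copy()
    f0 = rng.integers(0, spec.shape[0] - max_f)
    t0 = rng.integers(0, spec.shape[1] - max_t)
    out[f0:f0 + rng.integers(1, max_f + 1), :] = 0.0   # frequency mask
    out[:, t0:t0 + rng.integers(1, max_t + 1)] = 0.0   # time mask
    return out
```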
Thanks for the reference to the paper. Time masking and frequency masking are things we already do for audio; the time warping is something new to me.
As for the filter, I was actually already using it… but forgot.
As for audio with a 48 kHz sampling frequency: I tried uploading it yesterday and it worked fine.
With regards to preprocessing, noise-cancelling and noise-reduction techniques such as
PCAN (Per-Channel Amplitude Normalization)
Spectral gating
Spectral subtraction
Etc…
might be beneficial (spectral subtraction is sketched below). The TensorFlow Lite Micro speech sample code is a good example. Edge Impulse is heavily involved in TinyML, so you will know that better than me.
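As a minimal sketch of the simplest of those, spectral subtraction over an STFT, with the noise estimate taken from the first few frames (all parameters here are placeholders):

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(signal, sr, noise_frames=5):
    """Subtract a noise-magnitude estimate (taken from the first few
    frames, assumed to be noise only) from each STFT frame."""
    f, t, spec = stft(signal, fs=sr, nperseg=256)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, 0.0)       # floor at zero
    _, clean = istft(clean_mag * np.exp(1j * phase), fs=sr, nperseg=256)
    return clean
```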
I see this can be achieved using custom processing blocks. I haven't tried this yet, but it looks worth exploring. I suppose there is potential, in the future, for a repository of custom blocks; those that reach a certain level of maturity could then become selectable from the web interface.
Yeah, our idea right now is that you can develop these and, if they reach maturity, open a PR against the https://github.com/edgeimpulse/processing-blocks repository. If they land there, they'll be available for all users!
I just added more audio data to my project, but apparently Edge Impulse does not split the new samples between training and testing. Is there a way to force Edge Impulse Studio to do this? I uploaded some WAV files of audio that I recorded and then split those into individual samples. My split went to 86%/14% because I uploaded the files to training only. Is there a way to do a split on just the newly uploaded data?