AI Audio Labeling Bulk 5m Recording

edgeBlahBliss · June 12, 2025, 12:34am

Question/Issue:
Hello,

I’m wondering whether I can use the AI labeling block “Edge Impulse Inc / Audio labeling with AudioSet” to automatically process a 5-minute recording that contains multiple instances of a 2 s keyword, and either apply a label to each instance or split the recording into multiple files, one per 2 s instance.

I’m currently doing this manually and am trying to figure out how to use this tool to speed it up. My labels are custom, and the block seems to report an error if I don’t use one of the predefined labels on its list.
My model is already partially trained on these keywords and recognizes them fairly accurately, so can I leverage it to build out the rest of the training and sample data faster than doing it all manually?

Currently I record 30 s of audio containing as many instances of the keyword as I can fit in that window. I then use the “Split” feature to turn it into multiple files, manually placing bounding boxes over the audio after listening to it to verify where each keyword begins and ends.
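As a rough illustration of how the manual splitting step could be pre-computed, here is a minimal sketch of a simple energy-based detector (a swapped-in technique, not what Edge Impulse’s split feature does internally). It assumes mono audio already loaded as floats in [-1.0, 1.0]; the function name `find_keyword_segments` and the `frame_ms`, `threshold` and `min_gap_ms` defaults are hypothetical values you would tune by ear.

```python
import math


def find_keyword_segments(samples, sample_rate, frame_ms=20, threshold=0.05,
                          min_gap_ms=200, clip_s=2.0):
    """Return (start, end) sample indices of fixed-length clips around loud regions.

    samples: mono audio as floats in [-1.0, 1.0].
    frame_ms, threshold and min_gap_ms are hypothetical defaults to tune by ear.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))

    # 1. Mark each frame as active if its RMS energy reaches the threshold.
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        active.append(rms >= threshold)

    # 2. Group active frames into regions, splitting on silences longer than min_gap.
    min_gap = max(1, int(min_gap_ms / frame_ms))
    regions, run_start, last_active = [], None, None
    for i, is_active in enumerate(active):
        if not is_active:
            continue
        if run_start is None:
            run_start = i
        elif i - last_active - 1 > min_gap:
            regions.append((run_start, last_active))
            run_start = i
        last_active = i
    if run_start is not None:
        regions.append((run_start, last_active))

    # 3. Centre a clip_s-long window on each region, clamped to the recording.
    clip_len = int(clip_s * sample_rate)
    segments = []
    for first, last in regions:
        centre = (first * frame_len + (last + 1) * frame_len) // 2
        start = max(0, min(centre - clip_len // 2, len(samples) - clip_len))
        segments.append((start, min(len(samples), start + clip_len)))
    return segments
```

Each (start, end) pair could then be written out as its own file with Python’s built-in `wave` module, or used as a guide when placing split markers by hand. A fixed threshold works best for recordings with a quiet, steady background; noisy recordings would need a more robust detector.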
Project ID:
663220

Steps Taken:

In Data acquisition go to AI labeling
Select Audio labeling with AudioSet
Connect HuggingFace API Key
Enter custom labels of interest

Expected Outcome:
I was hoping that, once the model is trained well enough, it could recognize the custom keywords and help automate the rest of the process to save me time.
Actual Outcome:
Job fails
Reproducibility:

[*] Always
[ ] Sometimes
[ ] Rarely

Environment:

Platform: Arduino Nano 33 BLE Sense
Edge Impulse Version (Firmware): Website, v1.32.0
Custom Blocks / Impulse Configuration: Time series data, Audio (MFCC), Classification (basically the keyword spotting example)

Eoin · June 16, 2025, 1:06pm

Hi @edgeBlahBliss

Splitting the audio further can be done with the “Splitting data sample” option (see Data acquisition | Edge Impulse Documentation).

Glad you are enjoying this custom block. I have not used it yet, but I’m checking with one of the coauthors (@ivan) now to see if they can help or advise on how to change the labels.

Their labels look like this; is this the expected format? I checked the logic in the repo, and it looks like newlines are the delimiter, so this should be correct:

COM Priority
Disabled
Enabled
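To make the newline-delimited format concrete, here is a small sketch of how such a label field could be parsed. It assumes the block splits on newlines as described above; `parse_labels` is a hypothetical helper for illustration, not the block’s own code.

```python
from typing import List


def parse_labels(raw: str) -> List[str]:
    """Split a newline-separated label field into a clean list (assumed format)."""
    labels = []
    for line in raw.splitlines():
        label = line.strip()  # ignore stray leading/trailing spaces
        if label:             # skip empty lines
            labels.append(label)
    return labels
```

Under this assumption, a comma-separated string such as `"Disabled, Enabled"` would be read as a single label, so one label per line is the safe format.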

Best

Eoin

ivan · June 16, 2025, 3:35pm

Hi @edgeBlahBliss, glad you’re interested in using the block.

By design, it uses the AST model on Hugging Face, so you are constrained to using/choosing the categories that this model already knows. (You can then name them differently, but the class you’re interested in should be similar to one that the model knows.)

Here’s the full list of valid labels: AudioSet dataset (AudioSet)
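Because the block only accepts categories the AST model already knows, it can help to check your label list before starting a job. The sketch below assumes a tiny, hypothetical excerpt of AudioSet class names (the real list has hundreds of classes) and uses case-insensitive matching, which is an assumption about how the block compares names; `check_labels` is a hypothetical helper.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical excerpt of AudioSet class names, for illustration only.
# The real AudioSet list contains hundreds of classes.
AUDIOSET_EXCERPT: Set[str] = {"Speech", "Music", "Dog", "Bark", "Siren", "Whistle"}


def check_labels(requested: Iterable[str], known: Set[str] = AUDIOSET_EXCERPT
                 ) -> Tuple[List[str], List[str]]:
    """Split requested labels into (known AudioSet names, unknown labels).

    Matching is case-insensitive here; whether the block itself is
    case-sensitive is an assumption this sketch does not settle.
    """
    by_lower = {name.lower(): name for name in known}
    valid, unknown = [], []
    for label in requested:
        match = by_lower.get(label.strip().lower())
        if match is not None:
            valid.append(match)  # report the canonical AudioSet spelling
        else:
            unknown.append(label)
    return valid, unknown
```

Custom keyword labels such as `Disabled` or `Enabled` land in the unknown list, which matches the error described in the question.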

If you’re curious, here is more information and the actual implementation of this block:

But as you said, if you already have a model that works well for your data, you can label your data with one of your impulses. More info here:
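The general idea of labeling with an existing model can be sketched as a sliding window over a long recording. This is a minimal illustration under stated assumptions, not how Edge Impulse implements the feature: `classify` is a hypothetical callable wrapping whatever model you have (returning a score per label for one window), and the window, stride, score threshold and the `"noise"`/`"unknown"` background labels are assumed values modeled on the keyword spotting example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: wraps whatever model you already have and returns
# a score per label for one window of audio samples.
Classifier = Callable[[Sequence[float]], Dict[str, float]]


def auto_label(samples: Sequence[float], sample_rate: int, classify: Classifier,
               window_s: float = 1.0, stride_s: float = 0.25,
               min_score: float = 0.8,
               ignore: Tuple[str, ...] = ("noise", "unknown"),
               ) -> List[Tuple[int, int, str]]:
    """Slide a window over a long recording and return (start, end, label) hits."""
    win = int(window_s * sample_rate)
    step = max(1, int(stride_s * sample_rate))
    hits: List[Tuple[int, int, str]] = []
    for start in range(0, len(samples) - win + 1, step):
        scores = classify(samples[start:start + win])
        label, score = max(scores.items(), key=lambda kv: kv[1])
        if score < min_score or label in ignore:
            continue  # low confidence or a background class: leave unlabeled
        if hits and hits[-1][2] == label and start <= hits[-1][1]:
            # Overlaps the previous hit with the same label: extend it.
            hits[-1] = (hits[-1][0], start + win, label)
        else:
            hits.append((start, start + win, label))
    return hits
```

Merging overlapping windows with the same label turns a run of per-window predictions into one region per spoken keyword. The resulting regions would still need a quick listen to catch misclassifications before they go into the training set.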

Hope this is helpful!