Source code for automatic audio split sample

StyrbjornKall · October 4, 2023, 2:09pm

Question/Issue:
There’s a feature in data acquisition that allows a user to split longer audio samples into chunks of a specific length (e.g. 5 seconds) containing the “meaningful” information, such as a keyword. It is used in the “Responding to your voice” tutorial in the Edge Impulse documentation.

This is a great feature, but it is manual, and with more than 100 audio samples it becomes unusable. Is the source code for this feature available anywhere, so that one can pre-process the data in bulk prior to upload? It would also be interesting to know how it works, since I can see that it sometimes does not match what I want.

Thank you!

louis · October 5, 2023, 8:40am

Hello @StyrbjornKall,

If you want to automate this process, you can also have a look at our transformation blocks; this is a typical use case for them.
I am not sure whether the source code is currently publicly available. I’ll ask around and let you know.

Best,

Louis

aurel · October 5, 2023, 9:20am

Hi @StyrbjornKall,

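You can use the following two API endpoints to find segments and then split the sample; both appear in the script below:

POST /v1/api/{projectId}/raw-data/{sampleId}/find-segments
POST /v1/api/{projectId}/raw-data/{sampleId}/segment

Below is an (old) sample Python script that should do the job: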

import requests

API_KEY = "ei_d2a..."
PROJECT_ID = 12345

SEGMENT_LEN = 1000 # 1 sec samples
SHIFT_SEGMENTS = False

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "x-api-key": API_KEY
}

# List all 'helloworld' samples from testing set
url = "https://studio.edgeimpulse.com/v1/api/" + str(PROJECT_ID) + "/raw-data"
querystring = {"category":"testing", "labels":"[\"helloworld\"]"}

# Keep the sample list in its own variable so the loop body can reuse `response`
samples = requests.request("GET", url, headers=headers, params=querystring).json()["samples"]

print("Number of samples to segment: " + str(len(samples)))

# Loop over samples
for sample in samples:

    # find 1 sec segments
    url = "https://studio.edgeimpulse.com/v1/api/" + str(PROJECT_ID) + "/raw-data/" + str(sample["id"]) + "/find-segments"
    payload = {
        "shiftSegments": SHIFT_SEGMENTS,
        "segmentLengthMs": SEGMENT_LEN
    }

    response = requests.request("POST", url, json=payload, headers=headers).json()

    if response["success"]:
        print("Found segments for sample " + str(sample["id"]))

        # segment samples
        url = "https://studio.edgeimpulse.com/v1/api/" + str(PROJECT_ID) + "/raw-data/" + str(sample["id"]) + "/segment"
        payload = {"segments": response["segments"]}

        response = requests.request("POST", url, json=payload, headers=headers).json()

        if response["success"]:
            print("Sample " + str(sample["id"]) + " segmented")
        else:
            print("ERROR: sample " + str(sample["id"]) + " cannot be segmented")
            print(response)
    else:
        print("ERROR: Cannot find segments for sample " + str(sample["id"]))
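
The two-step flow above (find-segments, then segment) can also be wrapped in a single function, which makes it easier to dry-run before touching a real project. This is only a sketch: `segment_sample` and the `post` callable are hypothetical names, the HTTP call is injected rather than made directly, and skipping samples with no detected segments is an assumption on my side, not documented API behaviour.

```python
from typing import Any, Callable, Dict

BASE_URL = "https://studio.edgeimpulse.com/v1/api"

# Hypothetical transport signature: post(url, payload) -> decoded JSON dict.
# With requests this could be:
#   post = lambda url, payload: requests.post(url, json=payload, headers=headers).json()
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def segment_sample(post: PostFn, project_id: int, sample_id: int,
                   segment_len_ms: int = 1000,
                   shift_segments: bool = False) -> int:
    """Find segments in one sample, then split it. Returns the segment count."""
    base = f"{BASE_URL}/{project_id}/raw-data/{sample_id}"

    # Step 1: ask the API where the segments are
    found = post(base + "/find-segments",
                 {"shiftSegments": shift_segments,
                  "segmentLengthMs": segment_len_ms})
    if not found.get("success"):
        raise RuntimeError(f"find-segments failed for sample {sample_id}: {found}")

    segments = found.get("segments", [])
    if not segments:
        return 0  # assumption: nothing detected, so leave the sample untouched

    # Step 2: split the sample using exactly the segments returned above
    done = post(base + "/segment", {"segments": segments})
    if not done.get("success"):
        raise RuntimeError(f"segment failed for sample {sample_id}: {done}")
    return len(segments)
```

Passing a fake `post` that only records its arguments lets you check the URLs and payloads for a whole batch before running it for real.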

Aurelien

StyrbjornKall · October 5, 2023, 12:13pm

Thanks to you both for the quick replies!

@aurel I will try the API solution!

@louis, out of curiosity, it would be interesting to see the actual code that makes this happen. I have been trying a bunch of librosa methods but don’t quite get the same behavior or ease of use.
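
For anyone who wants to pre-process locally in the meantime, here is a minimal sketch of one common approach: frame-wise RMS energy with a threshold, then a fixed-length window centred on each loud region. To be clear, this is not the Edge Impulse implementation (which has not been shared in this thread); `find_segments`, `frame_ms` and `threshold_ratio` are hypothetical names and defaults, and it uses only the standard library rather than librosa.

```python
import math
from typing import List, Sequence, Tuple


def find_segments(samples: Sequence[float], sample_rate: int,
                  segment_len_ms: int = 1000, frame_ms: int = 20,
                  threshold_ratio: float = 0.1) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of fixed-length windows
    centred on regions whose RMS energy exceeds a threshold."""
    frame = max(1, sample_rate * frame_ms // 1000)
    seg_len = sample_rate * segment_len_ms // 1000
    if seg_len <= 0 or len(samples) < seg_len:
        return []

    # RMS energy of each non-overlapping frame
    rms = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        rms.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))

    peak = max(rms)
    if peak == 0:
        return []  # pure silence
    threshold = peak * threshold_ratio  # relative to the loudest frame

    segments: List[Tuple[int, int]] = []
    i = 0
    while i < len(rms):
        if rms[i] < threshold:
            i += 1
            continue
        # Extend the active region while frames stay above the threshold
        j = i
        while j < len(rms) and rms[j] >= threshold:
            j += 1
        # Centre a fixed-length window on the region, clamped to the signal
        centre = (i + j) * frame // 2
        start = min(max(0, centre - seg_len // 2), len(samples) - seg_len)
        # Drop windows that would overlap the previous one
        if not segments or start >= segments[-1][1]:
            segments.append((start, start + seg_len))
        i = j
    return segments
```

Each returned pair can be used to slice the original array, e.g. `samples[start:end]`, and the slices written out as separate files before upload. A region longer than the window only gets one centred window here, so long utterances would need extra handling.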