Bulk Upload Data (Images) Python Snippet

davidtischler_edgeim · January 19, 2023, 8:35pm

Anyone have a small snippet of Python that would allow me to upload images in bulk via the Ingestion Service?

Can anyone give me a correct snippet, please? Thanks in advance!

ChatGPT says this should work, but I am not convinced:

import os
import requests

# Set the directory to parse
directory = 'my/data/directory'

# Iterate through the subdirectories in the given directory
for subdir in os.listdir(directory):
    subdir_path = os.path.join(directory, subdir)
    if os.path.isdir(subdir_path):
        label = subdir

        # Iterate through the files in the subdirectory
        for file in os.listdir(subdir_path):
            file_path = os.path.join(subdir_path, file)
            if os.path.isfile(file_path):
            	with open(file, 'r') as file:
            	 res = requests.post(url='https://ingestion.edgeimpulse.com/api/training/data',
            	 data=file,
            	 headers={
            	 'Content-Type': 'image/jpeg',
            	 'x-file-name': file,
            	 'x-label': label,
            	 'x-api-key': 'ei_xxxxxxxxx'
            	})

            if (res.status_code == 200):
             print('Uploaded file to Edge Impulse', res.status_code, res.content)
            else:
             print('Failed to upload file to Edge Impulse', res.status_code, res.content)

janjongboom · January 20, 2023, 4:16pm

@davidtischler_edgeim

See "New ingestion API" in the Edge Impulse API docs ← it already does bulk uploads

MMarcial · January 21, 2023, 3:22am

No doubt Jan is correct, but with that approach one must create the files array manually (based on the examples at the link given):

files = [
    ('data', open('one.png', 'rb')),
    ('data', open('two.png', 'rb')),
]

The code in the original post walks the directory and picks up every file it finds, without any manual array manipulation. One use case: after a production release, files are automatically added to the training set, for example whenever an inference returns a very low prediction score.
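A minimal sketch of building that files array automatically instead of by hand. It assumes every file in one folder shares a single label and that the files endpoint accepts several 'data' parts in one request, as in the array above. The names build_files_array and upload_batch are hypothetical, and the API key is a placeholder.

```python
import os
from contextlib import ExitStack


def build_files_array(folder, stack):
    """Return a list of ('data', handle) tuples, one per regular file in `folder`.

    Each file is opened in binary mode and registered on `stack`, so all
    handles are closed when the ExitStack exits.
    """
    files = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            handle = stack.enter_context(open(path, 'rb'))
            files.append(('data', handle))
    return files


def upload_batch(folder, label, api_key):
    """Post every file in `folder` in one request (assumed to be supported)."""
    import requests  # third-party; imported here so build_files_array works without it

    with ExitStack() as stack:
        files = build_files_array(folder, stack)
        return requests.post(
            url='https://ingestion.edgeimpulse.com/api/training/files',
            headers={'x-label': label, 'x-api-key': api_key},
            files=files,
        )
```

Calling upload_batch('images/ClassName01', 'ClassName01', 'ei_xxxxxxx') would then send the whole folder under one label; the response still needs the same status_code check as any single upload.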

MMarcial · March 15, 2023, 9:58pm

The code as posted does not work.

It uses the legacy data endpoint, which I did not try to verify.
It tries to put too much in headers{}.
Most of the rest of the code is correct.
To use the files endpoint, the code needs to be modified as follows:

with open(file_path, 'rb') as f: ← Change “file” to “file_path” and open in binary mode
res = requests.post(url='https://ingestion.edgeimpulse.com/api/training/files', ← Change the endpoint from “data” to “files”
REM OUT → data=file, ← “data” is not used. You must use “files”.
headers={
REM OUT → 'Content-Type': 'image/jpeg', ← This gets included in files{}
REM OUT → 'x-file-name': file, ← This gets included in files{}
'x-label': label,
'x-api-key': 'ei_xxxxxxx'},
files = { 'data': (os.path.basename(file_path), f, 'image/jpeg') }) ← Add this line

For those that want to cut-n-paste:

The folder structure is:
images-|
--ClassName01-|
    File01
    File02
    Filenn
--ClassName02-|
    File01
    File02
    Filenn

import os
import requests

# Set the directory to parse
directory = 'my/images/'

# Iterate through the sub-directories in the given directory
for subdir in os.listdir(directory):
    subdir_path = os.path.join(directory, subdir)
    if os.path.isdir(subdir_path):
        # The sub-directory name is used as the label
        label = subdir

        # Iterate through the files in the sub-directory
        for file in os.listdir(subdir_path):
            file_path = os.path.join(subdir_path, file)
            if os.path.isfile(file_path):
                # Open in binary mode; images are not text
                with open(file_path, 'rb') as f:
                    res = requests.post(
                        url='https://ingestion.edgeimpulse.com/api/training/files',
                        headers={
                            'x-label': label,
                            'x-api-key': 'ei_xxxxxxx',
                        },
                        files={'data': (os.path.basename(file_path), f, 'image/jpeg')},
                    )

                # Report the result for this file
                if res.status_code == 200:
                    print('Uploaded file to Edge Impulse', res.status_code, res.content)
                else:
                    print('Failed to upload file to Edge Impulse', res.status_code, res.content)
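The script above sends every file as image/jpeg. If the folders mix formats (the earlier example used .png files), the content type can be guessed from the file extension instead. A minimal sketch using the standard-library mimetypes module; the helper names image_content_type and files_entry are hypothetical, and skipping non-image files is an assumption, not something taken from the ingestion docs.

```python
import mimetypes
import os


def image_content_type(file_path):
    """Return the MIME type guessed from the extension if it is an image, else None."""
    content_type, _ = mimetypes.guess_type(file_path)
    if content_type and content_type.startswith('image/'):
        return content_type
    return None


def files_entry(file_path, handle):
    """Build the files{} dict for requests.post, or None for non-image files."""
    content_type = image_content_type(file_path)
    if content_type is None:
        return None
    return {'data': (os.path.basename(file_path), handle, content_type)}
```

Inside the loop, files=files_entry(file_path, f) would replace the hard-coded dict, with a continue whenever it returns None.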

davidtischler_edgeim · March 17, 2023, 3:27am

I knew ChatGPT could not be trusted, ha! Thanks @MMarcial!