Bulk Upload Data (Images) Python Snippet

Anyone have a small snippet of Python that would allow me to upload images in bulk via the Ingestion Service?

ChatGPT says this should work, but, I am not convinced:

Can anyone give me a correct snippet, please and thanks in advance!

import os
import requests

# Set the directory to parse
directory = 'my/data/directory'

# Iterate through the subdirectories in the given directory
for subdir in os.listdir(directory):
    subdir_path = os.path.join(directory, subdir)
    if os.path.isdir(subdir_path):
        label = subdir

        # Iterate through the files in the subdirectory
        for file in os.listdir(subdir_path):
            file_path = os.path.join(subdir_path, file)
            if os.path.isfile(file_path):
            	with open(file, 'r') as file:
            	 res = requests.post(url='https://ingestion.edgeimpulse.com/api/training/data',
            	 data=file,
            	 headers={
            	 'Content-Type': 'image/jpeg',
            	 'x-file-name': file,
            	 'x-label': label,
            	 'x-api-key': 'ei_xxxxxxxxx'
            	})

            if (res.status_code == 200):
             print('Uploaded file to Edge Impulse', res.status_code, res.content)
            else:
             print('Failed to upload file to Edge Impulse', res.status_code, res.content)

@davidtischler_edgeim

See New ingestion API - Edge Impulse API ← does it in bulk already


No doubt Jan is correct but one must manually create the files array (based on the examples at the link given):

files = [
    ('data', open('one.png', 'rb')),
    ('data', open('two.png', 'rb')),
]

The code in the original post picks up every file under the given directory without any manual array manipulation. One use case would be automatically adding files to a post-production training set, for example when an inference returns a very low prediction confidence.
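
As a rough sketch of how the two approaches could be combined, the files array can be built from a folder instead of being typed out by hand. The helper name `build_files_list` is hypothetical (it is not part of any Edge Impulse library), and guessing the MIME type from the file extension is my own assumption:

```python
import mimetypes
import os
from contextlib import ExitStack


def build_files_list(folder, stack):
    """Return a requests-style `files` list with one ('data', ...) entry per file in `folder`.

    Hypothetical helper: `stack` is an ExitStack that owns the open file handles.
    """
    files = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue  # skip nested directories
        # Guess the MIME type from the extension; fall back to a generic binary type.
        mime = mimetypes.guess_type(name)[0] or 'application/octet-stream'
        # The ExitStack closes every handle when its `with` block ends.
        handle = stack.enter_context(open(path, 'rb'))
        files.append(('data', (name, handle, mime)))
    return files
```

Inside a `with ExitStack() as stack:` block, the returned list could then be passed as `files=build_files_list('my/images/ClassName01', stack)` to `requests.post`, together with the same `x-label` and `x-api-key` headers used elsewhere in this thread. I have not verified how many files the endpoint accepts in a single request.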


The code as-was does not work.

  1. The code uses the legacy data endpoint, so I did not try to verify that endpoint.
  2. The code is trying to put too much into headers{}.
  3. Most of the code is correct.
  4. Using the files endpoint, the code needs to be modified as follows:

with open(file_path, 'rb') as f: ← Change "file" to "file_path" and open in binary mode
res = requests.post(url='https://ingestion.edgeimpulse.com/api/training/files', ← Change the endpoint "data" to "files"
REM OUT → data=file, ← "data" is not used. You must use "files".
headers={
REM OUT → 'Content-Type': 'image/jpeg', ← This gets included in files{}
REM OUT → 'x-file-name': file, ← This gets included in files{}
'x-label': label,
'x-api-key': 'ei_xxxxxxx'},
files={'data': (os.path.basename(file_path), f, 'image/jpeg')}) ← Add this line


For those that want to cut-n-paste:
The folder structure is:
images-|
--ClassName01-|
    File01
    File02
    Filenn
--ClassName02-|
    File01
    File02
    Filenn

import os
import requests

# Set the directory to parse
directory = 'my/images/'

# Iterate through the sub-directories in the given directory
for subdir in os.listdir(directory):
    subdir_path = os.path.join(directory, subdir)
    if os.path.isdir(subdir_path):
        # The sub-directory name is used as the label
        label = subdir

        # Iterate through the files in the subdirectory
        for file_name in os.listdir(subdir_path):
            file_path = os.path.join(subdir_path, file_name)
            if os.path.isfile(file_path):
                # Open in binary mode; the file is sent as multipart form data
                with open(file_path, 'rb') as f:
                    res = requests.post(
                        url='https://ingestion.edgeimpulse.com/api/training/files',
                        headers={
                            'x-label': label,
                            'x-api-key': 'ei_xxxxxxx',
                        },
                        files={'data': (file_name, f, 'image/jpeg')},
                    )

                # Report the result for this file only
                if res.status_code == 200:
                    print('Uploaded file to Edge Impulse', res.status_code, res.content)
                else:
                    print('Failed to upload file to Edge Impulse', res.status_code, res.content)
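
One thing to watch: the loop above uploads every file it finds as 'image/jpeg', including stray files such as hidden system files. A minimal sketch of a filter that could sit in front of the upload is below. The function name `iter_labeled_images` and the extension list are my own assumptions, not anything required by the Ingestion Service:

```python
import os

# Assumed set of extensions to treat as images; adjust to your dataset.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}


def iter_labeled_images(directory):
    """Yield (label, file_path) pairs, where the label is the sub-directory name."""
    for label in sorted(os.listdir(directory)):
        subdir_path = os.path.join(directory, label)
        if not os.path.isdir(subdir_path):
            continue  # ignore loose files at the top level
        for name in sorted(os.listdir(subdir_path)):
            file_path = os.path.join(subdir_path, name)
            ext = os.path.splitext(name)[1].lower()
            # Keep only regular files with a known image extension.
            if os.path.isfile(file_path) and ext in IMAGE_EXTENSIONS:
                yield label, file_path
```

The upload loop could then become `for label, file_path in iter_labeled_images('my/images/'):` with the same `requests.post` call as before.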

I knew ChatGPT could not be trusted, ha! Thanks @MMarcial!