AI Label Feature

Feature Description: I am requesting a set of enhancements for the AI Label feature to improve stability and workflow efficiency when processing large datasets. Specifically, I need improvements in four areas: (1) Immediate UI reflection of metadata changes, (2) Error handling for malformed LLM responses, (3) Configurable concurrency for performance, and (4) Granular batch selection to manage job execution better.

Context/Use case: I am currently using the AI Label feature to process large sets of image data. My workflow involves running batch jobs where an LLM analyzes images and assigns labels/metadata.

Problem to be Solved: I am facing four distinct issues that hinder my progress:

  1. UI Sync Lag: When I run a labeling job on a large dataset, I can see the metadata is successfully set in the background, but the visual label on the image does not update immediately. And if the job fails partway through, the metadata for the already-processed items is updated, but the labels for those items never appear.

  2. Job Fragility (JSON Parsing): Sometimes the LLM returns a bad format (e.g., unparsable JSON). Currently, this causes the specific AI Label job to break and terminate entirely. This is frustrating because one bad response shouldn’t stop the whole process.

  3. Slow Sequential Processing: The jobs appear to process items one at a time, which makes labeling large datasets unnecessarily slow.

  4. Limited Data Selection: Currently, I can only choose broad categories like “All data” or “Not labeled.” If a job with 1,000 items breaks at item 500, I have no way to easily resume from item 501. I am forced to re-evaluate the logic for the whole set or rely on the “Not labeled” filter, which isn’t always accurate for my needs.

Proposed Solution: To solve these problems, I propose the following technical implementations:

  1. Real-time UI Updates: Set the label data and metadata at the same time, so the UI reflects changes immediately.
  2. “Ignore Errors” Option: A checkbox in the job setup (e.g., “Ignore Parsing Errors”). If checked, the system should log the JSON error (or a transient 502/503 error), skip that specific item, and continue to the next one without crashing the job (see the first sketch after this list).
  3. Configurable Concurrency: An option to configure the number of parallel LLM calls (e.g., a “Concurrency Limit” input). Running calls in parallel would significantly improve throughput.
  4. Batch/Range Selection: A better way to choose segments for labeling. Instead of just “All” or “Unlabeled,” please allow me to define a range or batch limit (e.g., “Run on items 1–200” or “Process next 500 items”), as in the second sketch below.
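
To make items 2 and 3 concrete, here is a minimal Python sketch of the behavior I have in mind. It is purely illustrative: `call_llm` is a hypothetical stand-in for whatever client the backend actually uses, and `IGNORE_ERRORS` / `CONCURRENCY_LIMIT` are just the option names from the mock-ups above, not an existing API.

```python
import json
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-label")

IGNORE_ERRORS = True     # the proposed "Ignore Parsing Errors" checkbox
CONCURRENCY_LIMIT = 8    # the proposed "Concurrency Limit" input


def call_llm(item):
    """Hypothetical stand-in for the real LLM request; returns raw response text."""
    return '{"label": "person", "confidence": 0.97}'


def label_item(item):
    raw = call_llm(item)
    return item, json.loads(raw)  # raises json.JSONDecodeError on a bad response


def run_job(items):
    results, skipped = [], []
    with ThreadPoolExecutor(max_workers=CONCURRENCY_LIMIT) as pool:
        futures = {pool.submit(label_item, it): it for it in items}
        for fut in as_completed(futures):
            item = futures[fut]
            try:
                results.append(fut.result())
            except json.JSONDecodeError as err:
                if not IGNORE_ERRORS:
                    raise  # current behavior: one bad response kills the whole job
                # Proposed behavior: log, skip this item, keep the job alive.
                # Transient HTTP 502/503 errors could be handled the same way.
                log.warning("Skipping item %s: unparsable JSON (%s)", item, err)
                skipped.append(item)
    return results, skipped
```

The key point is that the error handling wraps each item rather than the whole batch, so one malformed response costs a single skipped item instead of the entire job.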
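For item 4, the selection could be as simple as slicing the chosen dataset before the job starts. Again, this is only a sketch with a hypothetical helper; the 1-based indexing just mirrors how I wrote the ranges above.

```python
def select_range(items, start=None, count=None):
    """Slice the dataset the way the proposed UI would: 1-based, both args optional."""
    begin = (start - 1) if start else 0
    end = begin + count if count is not None else None
    return items[begin:end]


# Resume a 1,000-item job that broke at item 500:
#   remaining = select_range(items, start=501)
# Smoke-test a prompt on a small batch before a full run:
#   sample = select_range(items, count=10)
```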

Expected Benefits:

  • Workflow Stability: Able to run long jobs overnight without worrying that a single bad LLM response will crash the entire process.
  • Performance: Able to label datasets much faster by utilizing parallel processing.
  • Control: Have the ability to test the prompt on small batches (e.g., first 10 items) before committing to a full run, saving me time and API costs.

Potential Impact: This will drastically improve efficiency in dataset preparation. It transforms the AI Labeling tool from a fragile feature into a robust production tool that I can rely on for high-volume data.

Additional Information:

  • I have attached a screenshot in the description showing the issue where metadata is set, but the label is not yet updated visually.

Hi @cyouptit

Thanks for the detailed feedback; there is a lot to work through here, so let me make sure we capture and relay the feedback and feature requests to the right teams. This could also be something to investigate via a custom AI labelling block, if you want to try that route.

  1. UI Feature Request - Real-time UI Updates: Label data and metadata set at the same time.
  2. UI Feature Request - “Ignore Errors” Option: A checkbox configuration in the job setup (e.g., “Ignore Parsing Errors”). If checked, the system should log the JSON error (or a transient 502/503 error), skip that specific item, and continue to the next one without crashing the job.
  3. Functional Feature Request - Configurable Concurrency: An option to configure the number of parallel LLM calls (e.g., a “Concurrency Limit” input), so parallel calls can significantly improve performance.
  4. Functional Feature Request - Batch/Range Selection: A better way to choose segments for labeling. Instead of just “All” or “Unlabeled,” allow defining a range or batch limit (e.g., “Run on items 1–200” or “Process next 500 items”).

Best

Eoin

Hi @Eoin,

Thanks for the reply!

Just to clarify, the real-time UI update issue is no longer relevant. That one was completely on me. I accidentally set my project type to classification while I was actually labeling with object detection. Because of that, the labels didn’t appear in the filter list, so I thought nothing was being saved. Turns out the labels were there the whole time.

The other feature requests you mentioned are spot-on and exactly what I had in mind during the recent Edge Impulse hackathon. In the end, I built a small tool around the Edge Impulse API to solve these workflow problems for my own case. Hopefully it can help with the development of these new features too.

Here’s the GitHub repo: GitHub - quochung-cyou/label-tool-edgeimpulse: A production-ready, scalable Python pipeline for automatically labeling images in Edge Impulse projects using Google's Gemini API with Kafka and Redis.

Best,
Hung