Auto Labeler Failing

cxtr · January 14, 2024, 3:40pm

Auto Labeler keeps failing.

Project ID: 337134

Every time I run the auto labeler, I get this error:

Creating job... OK (ID: 15240975)

Scheduling job in cluster...
Container image pulled!
Job started
[1/4] Downloading files...
Syncing 16146 files...
[    1/16146] Syncing files...
[16146/16146] Syncing files... (0 MB downloaded)
Done. Success: 16146 Failed: 0
[1/4] Downloading files OK

[2/4] Creating segments...
Creating segments from 16146 images (cuda)...
[  489/16146] Creating segments...
Traceback (most recent call last):
  File "segment-dataset.py", line 115, in <module>
    masks = mask_generator.generate(source_image)
  File "/app/.venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/app/.venv/lib/python3.8/site-packages/segment_anything/automatic_mask_generator.py", line 163, in generate
    mask_data = self._generate_masks(image)
  File "/app/.venv/lib/python3.8/site-packages/segment_anything/automatic_mask_generator.py", line 206, in _generate_masks
    crop_data = self._process_crop(image, crop_box, layer_idx, orig_size)
  File "/app/.venv/lib/python3.8/site-packages/segment_anything/automatic_mask_generator.py", line 245, in _process_crop
    batch_data = self._process_batch(points, cropped_im_size, crop_box, orig_size)
  File "/app/.venv/lib/python3.8/site-packages/segment_anything/automatic_mask_generator.py", line 297, in _process_batch
    data.filter(keep_mask)
  File "/app/.venv/lib/python3.8/site-packages/segment_anything/utils/amg.py", line 49, in filter
    self._stats[k] = v[torch.as_tensor(keep, device=v.device)]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.68 GiB (GPU 0; 14.76 GiB total capacity; 9.01 GiB already allocated; 4.82 GiB free; 9.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Application exited with code 1
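
For anyone reading the numbers in that last error line, here is a small sketch that pulls the GiB figures out of the message and compares them. The helper name `parse_oom` and the regular expressions are illustrative only; they are not part of PyTorch or the auto labeler.

```python
import re

# Error text copied from the job log above (trimmed to the numeric part).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 5.68 GiB "
    "(GPU 0; 14.76 GiB total capacity; 9.01 GiB already allocated; "
    "4.82 GiB free; 9.05 GiB reserved in total by PyTorch)"
)


def parse_oom(message):
    """Hypothetical helper: return the GiB figures in the message as floats."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {
        name: float(re.search(pattern, message).group(1))
        for name, pattern in patterns.items()
    }


stats = parse_oom(OOM_MESSAGE)

# How far the single failed request exceeds the memory that was free.
shortfall = stats["requested"] - stats["free"]

# Memory PyTorch holds in its cache but is not actively using.
cache_slack = stats["reserved"] - stats["allocated"]

print(f"shortfall: {shortfall:.2f} GiB")
print(f"reserved but unallocated: {cache_slack:.2f} GiB")
```

The single request (5.68 GiB) is larger than the free memory (4.82 GiB), while reserved and allocated memory are nearly equal. That suggests the `max_split_size_mb` fragmentation hint in the message may not help much here, and that one image around index 489 may simply need more memory than the GPU has left. This is an inference from the log, not a confirmed diagnosis.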

Does anyone know if it’s possible to fix this?

Eoin · January 16, 2024, 12:19pm

Hi @cxtr

I'm reporting this to our tech team.

You are using GPU builds as part of your enterprise trial. Are you working with anyone on our solutions team? Let me know who you are in contact with so we can arrange a follow-up.

Best

Eoin