Application exited with code 137 (OOMKilled) error

Hi,

I am trying to train my model after uploading about 1500 images across 10 classes.
Project ID: 85642

I’ve tried running the transfer learning phase several times, and I am getting the following error each time:

/app/run-python-with-venv.sh: line 21:    10 Killed                  PATH=$PATH:/app/$VENV_NAME/.venv/bin /app/$VENV_NAME/.venv/bin/python3 -u $ARGS
Application exited with code 137 (OOMKilled)

Job failed (see above)

I would appreciate your help here :slightly_smiling_face:
Thanks!

Training Output:

Creating job... OK (ID: 2280450)

Scheduling job in cluster...
Job started
Splitting data into training and validation sets...
Splitting data into training and validation sets OK

Training model...
Training on 956 inputs, validating on 240 inputs
Building model and restoring weights for fine-tuning...
Finished restoring weights
Fine tuning...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
Attached to job 2280450...
/app/run-python-with-venv.sh: line 21:    10 Killed                  PATH=$PATH:/app/$VENV_NAME/.venv/bin /app/$VENV_NAME/.venv/bin/python3 -u $ARGS
Application exited with code 137 (OOMKilled)

Job failed (see above)

Hi @Patricksch,

This is a memory issue due to the size of your dataset. I’ve increased your memory limit, so hopefully this will help.
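
For anyone reading the log above: exit code 137 follows the usual shell convention of 128 plus the signal number, so it means the process received signal 9 (SIGKILL). The "OOMKilled" label typically indicates that the kill came from the out-of-memory killer. Here is a minimal sketch of that decoding in Python. The helper name `describe_exit_code` is made up for illustration and is not part of any Edge Impulse tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit status (hypothetical helper for illustration)."""
    if code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited on its own with status {code}"


# On Linux this prints: terminated by SIGKILL
print(describe_exit_code(137))
```

A SIGKILL cannot be caught by the training script itself, which is why the log shows only "Killed" and no Python traceback.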

Aurelien

Thank you! @aurel

No, unfortunately it didn’t work, but I’ll try deleting some images.

Hi @aurel

I have now reduced the training data to 974 images and the test data to 253 items, and it still doesn’t work. What could be the reason for this? Could you please help me? This is very important for my bachelor thesis.

Here are the pictures of the training setup:

[Image: Bild1]

Hi @Patricksch - maybe a bit late to the party - but we’ve upped the memory limits for all jobs to be less stringent (they can go over memory limits without being killed immediately) and this should resolve all OOMKilled issues. We’re monitoring actively to see if any others happen and can tweak the limits if that’s the case.

Hi @janjongboom,
No worries.
I’m already done with all the tests for my bachelor thesis.
It worked for me after deleting some images.
Later I used your new FOMO algorithm, and it works fine and is very fast.

Thanks for your help :grinning: