Error while training my medical instruments object detection model

Question/Issue: Error while training my medical instruments object detection model

**Project ID:** 405856

**Context/Use case:** Hello, I am trying to build a medical instruments object detection model, and I get this error when I train:
/app/run-python-with-venv.sh: line 17: 11 Killed /app/$VENV_NAME/.venv/bin/python3 -u $ARGS
Application exited with code 137 (OOMKilled)

I only have 114 photos uploaded right now and still have a few more to upload. I tried decreasing the number of training cycles or the batch size, and the one time training worked, my precision was 0.00.
I also don't understand how I can run Python with venv…
The time limit is also a problem. That was actually the starting point that made me change the batch size.
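Batch size and image size both multiply the memory a training step needs, which is why lowering either one can avoid an OOMKilled (exit code 137) job. A rough sketch, assuming float32 pixels (4 bytes each), of how much memory just one input batch takes. The 320x320 RGB and 96x96 grayscale sizes are illustrative values, not this project's settings, and the model's activations and gradients add much more on top, so treat the result as a lower bound that only shows the scaling:

```python
def input_batch_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Bytes needed just to hold one batch of input images.

    Assumes float32 pixels (4 bytes each). Activations, gradients and
    optimizer state usually need far more memory than this.
    """
    return batch_size * height * width * channels * bytes_per_value


# Illustrative settings only: halving the batch halves this number,
# and halving the image side divides it by four.
for batch, side, channels in [(32, 320, 3), (16, 320, 3), (32, 96, 1)]:
    mib = input_batch_bytes(batch, side, side, channels) / 2**20
    print(f"batch={batch:>2}  {side}x{side}x{channels}: {mib:6.2f} MiB")
```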

I’m a student and a complete beginner but I would really love to finish this project in time. Could you please help me?

This is the training output:
Creating job… OK (ID: 20500859)

Scheduling job in cluster…
Container image pulled!
Job started
Scheduling job in cluster…
Container image pulled!
Job started
Splitting data into training and validation sets…
Splitting data into training and validation sets OK

Training model…
Training on 71 inputs, validating on 18 inputs
Attached to job 20500859…
Trained 1 batches.
Attached to job 20500859…
Trained 2 batches.

Epoch  Train Loss  Validation Loss  Precision  Recall  F1
00     3.77784     0.01864          0.00       0.00    0.00
/app/run-python-with-venv.sh: line 17: 11 Killed /app/$VENV_NAME/.venv/bin/python3 -u $ARGS
Application exited with code 137 (OOMKilled)
Job failed (see above)
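Precision, recall and F1 are all built on the count of correct detections, so a model that has not yet produced a single correctly matched box scores 0.00 on all three, whatever the loss is doing. A minimal sketch of the standard definitions; the counts below are made-up examples, and how a predicted box is matched to a label (for example by an overlap threshold) is left out because it depends on the training tool:

```python
def precision_recall_f1(tp, fp, fn):
    """Detection metrics from true positives, false positives and false negatives.

    A zero denominator is reported as 0.0 here, which is one common
    convention (an assumption, not necessarily what the training tool does).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts: no predicted box matches a labeled instrument (tp == 0),
# so all three metrics are 0.0, like the epoch 00 row in the log.
print(precision_recall_f1(tp=0, fp=5, fn=20))
# Hypothetical counts with some correct detections give non-zero scores.
print(precision_recall_f1(tp=6, fp=4, fn=12))
```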

Hi @ElSiMi,

I just replicated your project to try it out. I was able to get ~7% accuracy when I increased the number of epochs to 100. That tells me you probably need more data and more epochs. Object detection models are notorious for needing lots of data and training time.

Okay, thank you. I wanted to try the model with data augmentation. Would it work with so little data?
And does data augmentation work even if my training failed? Should I maybe decrease the size of the photos?
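For object detection, every geometric change made to an image during augmentation must also be applied to its bounding boxes, otherwise the labels no longer line up with the instruments. A minimal NumPy sketch, assuming boxes are stored as pixel [x_min, y_min, x_max, y_max] with an exclusive max edge; `hflip_with_boxes` and `augment` are hypothetical helpers, not Edge Impulse's or the tutorial's code:

```python
import numpy as np


def hflip_with_boxes(image, boxes):
    """Mirror an image left-right and move its bounding boxes to match.

    image: array of shape (height, width, channels)
    boxes: array of shape (n, 4) holding [x_min, y_min, x_max, y_max] in pixels
    """
    width = image.shape[1]
    flipped = image[:, ::-1, :]
    new_boxes = boxes.astype(float)  # astype returns a new array
    # The old right edge becomes the new left edge, and vice versa.
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes


def augment(image, boxes, rng):
    """Randomly flip and change brightness; returns a new (image, boxes) pair."""
    if rng.random() < 0.5:
        image, boxes = hflip_with_boxes(image, boxes)
    # Brightness changes do not move objects, so the boxes stay the same.
    factor = rng.uniform(0.8, 1.2)
    image = np.clip(image.astype(float) * factor, 0, 255)
    return image, boxes


rng = np.random.default_rng(0)
img = np.zeros((96, 96, 1), dtype=np.uint8)
img[10:30, 5:25] = 255  # a bright square standing in for an instrument
boxes = np.array([[5, 10, 25, 30]])
flipped, flipped_boxes = hflip_with_boxes(img, boxes)
print(flipped_boxes)
aug_img, aug_boxes = augment(img, boxes, rng)
```

Augmentation only adds variations of the photos you already have, so it can help a small dataset but does not replace collecting more labeled images.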

I lowered my picture size (to 96x96) and trained the model, but the F1 score is 0.00. Now I'm trying to follow your data augmentation tutorial (https://www.youtube.com/watch?v=Rf7G1xsUIw8), but I get an error when doing a forward pass…
The dimension output works fine: (1, 96, 96, 1)
But then this appears:
ValueError Traceback (most recent call last)
in <cell line: 4>()
2
3 # Inference
----> 4 preds = model.predict(images)
5
6 # Print out predictions

Is there any way you could help me, please?
Thank you!
Jaegle Elise
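The actual ValueError message is cut off in the traceback above, so this is only a guess: a frequent cause of a ValueError in `model.predict` is an input whose shape differs from what the model was built for, for example a (1, 96, 96, 1) grayscale batch going into a model that expects 3 RGB channels. A small NumPy sketch with a hypothetical helper, `match_input_shape`, that compares a batch against a Keras model's `input_shape` (assuming a single-input model) and fixes a channel mismatch:

```python
import numpy as np


def match_input_shape(images, expected_shape):
    """Adapt a batch of images to the shape a Keras model expects.

    images: array of shape (batch, height, width, channels)
    expected_shape: e.g. model.input_shape, such as (None, 96, 96, 3);
        the leading None is the batch dimension.
    Only a grayscale/RGB channel mismatch is fixed; a different height or
    width means the images must be resized first.
    """
    _, height, width, channels = expected_shape
    if images.shape[1:3] != (height, width):
        raise ValueError(
            f"images are {images.shape[1:3]}, model expects {(height, width)}; resize first"
        )
    if images.shape[3] == 1 and channels == 3:
        images = np.repeat(images, 3, axis=3)  # copy the gray channel into R, G and B
    elif images.shape[3] == 3 and channels == 1:
        images = images.mean(axis=3, keepdims=True)  # simple average to grayscale
    elif images.shape[3] != channels:
        raise ValueError(f"cannot convert {images.shape[3]} channels to {channels}")
    return images.astype(np.float32)


images = np.zeros((1, 96, 96, 1), dtype=np.float32)  # same shape as in the post
fixed = match_input_shape(images, (None, 96, 96, 3))
print(fixed.shape)
# With a real model this would be: preds = model.predict(match_input_shape(images, model.input_shape))
```

Pixel scaling (for example 0 to 1 versus 0 to 255) must also match what the model was trained on; that is not checked here.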

When loading my model file, I get this warning:
WARNING:tensorflow:No training configuration found in save file, so the model was not compiled. Compile it manually.

Model: “model_1”

How can I get a training configuration, or compile my model?
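The message above is a warning rather than an error: it means the file holds the model's structure and weights but no optimizer or loss. A minimal TensorFlow/Keras sketch, using a tiny stand-in model saved to a temporary file for illustration, showing that `predict()` runs without compiling and how to compile if you want to keep training in Python. The optimizer and loss below are placeholders for this toy classifier; an object detection model would need the loss it was originally trained with, which this sketch does not know:

```python
import os
import tempfile

import tensorflow as tf

# Tiny stand-in model, saved without compiling, to reproduce the situation.
# Replace this part with loading your own downloaded model file.
inputs = tf.keras.Input(shape=(96, 96, 1))
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
tiny = tf.keras.Model(inputs, outputs)

path = os.path.join(tempfile.mkdtemp(), "tiny.keras")
tiny.save(path)

# compile=False skips looking for a training configuration in the file.
model = tf.keras.models.load_model(path, compile=False)

# Inference works on an uncompiled model; compiling is only needed for fit()/evaluate().
preds = model.predict(tf.zeros((1, 96, 96, 1)), verbose=0)
print(preds.shape)

# To train further, compile with an optimizer and a loss that suit your model.
# These values are placeholders for the toy classifier above.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```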