Hi,
I am trying to retrain my model after uploading new data to it.
I have run the transfer learning step 4 times, and I get the following error every time:
/app/run-python-with-venv.sh: line 21: 10 Killed PATH=$PATH:/app/$VENV_NAME/.venv/bin /app/$VENV_NAME/.venv/bin/python3 -u $ARGS
Application exited with code 137 (OOMKilled)
Job failed (see above)
I would appreciate your help with this.
Thanks!
Full training output (training itself finishes; the job is killed during the "Profiling float32 model (tflite)" step):
Splitting data into training and validation sets...
Splitting data into training and validation sets OK
Training on 1104 inputs, validating on 277 inputs
Training model...
Epoch 1/5
Epoch 20% done
Epoch 52% done
Epoch 85% done
35/35 - 43s - loss: 0.4939 - accuracy: 0.8995 - val_loss: 0.1280 - val_accuracy: 0.9747 - 43s/epoch - 1s/step
Epoch 2/5
Epoch 29% done
Attached to job 2194082...
Epoch 64% done
Epoch 97% done
35/35 - 39s - loss: 0.0138 - accuracy: 0.9973 - val_loss: 0.0431 - val_accuracy: 0.9856 - 39s/epoch - 1s/step
Epoch 3/5
Epoch 29% done
Epoch 61% done
Epoch 97% done
35/35 - 39s - loss: 0.0287 - accuracy: 0.9946 - val_loss: 0.1016 - val_accuracy: 0.9783 - 39s/epoch - 1s/step
Epoch 4/5
Epoch 29% done
Epoch 64% done
Epoch 100% done
35/35 - 39s - loss: 0.0116 - accuracy: 0.9964 - val_loss: 0.0608 - val_accuracy: 0.9783 - 39s/epoch - 1s/step
Epoch 5/5
Epoch 29% done
Epoch 64% done
Epoch 100% done
35/35 - 39s - loss: 0.0028 - accuracy: 1.0000 - val_loss: 0.0620 - val_accuracy: 0.9819 - 39s/epoch - 1s/step
Initial training done.
Fine-tuning best model for 10 epochs...
Epoch 1/10
Epoch 20% done
Epoch 52% done
Epoch 85% done
35/35 - 42s - loss: 0.0063 - accuracy: 0.9982 - val_loss: 0.0755 - val_accuracy: 0.9783 - 42s/epoch - 1s/step
Epoch 2/10
Epoch 29% done
Epoch 64% done
Epoch 100% done
35/35 - 39s - loss: 9.9456e-04 - accuracy: 0.9991 - val_loss: 0.0749 - val_accuracy: 0.9783 - 39s/epoch - 1s/step
Epoch 3/10
Epoch 29% done
Epoch 64% done
Epoch 100% done
35/35 - 39s - loss: 6.2040e-05 - accuracy: 1.0000 - val_loss: 0.0638 - val_accuracy: 0.9819 - 39s/epoch - 1s/step
Epoch 4/10
Epoch 29% done
Epoch 61% done
Epoch 97% done
35/35 - 39s - loss: 4.1405e-05 - accuracy: 1.0000 - val_loss: 0.0591 - val_accuracy: 0.9819 - 39s/epoch - 1s/step
Epoch 5/10
Epoch 29% done
Epoch 61% done
Epoch 97% done
35/35 - 39s - loss: 8.1886e-05 - accuracy: 1.0000 - val_loss: 0.0612 - val_accuracy: 0.9783 - 39s/epoch - 1s/step
Epoch 6/10
Epoch 29% done
Epoch 61% done
Epoch 97% done
35/35 - 39s - loss: 3.0306e-04 - accuracy: 1.0000 - val_loss: 0.0757 - val_accuracy: 0.9783 - 39s/epoch - 1s/step
Epoch 7/10
Epoch 29% done
Epoch 61% done
Epoch 97% done
35/35 - 39s - loss: 1.6120e-05 - accuracy: 1.0000 - val_loss: 0.0570 - val_accuracy: 0.9783 - 39s/epoch - 1s/step
Epoch 8/10
Epoch 29% done
Epoch 61% done
Epoch 97% done
35/35 - 39s - loss: 4.3634e-05 - accuracy: 1.0000 - val_loss: 0.0512 - val_accuracy: 0.9783 - 39s/epoch - 1s/step
Epoch 9/10
Epoch 29% done
Epoch 64% done
Epoch 100% done
35/35 - 39s - loss: 6.4518e-05 - accuracy: 1.0000 - val_loss: 0.0471 - val_accuracy: 0.9856 - 39s/epoch - 1s/step
Epoch 10/10
Epoch 29% done
Epoch 61% done
Epoch 97% done
35/35 - 40s - loss: 2.7638e-05 - accuracy: 1.0000 - val_loss: 0.0480 - val_accuracy: 0.9856 - 40s/epoch - 1s/step
Finished training
Saving best performing model...
Still saving model...
Still saving model...
Still saving model...
Still saving model...
Converting TensorFlow Lite float32 model...
Attached to job 2194082...
Converting TensorFlow Lite int8 quantized model...
Attached to job 2194082...
Attached to job 2194082...
Calculating performance metrics...
Calculating inferencing time...
Calculating inferencing time OK
Profiling float32 model...
Profiling 34% done
Profiling 68% done
Profiling float32 model (tflite)...
/app/run-python-with-venv.sh: line 21: 10 Killed PATH=$PATH:/app/$VENV_NAME/.venv/bin /app/$VENV_NAME/.venv/bin/python3 -u $ARGS
Application exited with code 137 (OOMKilled)
Job failed (see above)
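
For anyone reading the log: as far as I can tell, exit code 137 follows the usual shell convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL). That matches the "Killed" and "OOMKilled" messages above. Here is a minimal Python sketch of that decoding, assuming a Linux-like system; `describe_exit_code` is a hypothetical helper of my own, not part of the training tooling:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit code.

    Assumption: the shell reports "killed by signal N" as 128 + N,
    which is the common convention on Linux.
    """
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Not a signal number known on this platform.
            return f"exited with status {code} (unknown signal {code - 128})"
        return f"terminated by {sig.name}"
    return f"exited normally with status {code}"


print(describe_exit_code(137))
```

This only explains what the code means (the process was force-killed, here reportedly for exceeding its memory limit). It does not show which part of the profiling step used the memory.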