Question/Issue:
After the model training completed in just over two days, the float32-to-int8 conversion process ran out of memory and crashed.
Project ID:
676816
Context/Use case:
I trained my model and the job crashed after training finished, but before the model was converted to the int8 data format.
Summary:
Object detection training completes successfully, but the job is then killed for running out of memory (exit code 137, OOMKilled) during the int8 quantization/profiling step, so no deployable model is produced.
Steps to Reproduce:
Click the Save and Train button on the Object Detection page.
Expected Results:
I expected to have a deployable model.
Actual Results:
I have nothing but an error telling me to increase the memory used by the train job, and I don’t see where I can control that.
Reproducibility:
I don’t know yet; I’ll let you know in about two days when the retraining completes or fails again.
Environment:
- Platform: [e.g., Raspberry Pi, nRF9160 DK, etc.]
- Build Environment Details: [e.g., Arduino IDE 1.8.19 ESP32 Core for Arduino 2.0.4]
- OS Version: [e.g., Ubuntu 20.04, Windows 10]
- Edge Impulse Version (Firmware): [e.g., 1.2.3]
  To find the Edge Impulse version (see the illustrative excerpt after this list):
  - If you have pre-compiled firmware: run edge-impulse-run-impulse --raw and type AT+INFO. Look for the Edge Impulse version in the output.
  - If you have a library deployment: inside the unarchived deployment, open model-parameters/model_metadata.h and look for EI_STUDIO_VERSION_MAJOR, EI_STUDIO_VERSION_MINOR, EI_STUDIO_VERSION_PATCH.
- Edge Impulse CLI Version: [e.g., 1.5.0]
- Project Version: [e.g., 1.0.0]
- Custom Blocks / Impulse Configuration: [Describe custom blocks used or impulse configuration]
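For reference, the version defines in model-parameters/model_metadata.h of a library deployment look roughly like the excerpt below. The numeric values are placeholders, not an actual version; a file reading MAJOR=1, MINOR=60, PATCH=15 would correspond to Studio version 1.60.15.

/* Illustrative excerpt from model-parameters/model_metadata.h (placeholder values) */
#define EI_STUDIO_VERSION_MAJOR    1
#define EI_STUDIO_VERSION_MINOR    60
#define EI_STUDIO_VERSION_PATCH    15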
Logs/Attachments:
Epoch    Train Loss    Validation Loss    Precision    Recall    F1
59       0.00707       0.00856            0.91         0.74      0.82
Finished training
Converting TensorFlow Lite float32 model…
Converting TensorFlow Lite int8 quantized model…
Loading data for profiling…
Loading data for profiling OK
Calculating performance metrics…
Calculating inferencing time…
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Calculating inferencing time OK
Calculating float32 accuracy…
Profiling 20% done
Profiling 43% done
Profiling 68% done
Profiling 89% done
/app/run-python-with-venv.sh: line 17: 24531 Killed /app/$VENV_NAME/.venv/bin/python3 -u $ARGS
Application exited with code 137 (OOMKilled)
2025-05-04T07:52:53.866600658Z Train job failed: out of memory.
2025-05-04T07:52:53.866625930Z Please contact support if this issue persists or increase train job memory in the project dashboard.
2025-05-04T07:52:53.897Z level=error logger=server msg="Failed job execution"
Error: Job 32325144 finished
at /home/node/studio/build/server/server/start-daemon.js:290:48
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Application exited with code 1
Job failed (see above)
Additional Information:
[Any other information that might be relevant to the issue]