Application exited with code 2 (OOMKilled) issue

Hello all!

I have a new shiny issue I’ve encountered :smiley:

I’ve been retraining my model with some of the data my camera trap is giving me, and everything has been going smoothly in downloading the latest modelfile.eim to my Raspberry Pi 4. But tonight when I tried to download the latest modelfile.eim to my device, using the command “sudo edge-impulse-linux-runner --download modelfile.eim” I’m getting the following output:

Edge Impulse Linux runner v1.2.6

[RUN] Downloading model...
[BLD] Created build job with ID 1035434
[BLD] Writing templates OK
[BLD] Scheduling job in cluster...
[BLD] Exporting TensorFlow Lite model...
[BLD] Job started
[BLD] Exporting TensorFlow Lite model OK
[BLD]
[BLD] Removing clutter...
[BLD] Removing clutter OK
[BLD]
[BLD] Copying output...
[BLD] Copying output OK
[BLD]
[BLD] Job started
[BLD] Building binary...
[BLD] arm-linux-gnueabihf-g++ -MD -Wall -g -Wno-strict-aliasing -I. -Isource -Imodel-parameters -Itflite-model -Ithird_party/ -Os -DNDEBUG -g -DEI_CLASSIFIER_USE_FULL_TFLITE=1 -Iedge-impulse-sdk/tensorflow-lite -std=c++14 -c source/main.cpp -o source/main.o
[BLD] arm-linux-gnueabihf-g++ -MD -Wall -g -Wno-strict-aliasing -I. -Isource -Imodel-parameters -Itflite-model -Ithird_party/ -Os -DNDEBUG -g -DEI_CLASSIFIER_USE_FULL_TFLITE=1 -Iedge-impulse-sdk/tensorflow-lite -std=c++14 -c tflite-model/tflite-trained.cpp -o tflite-model/tflite-trained.o
[BLD] arm-linux-gnueabihf-g++: internal compiler error: Killed (program cc1plus)
[BLD] Please submit a full bug report,
[BLD] with preprocessed source if appropriate.
[BLD] See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.
[BLD] make: *** [source/main.o] Error 4
[BLD] Makefile:67: recipe for target 'source/main.o' failed
[BLD] Application exited with code 2 (OOMKilled)
[RUN] Failed to run impulse Failed to build binary

Can anyone help me with this please? I’m not sure what I should be doing next, as I’m not aware that I’ve done anything differently and everything’s been working nicely so far!

Thanks for any help in advance, I’m having great success using Edge Impulse otherwise :slight_smile:

Hi Tom,

Could you share your project ID?

Thanks,
Aurelien

@TechDevTom Apologies! We restricted memory limits for some deployment targets, and this has broken object detection on some Linux targets. We’re reverting this now, and it should be fixed within an hour or so.

@aurel seems like @janjongboom is on the case!

Could I ask why this was done? Are some projects taking up too much memory, and if so, is there anything we as the people using Edge Impulse can do to lower our memory usage? Is it in relation to the amount of data we’re processing for our models?

Hi @TechDevTom this is now released.

No, definitely not on the user’s side, and there’s no reason to make smaller projects. We wanted to make the resource allocation more explicit in our code base, and accidentally halved the memory we allocated for deployment blocks. :slight_smile:

Cheers, fixed on my end, hoorah! Back to gathering data and hoping my model now does not recognise plant pots and plants as animals/birds.

Ah I see, these things happen, I know all too well :sweat_smile:

Hey @janjongboom, sorry to necro an older thread, but I’m having an issue where I’m getting another OOMKilled message when retraining my object detection model:

Application exited with code 137 (OOMKilled)

Is this a memory issue again, or should I open up a new thread and ask for help?
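For reference, exit code 137 follows the usual shell convention of 128 plus a signal number, so it corresponds to SIGKILL (signal 9), which is the signal an out-of-memory killer sends. A minimal Python sketch of that decoding, assuming a POSIX system; `describe_exit_code` is a hypothetical helper name, not part of any Edge Impulse tool:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit status (hypothetical helper, for illustration)."""
    if code > 128:
        # By convention, 128 + N means the process was terminated by signal N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"terminated by {name}"
    # Anything else is the status the process returned on its own.
    return f"exited on its own with status {code}"


print(describe_exit_code(137))
print(describe_exit_code(2))
```

The earlier "code 2" in the build log is a status the build returned itself after the compiler was killed, whereas 137 means the job process itself received the kill signal.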

Hi @TechDevTom,

Could you give it another try? I enabled our enterprise performance feature as you have a large dataset.
Let us know if that helps,

Aurelien

@aurel I’ve just set it off now and will see how it goes.

Regarding the enterprise performance feature, is that something that I’ll need to pay for? What are the limitations dataset/image count wise with normal performance vs enterprise performance?

@aurel Sorry but no luck! Same error, just after it starts on the first epoch.

Hey @aurel, sorry to bother you again, but I’m still not having any success building my model. Have you got any suggestions that might help?

Hi @TechDevTom,

Sorry for the late reply. This is actually related to a TensorFlow memory leak (https://github.com/tensorflow/models/issues/9981), and we are following up with the team. In the meantime, the best solution is to reduce the size of the dataset; 100 images should already work well.
We’ll keep you posted.
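If the images are also stored locally, one rough way to pick a smaller subset is a seeded random sample. This is only a sketch under assumptions: the images sit in a single local folder, and the function name, extension list, and seed are illustrative rather than part of any Edge Impulse tooling. For object detection, the bounding box labels for the chosen images would need to be kept alongside them.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_images(folder, count=100, seed=0):
    """Return up to `count` image paths from `folder`, chosen reproducibly."""
    images = sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    if len(images) <= count:
        return images
    # A fixed seed makes the same subset come back on every run.
    return random.Random(seed).sample(images, count)


# Example (hypothetical path):
# subset = sample_images("camera_trap_images", count=100)
```

Sorting before sampling keeps the result independent of the order in which the filesystem lists the files.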

Aurelien

Hey @aurel, no worries. And I see, well, I guess I’ll just wait until it’s fixed. I’m afraid for my application 100 images wasn’t working so well, but I may take what I’ve learned and create a new project to see if some of the newer images I’m taking can create a more efficient model.

Cheers!

@TechDevTom - we’ve upped the memory limits for all jobs to be less stringent (they can go over memory limits without being killed immediately) and this should resolve all OOMKilled issues. We’re monitoring actively to see if any others happen and can tweak the limits if that’s the case.

Awesome, have given it a blast and I’m seeing no more issues, cheers @janjongboom!
