Job failed - NVIDIA TAO

Hey there,

I was trying to train a TAO YOLOv4/YOLOv3 model, and the job keeps failing for different reasons:

For 200 epochs:

For 100 epochs:

Here’s the project link: mouse vs cup - Dashboard - Edge Impulse

Could you help me with this? TIA

Hello @karkapur,

The first error should be fixed very soon; @cward has been working on a fix.
As for your second error, we are still investigating, but we have not been able to reproduce it yet. We’ll let you know when we have found a fix.

Best,

Louis

Hey @louis

Thanks for letting me know! @cward could you update me when it’s fixed?

Cheers,
Karan

Hey @karkapur,

I will let you know once the fix is live.

Thanks
Carl

Hi @karkapur, the first error should be fixed now. We are still working on the second one, as it is quite a rare error.
Thanks
Carl

Thanks for the update!