FOMO Training on GPU not faster than on CPU

Hi,
For my upcoming project, I plan to train a FOMO model on a large number of 96×96 images (about 50k).
I tried a small subset (about 500 images) in the Web UI and trained it on both the CPU and GPU training processors. GPU training did not finish noticeably faster than CPU training.
Why is that, and is the training pipeline optimized for GPUs, parallelism and larger datasets in general?

It should be reproducible by picking any example project, setting the resolution to something similar and comparing CPU and GPU training times.

I already tried exporting the block and training it on RunPod with an NVIDIA L4, but GPU utilisation stayed under 2% and training was impractically slow.

Best regards,
Luis

Hi Luis,

Thank you for your bug report and for being a user of our platform. Apologies for the delay in responding. Our team has fixed the underlying issue, and you should now see full GPU acceleration in your FOMO project.

Let me know if it’s not working as expected, or if you have any other problems!

Warmly,
Dan

  1. I remember during FOMO development that it was very easy to saturate the model with a large dataset. Keep in mind FOMO is pretty small, so you can see diminishing returns surprisingly quickly. I’d definitely try some smaller samples first, e.g. 1K vs 5K, before investing too far (and repeat the 1K run five times with different samples to get some sense of the variance).

  2. Another thing I tried, but didn’t bother including, is image packing: e.g. collaging 100 images into a single (960, 960) image (recall that FOMO is fully patch based). There were some boundary issues, though they weren’t a problem, and overall it made GPU utilisation go through the roof. I just didn’t push the idea because of the point above.
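Point 1 can be made concrete with a minimal sketch of the repeated-subsample comparison, assuming you can list your sample IDs and train/evaluate once per subset yourself. `subsample_runs` and `summarize` are hypothetical helpers, not part of Edge Impulse, and the training call itself (`train_and_evaluate`) is only a placeholder in a comment.

```python
import random
import statistics


def subsample_runs(sample_ids, subset_size, repeats, seed=0):
    """Draw `repeats` independent random subsets (without replacement) of sample IDs."""
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    return [rng.sample(sample_ids, subset_size) for _ in range(repeats)]


def summarize(scores):
    """Return (mean, sample standard deviation) of per-run validation scores."""
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    all_ids = list(range(50_000))  # stand-in for the ~50k image IDs
    for size in (1_000, 5_000):
        runs = subsample_runs(all_ids, size, repeats=5)
        print(size, [len(run) for run in runs])
        # Hypothetical step: scores = [train_and_evaluate(run) for run in runs]
        # then compare summarize(scores) between the two subset sizes.
```

As a rough heuristic, if the 5K mean falls within the spread of the 1K runs, the model has probably saturated and the full 50k set may not be worth the extra training time.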

So I reckon try 1. first, and if you feel you do need the big dataset and want to try 2., it’s something we can help you structure, either as an EI data pipeline [1] or in expert mode [2].
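The packing idea from point 2 can be sketched as follows, assuming 96×96 samples with FOMO-style centroid labels given in tile pixel coordinates. 100 tiles on a 10×10 grid give one 960×960 image, and each object's centroid is shifted by its tile's top-left offset. `pack_tiles` is a hypothetical helper, not an Edge Impulse API, and it does nothing about the boundary issues mentioned above (presumably output cells near a tile edge can still see pixels from the neighbouring tile).

```python
import numpy as np


def pack_tiles(images, centroids, grid=10):
    """Collage grid*grid equally sized images into one large image.

    images:    array of shape (grid*grid, H, W, C)
    centroids: one list per image of (x, y, label) tuples in tile pixels
    returns:   packed image of shape (grid*H, grid*W, C) and the
               centroids shifted into packed-image pixels
    """
    n, h, w, c = images.shape
    if n != grid * grid:
        raise ValueError(f"expected {grid * grid} images, got {n}")
    packed = np.zeros((grid * h, grid * w, c), dtype=images.dtype)
    packed_centroids = []
    for i in range(n):
        row, col = divmod(i, grid)  # row-major placement on the grid
        y0, x0 = row * h, col * w   # top-left corner of this tile
        packed[y0:y0 + h, x0:x0 + w] = images[i]
        for x, y, label in centroids[i]:
            packed_centroids.append((x + x0, y + y0, label))
    return packed, packed_centroids


if __name__ == "__main__":
    images = np.random.rand(100, 96, 96, 1).astype(np.float32)
    centroids = [[(48, 48, "object")] for _ in range(100)]
    packed, packed_centroids = pack_tiles(images, centroids)
    print(packed.shape)          # (960, 960, 1)
    print(packed_centroids[11])  # (144, 144, 'object')
```

Because 96 is divisible by 8, tile edges line up with the output cells if the model uses FOMO's default 1/8 output resolution, so each output cell's label comes from exactly one tile.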

[1] Data pipelines - Edge Impulse Documentation
[2] Expert mode - Edge Impulse Documentation

Mat