Slow training times for image models

Hi all,

We’re seeing a regression in training time for image models. We’ve pinpointed the release that introduced it and are now investigating the root cause :slight_smile:

Note that training models still works, but it is a lot slower than normal.

We’ve identified a fix and are running the test suite on staging now; it should hopefully be live in an hour or two.

Some background: for image models we save the model after every epoch, so we can find the epoch with the lowest loss and use that model (this is what happens when you see ‘Saving best performing model…’ during training). The save format was recently changed from HDF5 to the TensorFlow SavedModel format, which is a lot slower to write. Since this runs on every epoch, we were spending most of the training cycles just saving models. We’ve reverted image models back to HDF5.
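
For anyone curious, here is a minimal sketch of that kind of per-epoch "keep the best model" checkpointing in Keras. The model, data, and file names are purely illustrative (not our actual pipeline); the point is that the format the checkpoint is written in depends on the file path:

```python
# Illustrative sketch only: per-epoch checkpointing of the best model in Keras.
import tensorflow as tf

# Toy image model, just so the example is self-contained.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# An ".h5" path makes Keras write the checkpoint in the (fast) HDF5 format;
# a plain directory path would use the TensorFlow SavedModel format instead,
# which is much slower to write and would run once per epoch here.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5",        # HDF5 checkpoint file (hypothetical name)
    monitor="val_loss",     # keep only the epoch with the lowest loss
    save_best_only=True,
)

# Hooked into training like this (datasets omitted here):
# model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=[checkpoint])
```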

This change is now live in production :slight_smile: