ENOSPC errors (resolved)

janjongboom · February 27, 2026, 4:13am

Hi, we’re investigating ENOSPC errors in jobs at the moment. Something is quickly eating up our network storage…

janjongboom · February 27, 2026, 4:47am

I’ve increased our deployed network storage as a temporary measure; this should now be resolved (emailing affected users).

janjongboom · February 27, 2026, 8:10am

Unfortunately this issue is back… investigating.

janjongboom · February 27, 2026, 9:22am

Issue has been properly resolved. Apologies for the inconvenience… Will send another batch of emails later.

janjongboom · February 27, 2026, 10:28am

As some background: we have network storage provisioned which is used in jobs (e.g. to store features or models). If data has not been used for more than 3 days, it is moved to S3 (and we load it back in when you need it). Because we saw a surge in usage over the past few days, with some very large jobs being run, we exhausted the ‘hot’ network storage. When we provisioned extra storage earlier, this created new nodes but did not move older files off the old nodes, which led to sporadic ENOSPC errors. We’ve cleaned this up manually now.
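
For readers curious what a "not used in >3 days" rule can look like in practice, here is a minimal sketch. It is not the actual implementation behind the platform: the function name `find_cold_files`, the use of file access time as the "last used" signal, and the directory layout are all assumptions made for illustration. It only lists candidate files for moving to cold storage and does not upload anything to S3.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60


def find_cold_files(root, max_idle_days=3.0, now=None):
    """Return files under `root` not accessed within `max_idle_days`.

    Hypothetical helper: "last used" is approximated by st_atime, which
    is an assumption and may be unreliable on volumes mounted with
    noatime or relatime.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                last_access = path.stat().st_atime
            except FileNotFoundError:
                # File was removed while we were scanning; skip it.
                continue
            if last_access < cutoff:
                cold.append(path)
    return sorted(cold)
```

A sweep like this would need to run against every storage node, including older ones, since a single full node can still produce ENOSPC even when total capacity looks sufficient.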