Hi, we’re investigating ENOSPC errors in jobs at the moment. Something is quickly eating up our network storage…
I’ve increased our deployed network storage as a temporary measure, so this should now be resolved (emailing affected users).
Unfortunately this issue is back… investigating.
Issue has been properly resolved. Apologies for the inconvenience… Will send another batch of emails later.
As some background: we have network storage provisioned which is used in jobs (e.g. to store features or models). If data has not been used for more than 3 days, it is moved to S3 (and then we load it in again when you need it). Because we saw a surge in usage over the past few days, with some very large jobs being run, we exhausted the ‘hot’ network storage. When we provisioned some extra storage earlier, this created new nodes but did not move older files off the old nodes, leading to some sporadic ENOSPC errors. We’ve cleaned this up manually now.
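For anyone curious what the "not used in >3 days" rule looks like in practice, here is a rough sketch of how cold files could be picked out. This is not our actual implementation: the function name `find_cold_files`, the use of last-access time (`st_atime`) as the "used" signal, and the directory layout are all assumptions for illustration only.

```python
import os
import time

IDLE_DAYS = 3  # threshold taken from the policy described above


def find_cold_files(root, idle_days=IDLE_DAYS, now=None):
    """Return sorted paths under `root` not accessed within `idle_days`.

    Hypothetical helper: assumes last-access time is a reasonable proxy
    for "used", which depends on how the filesystem is mounted.
    """
    now = time.time() if now is None else now
    cutoff = now - idle_days * 86400  # seconds per day
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    cold.append(path)
            except FileNotFoundError:
                # File disappeared between listing and stat; skip it.
                continue
    return sorted(cold)
```

A list like this would be the input to whatever moves data off to S3. The earlier ENOSPC errors came from the fact that adding nodes did not trigger any such move for files already sitting on the full nodes.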