Hey guys, thanks for the input.
Maybe the simplest way to say it is you may want to control the validation split for the same reasons you would want to control the test split.
I think a random split is fine if your data is exactly balanced in terms of the target classes and the characteristics of the data. But often data comes from several distributions with varying characteristics (e.g. different test subjects, test environments, noise, sensor mounting, etc.), or there is a high degree of class imbalance, so stratifying the data (i.e. separating your data into subgroups before splitting) may be necessary to get a ‘fair’ evaluation.
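To sketch the class-imbalance case: scikit-learn's `train_test_split` has a `stratify` argument that preserves the class ratio in both splits (the data here is made up, just to show the mechanics):

```python
# Sketch: stratified split preserving a 9:1 class ratio.
# X and y are hypothetical; sklearn is used just as an illustration.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))          # hypothetical features
y = np.array([0] * 90 + [1] * 10)      # 9:1 class imbalance

# stratify=y keeps the 9:1 ratio in both the train and validation split
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(y_val.mean())  # 0.1 — same minority fraction as in y
```

Without `stratify`, a purely random 20% split could easily end up with 0 or 4 minority samples instead of exactly 2.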
One simple example where you would probably want this is to make sure that all ‘subwindows’ of a sample (i.e. the windows produced by Edge Impulse’s sliding window) belong to the same fold.
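A rough sketch of what I mean, using scikit-learn's `GroupShuffleSplit` (the `sample_ids` array is hypothetical — it just tags each window with the sample it came from):

```python
# Sketch: keep all subwindows of a sample in the same fold via
# group-aware splitting. sample_ids is a hypothetical per-window tag.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# 4 windows each from 5 original samples -> 20 rows
sample_ids = np.repeat(np.arange(5), 4)
X = np.arange(20).reshape(20, 1)

gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(gss.split(X, groups=sample_ids))

# No sample contributes windows to both splits
leakage = set(sample_ids[train_idx]) & set(sample_ids[val_idx])
print(leakage)  # set()
```

With a plain random split, windows from the same sample would land on both sides, which inflates validation scores because adjacent windows are highly correlated.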
In a more complicated example, you may have a limited homebrew dataset that is high quality, but you want to leverage some publicly available datasets to make your model more robust. In this case it may be helpful to control your validation data so that it matches your test data more closely, and include the less trusted (or synthetic) data only in the training set (and verify it actually helps model performance). Otherwise you could end up with high validation performance (on account of the more plentiful mismatched data) but low test performance, and if you’re constantly looking at your test set to tweak model performance then it kind of defeats the purpose of having a test set.
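One way this could look in practice — a minimal sketch where a hypothetical `is_trusted` flag (derived from metadata) routes some of the homebrew data into validation while the public/synthetic data stays training-only:

```python
# Sketch: validation drawn only from trusted samples; less trusted
# data stays in training. `is_trusted` is a hypothetical flag.
import numpy as np

rng = np.random.default_rng(1)
n = 10
is_trusted = np.array([True] * 4 + [False] * 6)

# hold out half the trusted samples for validation
trusted_idx = np.flatnonzero(is_trusted)
val_idx = rng.choice(trusted_idx, size=len(trusted_idx) // 2, replace=False)
train_idx = np.setdiff1d(np.arange(n), val_idx)

print(is_trusted[val_idx].all())  # True: validation is all trusted data
```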
I think stratifying by class label would help with class imbalance, but being able to stratify by metadata tags would be more powerful.
Again, a hacky way to achieve this would be to have a reliable method to map from the rows in the X/Y tensors back to the sample names (not sure if this can be done currently). That way I could encode simple metadata into the names of the samples when I upload them, and decode it in the Keras expert mode to decide the validation split.
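Something like this is what I have in mind — the `subjectNN.env_XX` naming scheme and the `sample_names` list are hypothetical, since Edge Impulse doesn’t currently expose a row-to-sample-name mapping as far as I know:

```python
# Sketch: metadata encoded in sample names at upload time, decoded
# later to build the validation split. Naming scheme is made up.
val_subjects = {"subject03"}  # hold this subject out for validation

sample_names = [
    "subject01.env_lab.0",
    "subject01.env_field.1",
    "subject03.env_lab.2",
    "subject03.env_field.3",
]

def decode(name):
    """Split a 'subject.env.index' name into its metadata fields."""
    subject, env, idx = name.split(".")
    return {"subject": subject, "env": env, "idx": int(idx)}

# rows whose subject is held out go to the validation split
val_mask = [decode(n)["subject"] in val_subjects for n in sample_names]
print(val_mask)  # [False, False, True, True]
```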