Include More & More Training Data

Request: Allow for mass disabling of data by percentage.
Motivation: This would allow one to iterate quickly over Impulse properties to see how they affect Training. Then, interactively, more data can be added back in and the Model should improve.

Example:

  • 3 Labels in dataset
  • Each Label has 1000 Samples

Option 1:

  • Textbox: “Include x% of Samples for each Label”
    • x is user configurable
  • If x = 10, then the Model will be trained on 300 Samples, i.e. 100 Samples from each Label category.
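A minimal Python sketch of Option 1, assuming each sample is represented as a (sample_id, label) pair. `include_percent_per_label` is a hypothetical helper for illustration, not part of Edge Impulse Studio or its API. It applies the same x% to every Label, so the example dataset above yields 100 Samples per Label.

```python
import random
from collections import defaultdict


def include_percent_per_label(samples, percent, seed=0):
    """Keep `percent`% of the samples of every label (hypothetical helper).

    `samples` is a list of (sample_id, label) pairs (an assumption about
    how samples are represented). A fixed seed makes the subset repeatable.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")

    # Group sample IDs by label so each label is sampled independently.
    by_label = defaultdict(list)
    for sample_id, label in samples:
        by_label[label].append(sample_id)

    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):  # fixed order keeps results repeatable
        ids = list(by_label[label])
        rng.shuffle(ids)  # the shuffle depends only on the seed, not on percent
        k = int(len(ids) * percent / 100)  # round down, e.g. 1000 * 10% -> 100
        subset.extend((sample_id, label) for sample_id in ids[:k])
    return subset


# Example dataset from the request: 3 Labels, 1000 Samples each.
samples = [(f"{label}-{i}", label) for label in ("a", "b", "c") for i in range(1000)]
subset = include_percent_per_label(samples, 10)
print(len(subset))  # 300 (100 per label)
```

Shuffling each Label once with a fixed seed and taking a prefix means that raising x from 10 to 20 keeps the original 100 Samples per Label and adds 100 more, which fits the idea of adding data back in and watching the Model improve.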

Option 2:

  • Textbox next to each Label: “Include x% of Label Samples”
    • x is user configurable
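A minimal Python sketch of Option 2, again assuming samples are (sample_id, label) pairs. `include_percent_by_label` and its `percent_by_label` / `default_percent` parameters are hypothetical names for illustration, not an existing Edge Impulse feature. Each Label gets its own x, and Labels without an entry keep all of their Samples.

```python
import random
from collections import defaultdict


def include_percent_by_label(samples, percent_by_label, default_percent=100, seed=0):
    """Keep a separate percentage of samples for each label (hypothetical helper).

    `percent_by_label` maps label -> x; labels not listed use `default_percent`.
    """
    by_label = defaultdict(list)
    for sample_id, label in samples:
        by_label[label].append(sample_id)

    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):  # fixed order keeps results repeatable
        percent = percent_by_label.get(label, default_percent)
        if not 0 <= percent <= 100:
            raise ValueError(f"percent for {label!r} must be between 0 and 100")
        ids = list(by_label[label])
        rng.shuffle(ids)  # same seed -> same order, so larger x extends the subset
        k = int(len(ids) * percent / 100)  # round down to a whole number of samples
        subset.extend((sample_id, label) for sample_id in ids[:k])
    return subset


samples = [(f"{label}-{i}", label) for label in ("a", "b", "c") for i in range(1000)]
# 10% of "a", 50% of "b", and all of "c" (default 100%).
subset = include_percent_by_label(samples, {"a": 10, "b": 50})
print(len(subset))  # 1600
```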

Hi @MMarcial

Interesting, what you are describing is called subset sampling.

I guess you could do this today by setting the train/test split to 10% train / 90% test. That would give you faster training on your initial runs, so you can gauge how much data is required, but it is just a workaround.

Let’s put a feature request together to be discussed with the ML and Studio teams. If you have any more details or references to include, please do.

Best

Eoin