Replacing IPython notebook support (in favor of something better)

Hi all,

Just a heads-up that we're going to replace the IPython notebook export with an export that gives you a full Python-based training pipeline. This will remove the need for us to maintain two training pipelines (and the chance for pipelines to break in two places), will offer support for every model type (incl. things that we don't do today in IPython notebooks, like object detection), and will resolve dependency issues, because we can ship an optional Docker container with all dependencies installed.

This is part of a bigger push to make it easier to take models out of Edge Impulse and then push them back in (e.g. if you want to run experiments in Ray, or track some hyperparameters in W&B or MLFlow).

Existing IPython notebooks will continue working as-is; you just cannot export any new ones from the Studio.

Let me know if you foresee any issues here, but expect this to go live sometime before Imagine 2022.

Cheers!


@janjongboom I look forward to this new release. It will answer one of my questions: integration with Weights & Biases (wandb).

A question about the data pipeline.

Data is randomly allocated to the training and the validation set. If I understand correctly (ref: Choosing your own validation data), we cannot have complete control over which data is allocated to the training set and which to the validation set. For example, in medical applications with different patients, data from the same patient can end up in both the training set and the validation set. This split results in data leakage. Correct me if I am wrong. Given the upcoming release, will we have control over the training/validation split inside the Studio?
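
To make concrete what I mean: a minimal sketch (outside the Studio) of a leakage-free split, using scikit-learn's GroupShuffleSplit. The patient_ids array and the data shapes are hypothetical placeholders for my own data.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(100, 33)                   # hypothetical feature matrix
Y = np.random.randint(0, 2, size=100)         # hypothetical labels
patient_ids = np.random.randint(0, 10, 100)   # hypothetical patient ID per sample

# All samples from one patient end up either in training or in validation, never in both.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=1)
train_idx, val_idx = next(splitter.split(X, Y, groups=patient_ids))
X_train, Y_train = X[train_idx], Y[train_idx]
X_val, Y_val = X[val_idx], Y[val_idx]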

A question about the training pipeline.

Am I correct that today a regression in EI Studio is solved as a classification problem?

# model architecture
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(classes, name='y_pred'))  # 'classes' is defined earlier in the Studio-generated code

Will there be a release where you regress directly to the single value?

model = Sequential()
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, name='y_pred'))

As an experiment, I have changed this in the Studio:

model.add(Dense(classes, name='y_pred'))

to

model.add(Dense(1, name='y_pred'))

So far, I see similar results. However, I am not confident that this will hold in all cases, also because of the following line:

Y = tf.keras.utils.to_categorical(Y - 1, classes)

In the new release, will we have more control?
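
To be explicit about what I have in mind: a sketch with a single continuous output trained with a regression loss, and no to_categorical() on the labels. The compile settings below are my own assumption, not what the Studio does today.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(20, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, name='y_pred'))  # one continuous output, no softmax

# Labels stay as raw values; no tf.keras.utils.to_categorical() step.
model.compile(optimizer='adam', loss='mse', metrics=['mae'])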

A question that is maybe off-topic: can we, for example, use TensorFlow Probability inside the EI Studio? I am still exploring this topic (Regression - Confidence Interval), but I think it could be valuable to have some idea of model uncertainty. What is your opinion?
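
Purely as an illustration of the idea (not something the Studio supports today, and the layer sizes are arbitrary), a probabilistic regression head with TensorFlow Probability could look roughly like this:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# The last Dense layer outputs a mean and a (pre-softplus) scale,
# and DistributionLambda turns them into a Normal distribution per sample.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dense(2),
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))),
])

# Train by minimizing the negative log-likelihood of the observed targets.
negloglik = lambda y, rv_y: -rv_y.log_prob(y)
model.compile(optimizer='adam', loss=negloglik)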

Extra question.
Will the presentations of "The biggest embedded ML event of the year" be made available online for those who cannot attend?

Regards,
Joeri

Hi @Joeri, this is now live (soft-launched; docs etc. will be live before Imagine), so you can give it a try. Some notes:

we cannot have complete control over which data is allocated to the training set and which to the validation set.

Yes, correct, and this doesn't change here yet. We're fixing this in the near future through metadata attached to samples; you'll then be able to specify that data with the same "patient_id" should always end up in the same set.

Will there be a release where you regress directly to the single value?

Yes. It's actually already the case that classes is hardcoded to 1 here. (Note that we're also coming out with some exciting announcements around classical ML during Imagine, which you might find interesting.)

Can we, for example, use TensorFlow Probability inside the EI Studio? I think it could be valuable to have some idea of model uncertainty.

So this doesn’t change how classification works in the Studio, but @sara.olsson.ei is working on some flexibility in addressing regression accuracy.

Will the presentations of "The biggest embedded ML event of the year" be made available online for those who cannot attend?

Yes, the complete event will be livestreamed and available on YouTube afterwards.

Hi @janjongboom,
I have tried this for my project (ID: 137736), but got a lot of errors.
I tested on a Windows machine.

@desnug1 As stated in the README that's included with the export, we only support training via Docker. It should have Windows commands in there too!

@janjongboom, thanks for the update.

I am improving my model (data filtering, hyperparameter tuning, …) and performing cross-validation to get an idea of the model performance. Here, I perform a train-validation split based on patient ID. (Because I have limited data, the optimization is challenging.)
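
For reference, this is roughly how I do the patient-level cross-validation outside the Studio: a sketch using scikit-learn's GroupKFold, where the patient_ids array and the data shapes are hypothetical placeholders for my own data.

import numpy as np
from sklearn.model_selection import GroupKFold

X = np.random.rand(3599, 33)                   # hypothetical feature matrix
Y = np.random.rand(3599)                       # hypothetical targets
patient_ids = np.random.randint(0, 12, 3599)   # hypothetical patient ID per sample

# Each fold keeps all samples of a patient together on one side of the split.
gkf = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(gkf.split(X, Y, groups=patient_ids)):
    X_train, X_val = X[train_idx], X[val_idx]
    Y_train, Y_val = Y[train_idx], Y[val_idx]
    # ... build, train, and score the model on this fold ...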

Because today it is not possible to upload a pre-trained model to EI Studio, I need to retrain my model in the Studio. Is it possible to train the model using the complete training set, i.e. not perform any train-validation split (set the validation set to 0 or a very small %)? That way, I use the complete training dataset for training; there is no need for validation because I already have my sequential model.
Finally, I will

  1. test the model using an unseen testing dataset (never used during optimization) from three different patient IDs.
  2. collect new data and test the model using this newly collected data. This will be the ultimate check to see if the model generalizes well.

I have 3599 inputs in total. What I do is:

  1. Set the Validation set size = 0% => I get Training on 2879 inputs, validating on 720 inputs (an 80/20 split).
  2. Set the Validation set size to, for example, 5% => I get Training on 2879 inputs, validating on 720 inputs.
  3. Set the Validation set size to 30% => I get Training on 2879 inputs, validating on 720 inputs.
    I notice that the train/validation split seems forced to the default value (20%).
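
(Quick check: 20% of 3599 inputs is roughly 720 and 1% is roughly 36, so "Training on 2879 inputs, validating on 720 inputs" corresponds to the 20% default, whatever value I set.)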

@sara.olsson.ei keep me posted about the regression accuracy.

@Joeri So I just tested the train/test split and I get this:

So that seems to work?

@janjongboom I think there is a "bug" in the Validation size split in expert mode.

I have a small test project (Project ID=136671).

To reproduce the “bug”:

  • Start in the visual simple mode and set the Validation size split = 1%

  • Start a training (in simple mode) => Training on 3563 inputs, validating on 36 inputs => a 1% split

  • Switch to expert mode => Validation size split is still 1% => Start training => Training on 3563 inputs, validating on 36 inputs => 1% split

  • Next, change the Validation size split to 20% (you are still in expert mode) and start training => Training on 3563 inputs, validating on 36 inputs => it is still 1%!

  • Switch to the visual simple mode. Here the Validation size split is still 1% (GUI textbox), even though you changed it to 20% in expert mode. (GUI issue?)

  • In simple mode, change the split to 20%.

  • Go to expert mode and start training => Training on 2879 inputs, validating on 720 inputs => 20%.

  • Change Validation size split to 1% (expert mode) and start training => Training on 2879 inputs, validating on 720 inputs => it is still 20%.

  • Go to simple mode, change the Validation size split to 1%, go back to expert mode, and start training => Training on 3563 inputs, validating on 36 inputs => a 1% split.

I have the impression that the validation split is only updated from the simple mode and not from the expert mode.

A GUI textbox issue in expert mode?

Regards,
Joeri

@Joeri Yep, I see it, will push a fix. Will go live later this week.