Retraining a pretrained model - transfer learning

I have a decent Keras model, trained on time-series data from the accelerometer of some industrial machines.

Initially I would like to re-train on my edge device using the pretrained model and the same sampled data, played on a speaker and recorded with a microphone on the edge device. Later on I will use an accelerometer on the live machine.

The question is: can I load my trained, non-quantised .h5 model using Keras/expert mode, and then freeze and retrain only the lower layers on the microphone data captured in EI?

/opprud


Hi Morten,

This is an interesting question! We don’t currently provide a mechanism for developers to upload files into Edge Impulse, so the most viable option for loading a pre-trained model would be to in-line the H5 model into the expert mode editor in some kind of string encoding.

For example, using base64 it might look something like this:

import base64
import io

import h5py
import tensorflow as tf

# Decode the base64 string back into the raw HDF5 bytes
model_data_base64 = '...'
model_data = base64.b64decode(model_data_base64)
# Wrap the bytes in a file-like object so h5py can read them
h5_file = h5py.File(io.BytesIO(model_data), 'r')
# Load as a Keras model
model = tf.keras.models.load_model(h5_file)

I haven’t run the above code so it may need some debugging, but give this a try and let me know if it works! The downside will be that the Python editor will be mostly filled with a long base64-encoded string, which isn’t a great user experience. If this seems like a helpful approach, we can think about ways to enable it that are more seamless.
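On the local side, producing that string could look roughly like this (again just a sketch I haven't run, assuming you've already saved your model as model.h5):

import base64

# Read the saved Keras model file and encode it as a base64 string
with open('model.h5', 'rb') as f:
    model_bytes = f.read()
model_data_base64 = base64.b64encode(model_bytes).decode('utf-8')
print(model_data_base64)  # paste this into the expert mode editor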

Warmly,
Dan

@opprud Question here would also be whether transfer learning is actually viable on this type of data. We see it working really well on images because the lower layers of the network always embed similar things (contrast, shapes, colors) and training these layers against a well-validated and balanced dataset helps generalize the network when retraining the upper layers with a smaller dataset.

We don’t see it working very well on audio / accelerometer data - but would be interested to see what you find!

@janjongboom @dansitu:

I want to pre-train a model on my local PC with the “mozilla command dataset”. In a 2nd step, I want to use a limited amount of my voice speech commands for an additional round of training to fine tune the model to my voice and increase performance.

I think that is a nice way to build smart devices with really good recognition rates while requiring only a small amount of additional training by the user.

Comment:

Your proposal above works when I only load the weights (for the full model I get a "payload too big" error) - example below. You can check at: https://studio.edgeimpulse.com/studio/22433.

#############################
# After training locally
import base64

model.save_weights('local.h5')
rd = open('local.h5', 'rb').read()
rd_64 = base64.b64encode(rd)
rd_str = rd_64.decode('utf-8')
print(rd_str)  # copy this string

#############################
# At Edge Impulse, in the model's expert mode
import base64

model_weights_str = 'pasted string'
model_weights_b64 = model_weights_str.encode('utf-8')
model_weights = base64.b64decode(model_weights_b64)
f = open('model.h5', 'wb')
f.write(model_weights)
f.close()

# model architecture
model = Sequential()
....
model.load_weights('model.h5')
...
# training part

My Problem/Question:

On the local PC: when starting the 2nd round (with my voice data), the pretrained model (with the weights loaded from the Mozilla pre-training) reaches 70% accuracy at the first epoch, so the pre-training is working.

On Edge Impulse, when running the pre-trained model (weights loaded as above) with my voice data, the accuracy in the 1st epoch is bad (worse than without loading the weights).

I have used your MFCC-DSP library from GitHub to generate the “pretrain features” and compared them with the results from Edge Impulse - they match, and of course the models are identical.
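To double-check that the weights themselves arrive intact, I could also compare a fingerprint of the loaded weights on both sides (a rough sketch, not something I have run yet):

import hashlib
import numpy as np

def weights_fingerprint(model):
    # Hash all weight arrays so two runs can be compared with a single value
    h = hashlib.sha256()
    for w in model.get_weights():
        h.update(np.ascontiguousarray(w, dtype=np.float32).tobytes())
    return h.hexdigest()[:16]

model.load_weights('model.h5')
print('weights fingerprint:', weights_fingerprint(model))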

Is there anything special on “your” side that causes the model to behave differently, or am I just trying something impossible?


@dansitu Any idea? ^

@dansitu, @janjongboom: To clarify, I run the 1st round of feature generation and model training (with the Mozilla command dataset) on my local PC. To make sure the model is trained with the correct features (generated the same way as on Edge Impulse), I downloaded a sample feature via the request below and compared it with my local features. They do match (with a very small difference after the 4th digit).

I would assume that exactly this feature is also used to train the model on your system?

Besides thinking that transfer learning should be supported for all kinds of training, I think it must be possible to reproduce the training locally; otherwise I have a vendor lock-in, which is something larger companies may not appreciate (I know, I am working for one :-)).

Some more details:

Features downloaded to compare with my features:

curl --request GET   --url https://studio.edgeimpulse.com/v1/api/22433/dsp-data/3/x/training   --header 'Accept: application/octet-stream'   --header 'x-api-key: my-key ' > x_dsp.npy
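Once downloaded, comparing with my local features looks roughly like this (a simple sketch; x_local.npy is just a placeholder name for my own feature file):

import numpy as np

x_ei = np.load('x_dsp.npy')       # features downloaded from Edge Impulse
x_local = np.load('x_local.npy')  # features generated locally from the *.wav files

print(x_ei.shape, x_local.shape)
# Differences only show up after the 4th digit, so a loose tolerance passes
print(np.allclose(x_ei, x_local, atol=1e-3))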

Program to generate the features from *.wav files

Program to train the model:

Hi @Christian42,

Sorry for the delay in responding! It’s cool to see your experiments with transfer learning on audio. We definitely want to support this in the long term, and I agree with you on the benefits of being able to reproduce training locally.

When you attempt transfer learning within Edge Impulse, are you training all of the model’s layers or just the top layers (e.g. the softmax layer and the ones immediately before it)? I’d suggest making sure that the trainable property of every layer except the softmax is set to False; you can find some details on how to do this here:

You can then try unfreezing some of the layers prior to the softmax until you get the best result. Without freezing the bottom layers, the information your model has learned about the other keywords in the dataset during pre-training will be lost when those bottom layers’ parameters are adjusted.
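As a rough illustration (assuming model and tf are already defined in the expert mode script, and adapting the indices to your own architecture), this could look like:

# Freeze every layer except the final softmax layer
for layer in model.layers[:-1]:
    layer.trainable = False
model.layers[-1].trainable = True

# Re-compile after changing trainable flags so they take effect
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# To unfreeze a couple of layers before the softmax, widen the slice, e.g.:
# for layer in model.layers[-3:]:
#     layer.trainable = True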

Let me know how this goes, or if you’re already doing it and we need to think of something else!

Warmly,
Dan


Hello
Can the user choose the best performing model that will be used for inference, or is that only done automatically by Edge Impulse?
Thanks Dimitris

@tiriotis The ‘best performing model’ is just the model with the lowest loss based on your architecture (so it does not try new architectures). But if you want to experiment with multiple models you can create multiple models by clicking the #1 on the neural network page:

[screenshot of the neural network page]


Hi, when you say “based on the architecture”, do you mean one of the following that I have chosen?
[screenshot]

No, I mean new neural network architectures (e.g. play around with the number of layers in your neural network block).