iPython notebook - can't unpack .npy data

nebelgrau77 · November 14, 2020, 5:25pm

Hi everyone,

I’m watching a great Hackaday Remoticon TinyML talk and of course trying to replicate it/experiment myself. I noticed that there is an “edit as an iPython notebook” option, so I downloaded the notebook, and here’s the problem: when running this cell

with open('x_train.npy', 'wb') as file:
    file.write(X)
with open('y_train.npy', 'wb') as file:
    file.write(Y)
X = np.load('x_train.npy')
Y = np.load('y_train.npy')[:,0]

I’m getting a ValueError: "Cannot load file containing pickled data when allow_pickle=False".
I thought something had changed in NumPy’s np.load() function, so I tried forcing allow_pickle=True, but that doesn’t work either:
OSError: Failed to interpret file 'x_train.npy' as a pickle

The problem seems to be with the X values: y_train.npy loads without any problem. The y_train.npy file is almost 12 megabytes, while x_train.npy is only 158 kilobytes, which makes me wonder whether it gets created/downloaded properly.
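One way to narrow this down: np.load() only tries to unpickle a file when it does not start with the .npy magic string, so both errors above suggest that x_train.npy is not a real .npy file at all (for example, a truncated download or an error response saved to disk). The sketch below reads the header by hand using only the standard library. The function name inspect_npy_header is made up for illustration and is not part of NumPy or of the exported notebook.

```python
import ast
import struct

# Every .npy file starts with these six bytes.
NPY_MAGIC = b"\x93NUMPY"


def inspect_npy_header(path):
    """Return ((major, minor), header_dict) for a .npy file.

    Hypothetical helper for illustration. Raises ValueError if the file
    does not start with the .npy magic string.
    """
    with open(path, "rb") as f:
        magic = f.read(6)
        if magic != NPY_MAGIC:
            raise ValueError(f"{path}: not a .npy file, starts with {magic!r}")
        major, minor = f.read(2)
        # Format 1.x stores the header length in 2 bytes, 2.x and 3.x in 4 bytes.
        if major == 1:
            (header_len,) = struct.unpack("<H", f.read(2))
        else:
            (header_len,) = struct.unpack("<I", f.read(4))
        # The header is a Python dict literal with 'descr', 'fortran_order', 'shape'.
        header = f.read(header_len).decode("latin1")
    return (major, minor), ast.literal_eval(header)
```

Calling inspect_npy_header('x_train.npy') on a healthy file returns the dtype and shape without loading the data. On a malformed file it shows the first bytes, which usually makes it obvious what was actually written to disk.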

Any idea what it could be?

ShawnHymel · November 15, 2020, 3:54pm

Hi @nebelgrau77, I’m the person that gave that workshop on Hackaday (I’m glad you’re enjoying it!). Can you link to where you got that Notebook file? I don’t recognize that code as coming from one of the workshop’s scripts.

The GitHub repo that we used in the workshop can be found here: https://github.com/ShawnHymel/ei-keyword-spotting. In it, there should be two options: one for running a Jupyter Notebook on Colab (remotely) or running the Python curation script (locally).

nebelgrau77 · November 15, 2020, 6:45pm

Hi Shawn,

It’s not from your scripts (which work great BTW!), it’s an EdgeImpulse dashboard feature. In Impulse design/NN Classifier there are three little dots at the top, with two options: Keras (expert) mode, where you can fine-tune the model’s code in Python, and edit as iPython notebook. The notebook gets exported without any problem, but something must be going wrong when the .npy files are generated.

janjongboom · November 15, 2020, 7:09pm

@Nebelgrau, which project is this for? I see you have multiple.

nebelgrau77 · November 15, 2020, 7:11pm

Hello Jan, it’s the speech_recog project; it’s the only one where I tried this option.

janjongboom · November 16, 2020, 7:12pm

I’ll look into it in detail tomorrow, but for now you can grab the labels NPY file from Dashboard.

nebelgrau77 · November 16, 2020, 7:55pm

I think there is just some problem with the first file: I can open three of them with a simple np.load(), but the first one keeps giving me problems, as if it was somehow malformed (I just downloaded them all from the dashboard).

Edit: Yep, confirmed: the X training data file is malformed, but only in the speech_recog_2 project. In speech_recog and speech_recog_2_v2, the downloaded notebook unpacks the .npy files fine!

janjongboom · November 17, 2020, 7:20pm

@nebelgrau77 The plot thickens… I’ve just downloaded the X and Y .npy files from both the Dashboard and the iPython notebook export (shapes: X (4799, 637), Y (4799, 4)), and they import fine for me in Python 3.9.0. This was for project ID 11649 (speech_recog_2).
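For anyone repeating this check, a minimal sketch of it could look like the following. It assumes the two files were saved locally as x_train.npy and y_train.npy, as in the cell from the first post. The helper name load_pair is hypothetical.

```python
import numpy as np


def load_pair(x_path, y_path):
    """Load features and labels and check that their row counts agree.

    Hypothetical helper for illustration, not part of the exported notebook.
    """
    X = np.load(x_path)
    Y = np.load(y_path)
    if X.shape[0] != Y.shape[0]:
        raise ValueError(
            f"row mismatch: X has {X.shape[0]} rows, Y has {Y.shape[0]}"
        )
    return X, Y
```

With the export described above, load_pair('x_train.npy', 'y_train.npy') should give X.shape == (4799, 637) and Y.shape == (4799, 4). Any other result points at the files rather than at the Python or NumPy version.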

Did you retrain this model since your last message by any chance?

nebelgrau77 · November 17, 2020, 7:39pm

Nope, but it’s working fine for me, too, just testing it as I’m typing. Mine is Python 3.6.10, miniconda installation. Must’ve been some momentary glitch, thanks for checking!

And the feature itself is great: it makes it easy to add some graphs to see how the loss/val loss behave over time, all the model history stuff and such.
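As a rough illustration of that kind of graph, here is a sketch that plots training and validation loss from a Keras-style history dict. It assumes the dict has 'loss' and 'val_loss' lists, as the history attribute of the object returned by Keras model.fit() does when validation data is used. Whether the exported notebook keeps that object in a variable is an assumption, and both function names are made up for illustration.

```python
def best_epoch(history):
    """Return the 1-based epoch with the lowest validation loss."""
    val = history["val_loss"]
    return min(range(len(val)), key=val.__getitem__) + 1


def plot_loss(history):
    """Plot loss and val_loss per epoch and return the matplotlib figure."""
    # Imported here so best_epoch() still works without matplotlib installed.
    import matplotlib.pyplot as plt

    epochs = range(1, len(history["loss"]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history["loss"], label="loss")
    ax.plot(epochs, history["val_loss"], label="val_loss")
    ax.axvline(best_epoch(history), linestyle="--", label="lowest val_loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    return fig
```

A validation curve that turns upward while the training loss keeps falling is the usual sign of overfitting, which is what the dashed line helps to spot.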

janjongboom · November 17, 2020, 8:04pm

Hmm… If you encounter this again, would you mind versioning your project (Versioning tab in the Studio)? That caches all intermediate state.

And yeah, we want to add those things to the Studio at some point, but great to use it like that in the meantime!

nebelgrau77 · November 17, 2020, 8:30pm

Sure, will do! I’ll keep an eye on it.