EON Tuner fails after a short time

Hi guys
I am trying to extend my wake-word model to include other words:
stop, yes, no
I am using the EON Tuner to tune the parameters, and it seems to die at the same point each time.
https://studio.edgeimpulse.com/studio/76435/tuner

Am I doing something wrong at my end?
This is the output:
Creating job… OK (ID: 2050886)

Calculating DSP performance estimates
Starting hyperparameter search…

Scheduling job in cluster…
Job started
Creating trials…
Existing input block found: 402907
Existing DSP block found: 746532
Existing learn block found: 746534
Existing input block found: 402897
Existing DSP block found: 402980
Created new learn block: 753221
Existing input block found: 402897
Existing DSP block found: 402980
Existing learn block found: 746538
Existing input block found: 402902
Existing DSP block found: 746540
Existing learn block found: 746542

Starting hyperparameter search worker0
Starting hyperparameter search worker1
Starting hyperparameter search worker2

  • Workers | Ready: 0 Busy: 0 Pending: 3
  • Trials | Pending: 4 Running: 0 Completed: 0 Failed:0 Retried: 0
  • Completed | DSP: 0 Learn: 0
  • Time | 1643077211

Existing input block found: 402902
Existing DSP block found: 402948
Created new learn block: 753223
Existing input block found: 402907
Existing DSP block found: 746546
Existing learn block found: 746547
Existing input block found: 402897
Existing DSP block found: 403002
Existing learn block found: 746549
Existing input block found: 402907
Existing DSP block found: 402940
Created new learn block: 753225
Existing input block found: 402907
Existing DSP block found: 746553
Created new learn block: 753227
Existing input block found: 402902
Existing DSP block found: 402918
Created new learn block: 753229
Existing input block found: 402897
Existing DSP block found: 402974
Existing learn block found: 746559
Existing input block found: 402902
Existing DSP block found: 402918
Created new learn block: 753231
Existing input block found: 402907
Existing DSP block found: 746563
Existing learn block found: 746565
Existing input block found: 402897
Existing DSP block found: 746567
Created new learn block: 753233
Existing input block found: 402902
Existing DSP block found: 746571
Existing learn block found: 746573
Existing input block found: 402907
Existing DSP block found: 746575
Existing learn block found: 746577
Existing input block found: 402907
Existing DSP block found: 402908
Created new learn block: 753235
Existing input block found: 402902
Existing DSP block found: 746581
Existing learn block found: 746583
Existing input block found: 402897
Existing DSP block found: 403002
Existing learn block found: 746549
Existing input block found: 402897
Existing DSP block found: 746585
Created new learn block: 753237
Existing input block found: 402902
Existing DSP block found: 403024
Existing learn block found: 746589
Existing input block found: 402902
Existing DSP block found: 402948
Created new learn block: 753239
Existing input block found: 402907
Existing DSP block found: 403036
Existing learn block found: 746593
Existing input block found: 402907
Existing DSP block found: 746595
Existing learn block found: 746597
Existing input block found: 402907
Existing DSP block found: 403048
Existing learn block found: 746599
Existing input block found: 402902
Existing DSP block found: 746601
Created new learn block: 753241
Existing input block found: 402897
Existing DSP block found: 746606
Existing learn block found: 746608
Existing input block found: 402897
Existing DSP block found: 746610
Created new learn block: 753243
Existing input block found: 402897
Existing DSP block found: 746613
Created new learn block: 753245
Existing input block found: 402902
Existing DSP block found: 746617
Existing learn block found: 746619
New worker registered: f208e88f
Assigning trial 9c288069 to worker: f208e88f

  • Workers | Ready: 0 Busy: 1 Pending: 2
  • Trials | Pending: 29 Running: 1 Completed: 0 Failed:0 Retried: 0
  • Completed | DSP: 0 Learn: 0
  • Time | 1643077226

New worker registered: 98990193
New worker registered: 1212ccf0
Assigning trial 8bc5daf8 to worker: 98990193
Copying features from processing blocks…
Copying features from DSP block…
Copying features from DSP block OK
Assigning trial a281ae64 to worker: 1212ccf0

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077242

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077258

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077273

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077289

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077305

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077320

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077336

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077352

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077368

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077383

  • Workers | Ready: 0 Busy: 3 Pending: 0

  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0

  • Completed | DSP: 0 Learn: 0

  • Time | 1643077399

I’m guessing that it hasn’t totally failed, as it does say Pending: 27.
However, if I go away and come back, this hasn’t changed, and if I completely shut the web browser down, I am prompted with "Start EON Tuner" again.
I will let it run for a couple of days and see what happens.
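
A rough way to measure how long the run has gone without progress is to compare the status blocks in the log. The sketch below assumes the `Time |` value is a Unix timestamp in seconds (the log does not say so) and that the `Trials |` line always precedes its `Time |` line. The function name `stalled_seconds` is hypothetical and is not part of any Edge Impulse tooling.

```python
import re

# Patterns matched against the status lines exactly as they appear in the log above.
TRIALS_RE = re.compile(r"Trials \| Pending: (\d+) Running: (\d+) Completed: (\d+)")
TIME_RE = re.compile(r"Time \| (\d+)")


def stalled_seconds(log_text):
    """Return how long the 'Completed' count has stayed at its latest value.

    Assumption: 'Time |' is a Unix timestamp in seconds.
    """
    snapshots = []  # list of (timestamp, completed_trials)
    completed = None
    for line in log_text.splitlines():
        trials = TRIALS_RE.search(line)
        if trials:
            completed = int(trials.group(3))
            continue
        stamp = TIME_RE.search(line)
        if stamp and completed is not None:
            snapshots.append((int(stamp.group(1)), completed))
            completed = None
    if not snapshots:
        return 0
    last_time, last_completed = snapshots[-1]
    start = last_time
    # Walk backwards until the completed count differs from the latest one.
    for timestamp, done in reversed(snapshots):
        if done != last_completed:
            break
        start = timestamp
    return last_time - start


# Two status blocks copied from the log above.
sample_log = """
  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0
  • Time | 1643077242
  • Trials | Pending: 27 Running: 3 Completed: 0 Failed:0 Retried: 0
  • Time | 1643077399
"""
print(stalled_seconds(sample_log))
```

A large value with no change in the completed count only shows that nothing finished in that window; it does not tell you whether the workers are still training.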

I have now noticed this in the output:
Handling stale learn block in trial: 409de512, inactive for: 300464
Will retry trial 409de512fa71d3eeb1ffe8d7040c965d failed: Block learn failed
Restarting worker: d40d133ecb46a804556267254a207954

Hi @greg_dickson,

I’m trying it out on my end, and it does seem quite slow. Were you able to do regular training (without the EON Tuner) previously?

Thanks Shawn (@shawn_edgeimpulse)

Yeah, it worked fine with the old values from a previous EON tune and all the words enabled.

The dataset is not well balanced: some words have many recordings, others not so many.

I have set the target to an Apple machine, which is probably the closest to the Linux i5 NUC I am using. That is the target I am trying to run the tuner for.

@shawn_edgeimpulse
I have just added new data to the model and retrained it easily with the old parameters from an earlier EON tune.
The EON Tuner still balks, though, on these settings:

Keyword spotting

MacBook Pro 16" 2020 (Intel Core i9 2.4GHz)

100 ms

4096 kB

4096 kB

OK @shawn_edgeimpulse
I disabled a couple of words, reran the tuner, and got this error:

Trial failed: edf58defa076b9da79955c0d5ddac01a
Traceback (most recent call last):
File "/home/tuner/ei_tensorflow/training.py", line 85, in split_and_shuffle_data
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/utils/np_utils.py", line 78, in to_categorical
categorical[np.arange(n), y] = 1
IndexError: index 5 is out of bounds for axis 1 with size 4
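
For anyone hitting the same `IndexError`: this kind of failure happens when a label index is larger than the number of one-hot columns. The sketch below reproduces it with plain NumPy, using the same indexing line as in the traceback. It assumes, without confirmation from the Edge Impulse code, that disabling classes left gaps in the label indices (for example, index 5 still present while only 4 classes are enabled). The `one_hot` helper and the label values are hypothetical.

```python
import numpy as np


def one_hot(y, num_classes):
    """Minimal stand-in for a one-hot encoder using the indexing from the traceback."""
    y = np.asarray(y, dtype=int)
    n = y.shape[0]
    categorical = np.zeros((n, num_classes), dtype=np.float32)
    categorical[np.arange(n), y] = 1
    return categorical


# Hypothetical labels: six original classes (0..5) with classes 2 and 4 disabled.
labels = np.array([0, 1, 3, 5, 5, 0])
num_enabled = 4

try:
    one_hot(labels, num_enabled)
    failed = False
except IndexError as err:
    failed = True
    print("one-hot encoding failed:", err)

# Remapping the surviving labels to a contiguous 0..k-1 range avoids the error.
classes, remapped = np.unique(labels, return_inverse=True)
encoded = one_hot(remapped, len(classes))
print(classes, remapped, encoded.shape)
```

Whether the tuner should do this remapping itself when classes are disabled is a question for the dev team; the sketch only shows why non-contiguous label indices would produce this exact kind of error.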

OK, I started a new project:
https://studio.edgeimpulse.com/studio/77946
I downloaded the data from the original project, physically removed the data for two words, and uploaded the rest to the new project. The EON Tuner seems to be OK there.
I will try another project with all the data.

Nope, I was wrong.

I started getting:

Handling stale learn block in trial: 5fd4c858, inactive for: 300334
Will retry trial 5fd4c858f5d3dcf024873b6c215c813b failed: Block learn failed
Restarting worker: b6b88d966e4521538b47b9f4931a1389
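
Both "inactive for" values reported in this thread (300464 and 300334) sit just above 300000. If the unit is milliseconds, which is an assumption since the log does not state it, that would correspond to a trial being declared stale after roughly five minutes without activity. The small sketch below pulls these values out of the log text; the function name `stale_trials` is hypothetical.

```python
import re

STALE_RE = re.compile(
    r"Handling stale learn block in trial: (\w+), inactive for: (\d+)"
)


def stale_trials(log_text):
    """Map trial ID to inactivity in seconds, ASSUMING the logged value is in ms."""
    return {
        trial: int(inactive) / 1000
        for trial, inactive in STALE_RE.findall(log_text)
    }


# The two stale-trial lines quoted in this thread.
log_text = """
Handling stale learn block in trial: 409de512, inactive for: 300464
Handling stale learn block in trial: 5fd4c858, inactive for: 300334
"""
for trial, seconds in stale_trials(log_text).items():
    print(f"{trial}: inactive for about {seconds / 60:.1f} minutes")
```

If that reading is right, a learn block that produces no output for about five minutes gets retried, which would fit slow trials being restarted repeatedly rather than crashing outright.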

I’m starting to think that perhaps the idea of enabled and disabled words is not handled properly by the EON Tuner.

Also, having to hit Start again in the web UI to see the progress of an existing tune is not a good idea.
If the tuner is in progress, the web page should have a "Show progress" button instead.

Hi @greg_dickson,

I want to let you know that I’ve passed this information on to the dev team. I’ll give you an answer as soon as I find out anything.

Thanks @shawn_edgeimpulse
My simpler model seems to be running, but very slowly: 4 trials completed in 10 hours.
As the logs are not available once you leave the page, I can’t give any more feedback.
I have an old tune that I can use the values from, and that is working fine, so there is no rush. I just add feedback where I can.


OK, so after lots of trials, it appears to be associated with the target choice: the Apple target is very slow and seems to fail a lot. I will just run it and come back in a few days. However, it basically just stopped the last time I tried it.