Job Failed for EON Tuner

When running the EON Tuner, I saw these errors in the training output.

Worker deregistered: 6b2e2a34
New worker registered: 6b2e2a34
Assigning trial b8fc7c9e to worker: 6b2e2a34

  • Workers | Ready: 0 Busy: 3 Pending: 0
  • Trials | Pending: 24 Running: 3 Completed: 2 Failed:1 Retried: 32
  • Completed | DSP: 8 Learn: 2
  • Time | 1643887092

Trial failed: b8fc7c9e8e53ccfe6cdb6af7536c336a
Will retry trial b8fc7c9e8e53ccfe6cdb6af7536c336a failed: Block learn failed
Worker subprocess failed
Reporting trial failed: b8fc7c9e8e53ccfe6cdb6af7536c336a
ValueError: in user code:
return super(FuncGraph, self)._create_op_internal( # pylint: disable=protected-access
/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py:748 _apply_op_helper

And it ends with this:

Will retry trial edf25b23719d097099793a4496c3eaf9 failed: Block learn failed
Handling stale learn block in trial: edf25b23, inactive for: 300560
Restarting worker: 855a64c00f950f11f0f60fd77d510218

  • Workers | Ready: 0 Busy: 2 Pending: 1
  • Trials | Pending: 25 Running: 2 Completed: 2 Failed:1 Retried: 34
  • Completed | DSP: 8 Learn: 2
  • Time | 1643887448

New worker registered: 855a64c0
Assigning trial 838bcc1a to worker: 855a64c0
Will retry trial 838bcc1ad3b7a45dacc86570c6d95975 failed: Block learn failed
Worker deregistered: 855a64c0
Trial failed: 838bcc1ad3b7a45dacc86570c6d95975
/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py:3528 _create_op_internal
ValueError: in user code:
/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py:748 _apply_op_helper
return super(FuncGraph, self)._create_op_internal( # pylint: disable=protected-access
raise ValueError(str(e))

Job failed (see above)

My ID is: 79256

Hello @andreas.bomholtz,

This is actually the first time I have seen this issue. I’ll defer to our Core Engineering team; they might have an idea of what happened.

Regards,

Louis

Hello @andreas.bomholtz,

@mathijs is having a look at your issue, he will probably have a fix by tomorrow. I will let you know when it is solved. Thank you for reporting it!

Regards,

Louis

Awesome, thanks. Looking forward to the fix.

Hi @andreas.bomholtz, a fix for this issue has been deployed! Apologies for the delay, resolving this issue proved to be slightly more complex than initially anticipated.

Hi @mathijs.

No worries, I know what fixing software bugs is like :wink:

I will try the EON Tuner again…

I am continuing to see this issue as well, even with previously tuned datasets.

@mathijs

Hello @tgott,

Can you share your project ID so I can have a look?

Best,

Louis

@louis Hello, I am seeing it in all of my projects. Here are two I am working on:

https://studio.edgeimpulse.com/studio/212044
https://studio.edgeimpulse.com/studio/212729

Hello @tgott,

I’ve let our core engineering team know; there is indeed an issue that I can’t explain.
We’ll get back to you as soon as possible.

Best,

Louis

Is this a high-priority item? I was hoping to use it for a research project that ends in two weeks.

@louis

Hello @tgott,

Every blocking issue is a high-priority one.

I’m doing my best to get you a response and a fix or a workaround asap.

Best,

Louis

Hello @tgott,

I don’t have the final solution yet, but I managed to get some results in a new project (I invited you to that project).
I found that if you run Perform train / test split (at the bottom of the Dashboard page) and increase the time per inference (in the EON Tuner settings), it no longer gets blocked.

I don’t know which action cleared the error, but hopefully it unblocks you.

Best,

Louis

@louis I see the invite and the results… thanks for the time and effort on this. It does get me unblocked for now. I have some more data to work into the project, so I will see how this works. Thank you.
