Got warnings, errors, and low accuracy when uploading a customized LeNet PyTorch model to EdgeImpulse Project No. 201375

Hello there, recently I’ve been trying to upload a LeNet training block based on your example-custom-ml-block-pytorch repo. After fixing various bugs, I got the model uploaded to the platform, and it produces exactly the same loss values as when I run the model locally, but the accuracy I got from EdgeImpulse was only 11% after 10 epochs. When I train the model locally, however, it gives me 98% accuracy with the same number of epochs and the same learning rate.

In addition, I found that I got this during my training on EdgeImpulse:

Creating embeddings...
WARN: Creating embeddings failed:  Default MaxPoolingOp only supports NHWC on device type CPU
	 [[node sequential/model/13/MaxPool
 (defined at /app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/util/tf_stack.py:193)
]] [Op:__inference_predict_function_1175]

Errors may have originated from an input operation.
Input Source operations connected to node sequential/model/13/MaxPool:
In[0] sequential/model/12/Relu:

Operation defined at: (most recent call last)
>>>   File "/home/profile.py", line 330, in <module>
>>>     main_function()

My questions are listed below:

    1. Is the low accuracy on EdgeImpulse related to the warning and error above?
    2. How can I avoid this error? For convenience, I have pasted my model code and the complete output from the EdgeImpulse training console below. In the first line of the forward() function, I convert the input from NHWC format to NCHW to make the PyTorch functions work. I suspect this is where the problem is located.
    3. If it is not, how can I increase the accuracy? The gap between 11% and 98% is really surprising.
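To be concrete about the conversion mentioned in question 2, it is just an axis permutation; a minimal sketch with a dummy tensor (not the actual dataset):

```python
import torch

# Dummy batch in channels-last (NHWC) layout: (batch, height, width, channels)
x_nhwc = torch.randn(4, 28, 28, 1)

# permute(0, 3, 1, 2) reorders the axes to channels-first (NCHW),
# which is the layout PyTorch's Conv2d/MaxPool2d layers expect
x_nchw = x_nhwc.permute(0, 3, 1, 2)

print(x_nchw.shape)  # torch.Size([4, 1, 28, 28])
```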

Looking forward to your help!

Model code

import os
import numpy as np
from torch import nn
from torch.nn import Module

X_train = np.load(os.path.join(args.data_directory, 'X_split_train.npy'), mmap_mode='r')
Y_train = np.load(os.path.join(args.data_directory, 'Y_split_train.npy'))
X_test = np.load(os.path.join(args.data_directory, 'X_split_test.npy'), mmap_mode='r')
Y_test = np.load(os.path.join(args.data_directory, 'Y_split_test.npy'))

classes = Y_train.shape[1]

MODEL_INPUT_SHAPE = X_train.shape[1:]

# <<MODIFIED>>
class Model(Module):
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(256, 120)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(120, 84)
        self.relu4 = nn.ReLU()
        self.fc3 = nn.Linear(84, 10)
        self.relu5 = nn.ReLU()

    def forward(self, x):
        # convert NHWC to NCHW to make pytorch work
        x = x.permute(0,3,1,2)
        y = self.conv1(x)
        y = self.relu1(y)
        y = self.pool1(y)
        y = self.conv2(y)
        y = self.relu2(y)
        y = self.pool2(y)
        y = y.view(y.shape[0], -1)
        y = self.fc1(y)
        y = self.relu3(y)
        y = self.fc2(y)
        y = self.relu4(y)
        y = self.fc3(y)
        y = self.relu5(y)
        return y

Complete output from EdgeImpulse training console:

Creating job... OK (ID: 7631702)

Scheduling job in cluster...
Job started
Scheduling job in cluster...
Container image pulled!
Job started
Splitting data into training and validation sets...
Splitting data into training and validation sets OK
Scheduling job in cluster...
Container image pulled!
Job started
2023-04-03 22:28:15.852751: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-03 22:28:15.852783: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
train.py:94: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:189.)
  X_train = torch.FloatTensor(X_train)
Attached to job 7631702...
Epoch 1: loss: 2.233
Attached to job 7631702...
Epoch 2: loss: 0.887
Attached to job 7631702...
Epoch 3: loss: 0.612
Attached to job 7631702...
Epoch 4: loss: 0.564
Attached to job 7631702...
Epoch 5: loss: 0.538
Attached to job 7631702...
Epoch 6: loss: 0.524
Attached to job 7631702...
Epoch 7: loss: 0.514
Attached to job 7631702...
Epoch 8: loss: 0.502
Attached to job 7631702...
Epoch 9: loss: 0.306
Attached to job 7631702...
Attached to job 7631702...
Epoch 10: loss: 0.285
Attached to job 7631702...
Attached to job 7631702...
Epoch 11: loss: 0.276
Attached to job 7631702...
Epoch 12: loss: 0.270
Attached to job 7631702...
Attached to job 7631702...
Epoch 13: loss: 0.266
Attached to job 7631702...
Attached to job 7631702...
Epoch 14: loss: 0.262
Attached to job 7631702...
Attached to job 7631702...
Epoch 15: loss: 0.259

Test accuracy: 0.886333

Training network OK

INFO:pytorch2keras:Converter is called.
WARNING:pytorch2keras:Name policy isn't supported now.
WARNING:pytorch2keras:Custom shapes isn't supported now.
DEBUG:pytorch2keras:Input_names:
DEBUG:pytorch2keras:['input_0']
DEBUG:pytorch2keras:Output_names:
DEBUG:pytorch2keras:['output_0']
graph(%input_0 : Float(1, 1, 28, 28, strides=[784, 784, 28, 1], requires_grad=0, device=cpu),
      %conv1.weight : Float(6, 1, 5, 5, strides=[25, 25, 5, 1], requires_grad=1, device=cpu),
      %conv1.bias : Float(6, strides=[1], requires_grad=1, device=cpu),
      %conv2.weight : Float(16, 6, 5, 5, strides=[150, 25, 5, 1], requires_grad=1, device=cpu),
      %conv2.bias : Float(16, strides=[1], requires_grad=1, device=cpu),
      %fc1.weight : Float(120, 256, strides=[256, 1], requires_grad=1, device=cpu),
      %fc1.bias : Float(120, strides=[1], requires_grad=1, device=cpu),
      %fc2.weight : Float(84, 120, strides=[120, 1], requires_grad=1, device=cpu),
      %fc2.bias : Float(84, strides=[1], requires_grad=1, device=cpu),
      %fc3.weight : Float(10, 84, strides=[84, 1], requires_grad=1, device=cpu),
      %fc3.bias : Float(10, strides=[1], requires_grad=1, device=cpu)):
  %11 : Float(1, 6, 24, 24, strides=[3456, 576, 24, 1], requires_grad=1, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[5, 5], pads=[0, 0, 0, 0], strides=[1, 1]](%input_0, %conv1.weight, %conv1.bias) # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py:442:0
  %12 : Float(1, 6, 24, 24, strides=[3456, 576, 24, 1], requires_grad=1, device=cpu) = onnx::Relu(%11) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1299:0
  %13 : Float(1, 6, 12, 12, strides=[864, 144, 12, 1], requires_grad=1, device=cpu) = onnx::MaxPool[kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%12) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:719:0
  %14 : Float(1, 16, 8, 8, strides=[1024, 64, 8, 1], requires_grad=1, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[5, 5], pads=[0, 0, 0, 0], strides=[1, 1]](%13, %conv2.weight, %conv2.bias) # /usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py:442:0
  %15 : Float(1, 16, 8, 8, strides=[1024, 64, 8, 1], requires_grad=1, device=cpu) = onnx::Relu(%14) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1299:0
  %16 : Float(1, 16, 4, 4, strides=[256, 16, 4, 1], requires_grad=1, device=cpu) = onnx::MaxPool[kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%15) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:719:0
  %17 : Long(requires_grad=0, device=cpu) = onnx::Constant[value={1}]() # train.py:77:0
  %18 : Long(requires_grad=0, device=cpu) = onnx::Constant[value={-1}]()
  %19 : Long(1, strides=[1], device=cpu) = onnx::Unsqueeze[axes=[0]](%17)
  %20 : Long(1, strides=[1], device=cpu) = onnx::Unsqueeze[axes=[0]](%18)
  %21 : Long(2, strides=[1], device=cpu) = onnx::Concat[axis=0](%19, %20)
  %22 : Float(1, 256, strides=[256, 1], requires_grad=1, device=cpu) = onnx::Reshape(%16, %21) # train.py:77:0
  %23 : Float(1, 120, strides=[120, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1](%22, %fc1.weight, %fc1.bias) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1848:0
  %24 : Float(1, 120, strides=[120, 1], requires_grad=1, device=cpu) = onnx::Relu(%23) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1299:0
  %25 : Float(1, 84, strides=[84, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1](%24, %fc2.weight, %fc2.bias) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1848:0
  %26 : Float(1, 84, strides=[84, 1], requires_grad=1, device=cpu) = onnx::Relu(%25) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1299:0
  %27 : Float(1, 10, strides=[10, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1](%26, %fc3.weight, %fc3.bias) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1848:0
  %output_0 : Float(1, 10, strides=[10, 1], requires_grad=1, device=cpu) = onnx::Relu(%27) # /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:1299:0
  return (%output_0)

INFO:onnx2keras:Converter is called.
DEBUG:onnx2keras:List input shapes:
DEBUG:onnx2keras:[(1, 28, 28)]
DEBUG:onnx2keras:List inputs:
DEBUG:onnx2keras:Input 0 -> input_0.
DEBUG:onnx2keras:List outputs:
DEBUG:onnx2keras:Output 0 -> output_0.
DEBUG:onnx2keras:Gathering weights to dictionary.
DEBUG:onnx2keras:Found weight conv1.weight with shape (6, 1, 5, 5).
.......

DEBUG:onnx2keras:...
DEBUG:onnx2keras:Check if all inputs are available:
DEBUG:onnx2keras:Check input 0 (name 27).
DEBUG:onnx2keras:... found all, continue
DEBUG:onnx2keras:Output TF Layer -> KerasTensor(type_spec=TensorSpec(shape=(None, 10), dtype=tf.float32, name=None), name='output_0/Relu:0', description="created by layer 'output_0'")
Saving saved model...
2023-04-03 22:42:50.550396: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Saving saved model OK

Converting TensorFlow Lite float32 model...
2023-04-03 22:42:52.651163: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:363] Ignored output_format.
2023-04-03 22:42:52.651197: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:366] Ignored drop_control_dependency.
2023-04-03 22:42:52.652091: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/tmpkcs0pxcj
2023-04-03 22:42:52.654839: I tensorflow/cc/saved_model/reader.cc:107] Reading meta graph with tags { serve }
2023-04-03 22:42:52.654871: I tensorflow/cc/saved_model/reader.cc:148] Reading SavedModel debug info (if present) from: /tmp/tmpkcs0pxcj
2023-04-03 22:42:52.664761: I tensorflow/cc/saved_model/loader.cc:210] Restoring SavedModel bundle.
2023-04-03 22:42:52.715991: I tensorflow/cc/saved_model/loader.cc:194] Running initialization op on SavedModel bundle at path: /tmp/tmpkcs0pxcj
2023-04-03 22:42:52.730687: I tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 78599 microseconds.
2023-04-03 22:42:52.752737: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:237] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
Estimated count of arithmetic ops: 0.572 M  ops, equivalently 0.286 M  MACs
2023-04-03 22:42:52.797172: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1962] Estimated count of arithmetic ops: 0.572 M  ops, equivalently 0.286 M  MACs

Converting TensorFlow Lite float32 model OK

Converting TensorFlow Lite int8 model...
2023-04-03 22:42:53.983976: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:363] Ignored output_format.
2023-04-03 22:42:53.984007: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:366] Ignored drop_control_dependency.
2023-04-03 22:42:53.984233: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/tmpqygecbwg
2023-04-03 22:42:53.986391: I tensorflow/cc/saved_model/reader.cc:107] Reading meta graph with tags { serve }
2023-04-03 22:42:53.986423: I tensorflow/cc/saved_model/reader.cc:148] Reading SavedModel debug info (if present) from: /tmp/tmpqygecbwg
2023-04-03 22:42:53.998176: I tensorflow/cc/saved_model/loader.cc:210] Restoring SavedModel bundle.
2023-04-03 22:42:54.024644: I tensorflow/cc/saved_model/loader.cc:194] Running initialization op on SavedModel bundle at path: /tmp/tmpqygecbwg
2023-04-03 22:42:54.037518: I tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 53286 microseconds.
2023-04-03 22:42:54.102720: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1962] Estimated count of arithmetic ops: 0.572 M  ops, equivalently 0.286 M  MACs

Estimated count of arithmetic ops: 0.572 M  ops, equivalently 0.286 M  MACs
fully_quantize: 0, inference_type: 6, input_inference_type: 9, output_inference_type: 9
2023-04-03 22:43:01.912418: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1962] Estimated count of arithmetic ops: 0.572 M  ops, equivalently 0.286 M  MACs

Estimated count of arithmetic ops: 0.572 M  ops, equivalently 0.286 M  MACs
Converting TensorFlow Lite int8 model OK

Profiling model...
Scheduling job in cluster...
Container image pulled!
Job started
Loading data for profiling...
Loading data for profiling OK

Creating embeddings...
WARN: Creating embeddings failed:  Default MaxPoolingOp only supports NHWC on device type CPU
	 [[node sequential/model/13/MaxPool
 (defined at /app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/util/tf_stack.py:193)
]] [Op:__inference_predict_function_1175]

Errors may have originated from an input operation.
Input Source operations connected to node sequential/model/13/MaxPool:
In[0] sequential/model/12/Relu:

Operation defined at: (most recent call last)
>>>   File "/home/profile.py", line 330, in <module>
>>>     main_function()
>>> 
>>>   File "/home/profile.py", line 140, in main_function
>>>     ei_tensorflow.embeddings.create_embeddings(
>>> 
>>>   File "/app/./resources/libraries/ei_tensorflow/embeddings.py", line 36, in create_embeddings
>>>     X_pred = pred_from_savedmodel(model, SHAPE, rows, x_file)
>>> 
>>>   File "/app/./resources/libraries/ei_tensorflow/embeddings.py", line 70, in pred_from_savedmodel
>>>     embeddings_len = model.predict(X_0).shape[1]
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/training.py", line 1789, in predict
>>>     tmp_batch_outputs = self.predict_function(iterator)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 910, in __call__
>>>     result = self._call(*args, **kwds)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 958, in _call
>>>     self._initialize(args, kwds, add_initializers_to=initializers)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in _initialize
>>>     self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3157, in _get_concrete_function_internal_garbage_collected
>>>     graph_function, _ = self._maybe_define_function(args, kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3557, in _maybe_define_function
>>>     graph_function = self._create_graph_function(args, kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3392, in _create_graph_function
>>>     func_graph_module.func_graph_from_py_func(
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 1143, in func_graph_from_py_func
>>>     func_outputs = python_func(*func_args, **func_kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 672, in wrapped_fn
>>>     out = weak_wrapped_fn().__wrapped__(*args, **kwds)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 1118, in autograph_handler
>>>     return autograph.converted_call(
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/training.py", line 1621, in predict_function
>>>     return step_function(self, iterator)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/training.py", line 1611, in step_function
>>>     outputs = model.distribute_strategy.run(run_step, args=(data,))
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1316, in run
>>>     return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 2892, in call_for_each_replica
>>>     return self._call_for_each_replica(fn, args, kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py", line 3695, in _call_for_each_replica
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/training.py", line 1604, in run_step
>>>     outputs = model.predict_step(data)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/training.py", line 1572, in predict_step
>>>     return self(x, training=False)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/sequential.py", line 373, in call
>>>     return super(Sequential, self).call(inputs, training=training, mask=mask)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/functional.py", line 451, in call
>>>     return self._run_internal_graph(
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/functional.py", line 589, in _run_internal_graph
>>>     outputs = node.layer(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/functional.py", line 451, in call
>>>     return self._run_internal_graph(
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/functional.py", line 589, in _run_internal_graph
>>>     outputs = node.layer(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/keras/layers/pooling.py", line 357, in call
>>>     outputs = self.pool_function(
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 1096, in op_dispatch_handler
>>>     return dispatch_target(*args, **kwargs)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/ops/nn_ops.py", line 4865, in max_pool
>>>     return gen_nn_ops.max_pool(
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 5387, in max_pool
>>>     _, _, _op, _outputs = _op_def_library._apply_op_helper(
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 744, in _apply_op_helper
>>>     op = g._create_op_internal(op_type_name, inputs, dtypes=None,
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 689, in _create_op_internal
>>>     return super(FuncGraph, self)._create_op_internal(  # pylint: disable=protected-access
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3697, in _create_op_internal
>>>     ret = Operation(
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 2101, in __init__
>>>     self._traceback = tf_stack.extract_stack_for_node(self._c_op)
>>> 
>>>   File "/app/keras/.venv/lib/python3.8/site-packages/tensorflow/python/util/tf_stack.py", line 193, in extract_stack_for_node
>>>     return _tf_stack.extract_stack_for_node(
>>> 

Calculating performance metrics...
Profiling float32 model...
Profiling int8 model...
Profiling 54% done

Model training complete

Job completed

@xinyew congrats on your first post and thank you for using Edge Impulse.

We’ve just landed a new feature. You can import your pretrained model (BYOM). See this post here:

Regarding your learn block, the warning about the shape is critical. Edge Impulse requires the input to be in NHWC, RGB normalized with mean 0 and std=255. See our docs for more information and examples for vision models.

I’d like to hear more about the changes you had to make to example-custom-ml-block-pytorch.

PyTorch BYOA issue

Hello @rjames, thanks for your reply. I’ve temporarily given up on the PyTorch approach from my earlier post, since I still could not fix the issue. I put a permute() call in my forward() method to transform channels-last to channels-first, but it still gave me the same error.

TensorFlow/keras BYOA issue

After this failure, I also tried your https://github.com/edgeimpulse/example-custom-ml-block-keras with TensorFlow and Keras, which gave me accuracy locally as promising as PyTorch’s. But when I tried to run the learning block online in EdgeImpulse Studio, I got the output below:

Creating job... OK (ID: 8241105)

Scheduling job in cluster...
Job started
Scheduling job in cluster...
Job started
Splitting data into training and validation sets...
Attached to job 8241105...
Splitting data into training and validation sets OK
Scheduling job in cluster...
Pulling container image...
.............
Pulling container image...
Container image pulled!
Job started
2023-04-21 04:35:26.467154: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-04-21 04:35:26.467184: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "train.py", line 4, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/__init__.py", line 41, in <module>
    from tensorflow.python.eager import context
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/context.py", line 33, in <module>
    from tensorflow.core.framework import function_pb2
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/core/framework/function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/core/framework/resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/core/framework/tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "/usr/local/lib/python3.8/dist-packages/google/protobuf/descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
Application exited with code 1

Job failed (see above)

I don’t think this is caused by my modification, because the only part I changed is the model architecture.

The new BYOM issue

Now I have to rely on the new BYOM feature to generate deployable code. It works, but it seems that I cannot turn on the EON compiler with BYOM.

My questions:

  • BYOA with PyTorch would be super helpful for us researchers to build tinyML model workflows efficiently. Could you give me some suggestions on how to solve the issue? I don’t understand why the permute() call does not work well.
  • BYOA with TensorFlow seems not to be working at all. I think there might be some dependency issues in your code or mine. Getting BYOA to work with TensorFlow would also be greatly appreciated.
  • Is there any possibility of creating deployment code with the EON compiler enabled when using the BYOM functionality?

Looking forward to hearing from you.

@xinyew I’ve fixed the Keras block. Because no one cares to pin their dependencies in downstream Python packages, it’s an ever-moving target to get a set of packages that actually works (this time due to a protobuf update). We had this fixed internally but hadn’t updated the public repos, but now we have.

I’ve asked the ML team to see if they know about the PyTorch issue.

  • Is there any possibility that we can create deployment code with EON compiler on when we are using BYOM functionality?

It is possible, but only for enterprise customers at the moment :slight_smile:


Hi there!

First up, thank you so much for using our product; it’s always exciting to see researchers deploying models to the edge.

Our deployment system automates the conversion from NCHW to NHWC in the model’s graph, so it’s totally fine for you to train in NCHW; there’s no need to add the permute(0,3,1,2) to your model.

Our numpy files provide the data in NHWC. To convert it to NCHW for training, you can just do the following after loading the data:

X_train = X_train.reshape((-1,1,32,32))
X_test = X_test.reshape((-1,1,32,32))
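A quick check (with a made-up array) that this reshape is equivalent to an explicit layout transpose when there is a single channel:

```python
import numpy as np

# Made-up single-channel batch in NHWC layout: (N, H, W, 1)
x_nhwc = np.arange(2 * 32 * 32).reshape((2, 32, 32, 1)).astype(np.float32)

# Reshape into NCHW, as suggested above
x_nchw = x_nhwc.reshape((-1, 1, 32, 32))

# With C == 1 this matches an explicit axis transpose exactly
assert np.array_equal(x_nchw, x_nhwc.transpose(0, 3, 1, 2))
```

Note that this equivalence only holds because the channel dimension is 1; for RGB data you would need a transpose/permute rather than a reshape.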

Hopefully this solves your problem, but let me know if we can help with anything else!

As @janjongboom mentions, BYOM for EON Compiler is only available for enterprise customers, but you should be able to deploy via TensorFlow Lite for Microcontrollers.

Warmly,
Dan
