Training Failed

nwaso886 · April 9, 2023, 7:13am

In the learning block, I selected classification. In this section, I don’t know why I could not train my model.
What is the problem?

nwaso886 · April 9, 2023, 7:14am

Creating job… OK (ID: 7785258)

Scheduling job in cluster…
Job started
Scheduling job in cluster…
Container image pulled!
Job started
Splitting data into training and validation sets…
Splitting data into training and validation sets OK

Training model…
Training on 308 inputs, validating on 78 inputs
Traceback (most recent call last):
File “/home/train.py”, line 262, in <module>
main_function()
File “/home/train.py”, line 190, in main_function
model, override_mode, disable_per_channel_quantization, akida_model, akida_edge_model = train_model(train_dataset, validation_dataset,
File “/home/train.py”, line 135, in train_model
model.fit(train_dataset, epochs=EPOCHS, validation_data=validation_dataset, verbose=2, callbacks=callbacks, class_weight=ei_tensorflow.training.get_class_weights(Y_train))
File “/app/keras/.venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py”, line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File “/app/keras/.venv/lib/python3.8/site-packages/keras/engine/data_adapter.py”, line 1426, in _make_class_weight_map_fn
raise ValueError(error_msg)
ValueError: Expected class_weight to be a dict with keys from 0 to one less than the number of classes, found {3: 0.3117408906882591, 2: 12.833333333333334, 1: 1.4528301886792452, 4: 38.5}
Application exited with code 1
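
The ValueError above says that Keras wants the class_weight keys to be exactly 0, 1, ..., n-1, but the dict it received has keys 1 to 4 with no 0. The sketch below reproduces that condition in plain Python. It makes several assumptions: the "balanced" weight formula (n_samples / (n_classes * count)) and the per-class counts (247, 53, 6, 2) are inferred from the numbers in the log, not taken from Edge Impulse's code, and the helper names are hypothetical. It does not show how ei_tensorflow.training.get_class_weights works internally.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed formula: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def keys_are_zero_based(class_weight):
    """True when the keys are exactly 0..n-1, the condition the error message describes."""
    return sorted(class_weight) == list(range(len(class_weight)))


def remap_to_zero_based(labels):
    """Map arbitrary integer labels onto contiguous indices starting at 0."""
    mapping = {cls: idx for idx, cls in enumerate(sorted(set(labels)))}
    return [mapping[cls] for cls in labels], mapping


# Hypothetical label list: counts chosen so the weights match the log (308 inputs).
labels = [3] * 247 + [1] * 53 + [2] * 6 + [4] * 2

weights = balanced_class_weights(labels)
print(weights)                       # keys are 1..4, as in the traceback
print(keys_are_zero_based(weights))  # False: there is no key 0

remapped, mapping = remap_to_zero_based(labels)
fixed = balanced_class_weights(remapped)
print(mapping)                       # original label -> zero-based index
print(keys_are_zero_based(fixed))    # True
```

In Edge Impulse Studio the label indices are produced by the platform rather than by the user, so this only illustrates what the check rejects. It is not a fix you can apply to the hosted training job.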

shawn_edgeimpulse · April 11, 2023, 4:54pm

Hi @nwaso886,

When I checked your project, it looks like you were able to get the model to train. Is it working for you now?

drewmagic · March 13, 2025, 3:45am

Hi,
I started my first project (643751) as an Object Detection model and it was getting good results. I then removed the object detection block and replaced it with “Transfer learning”, as I saw that in the tutorial and classification is actually what I’m after. I re-labeled all my images to have one label per image rather than frames (after getting a helpful learning error that told me the labels were wrong).

I now get this error with training. I understand that I have a very small population per classification to train on and will acquire more images but I don’t think that’s the specific issue.

Any tips on fault-finding for this error?

drewmagic · March 14, 2025, 1:45am

I added more data to ensure there were at least 2 training, 1 validation and 1 testing image. This didn’t fix the issue.

However, I then changed the engine for the model. I think it was still using an object detection method rather than a classification method. Changing this enabled me to train the model.

When I changed the project from object detection to classification, it didn’t update the AI structure, and I think an incompatible component was still in the transfer learning block.

I hope this helps others :). I’m back on track now - just need to create a LOT more training images.