Issue with automatic labeling based on file name

Hi everyone,

I’m currently facing an issue with the automatic labeling feature. I have a CSV file that I believe follows the <label>.<unique-id>.csv filename format described in the documentation for importing CSV data. Here’s an example filename:

tip_act.2023-04-08_14-17-04.csv

However, when I upload the CSV file, the automatic labeling feature doesn’t seem to be identifying tip_act as the label. Instead, it uses the first entry in the label column of the CSV. I’m not sure what I’m doing wrong, or if there’s an additional step that I need to take in order to ensure that the automatic labeling feature works properly.

Even when I select the manual “Enter label” method and type in a name for a dataset, that name is not used; Studio still takes the first entry in the label column of the CSV.
Thank you in advance!

Hi @nerubjen,

When you go through the CSV wizard (CSV Wizard - Edge Impulse Documentation), make sure to select “No” when asked if you have a column that contains the label.

Once you have completed the wizard, you can upload files named <label>.<unique-id>.csv to have Studio interpret the label from the filename.


But will the “label” column, in which I separate noise from my labeled data, then still be used to divide my training data?

Hi @nerubjen,

I’m not sure I’m following: do you have an example of a .csv file that you could share?

Of course:

gyroX,gyroY,gyroZ,accelX,accelY,accelZ,magX,magY,magZ,label
...
295,148,-216,-73,3896,687,-289,-740,30,noise
209,100,-220,-111,3967,760,-293,-738,26,noise
144,78,-216,-146,3946,630,-294,-739,27,noise
127,36,-202,-120,3925,519,-291,-738,29,noise
133,22,-176,-120,3973,553,-291,-738,29,noise
147,0,-145,-161,3991,591,-291,-737,32,noise
158,-9,-121,-137,3988,622,-289,-738,38,noise
175,-5,-105,-147,4011,627,-289,-738,38,noise 
126,100,46,50,4079,429,-278,-736,102,tip_act
105,123,63,77,4119,483,-282,-733,101,tip_act
77,97,105,109,4097,380,-282,-733,101,tip_act
82,90,137,72,4141,388,-283,-736,102,tip_act
... more data for the tip_act block
9,-7,-12,-23,4088,265,-219,-737,195,tip_act
-56,11,17,-3,4188,355,-219,-737,195,tip_act
-104,-8,65,-32,4123,265,-221,-734,193,tip_act
-101,-5,92,-74,4066,51,-221,-730,194,tip_act
-23,-7,87,-138,3988,50,-221,-730,194,tip_act
56,-19,50,-84,4071,174,-221,-729,193,tip_act
30,24,20,-62,4167,354,-221,-734,193,noise
-47,18,37,-115,4194,445,-221,-734,193,noise
-117,-124,75,-96,4106,340,-222,-734,189,noise
-124,-103,39,-111,4017,95,-225,-736,188,noise
-84,-56,-29,-198,3990,35,-225,-736,188,noise
-28,-67,-71,-38,4083,80,-230,-733,189,noise

Hi @nerubjen,

Do you want the model to recognize the difference between “noise” and “tip_act”? If so, then you need to choose only one:

  • Label is in the filename
  • Label is a column in the data

You can’t have both. Can you tell the difference between noise and tip_act from just one reading of your IMU? Or do you need a window (e.g. several readings) to discern the difference? If you need a window, then I would recommend dividing up your samples into several windows where each window is a separate .csv file. Each file is then named with the label matching its window (e.g. tip_act.1234.csv, tip_act.1235.csv, noise.1236.csv, and so on). That will make training much easier.
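
As a rough sketch of that splitting step (not an official Edge Impulse tool), the Python below cuts a recording like the one posted above into one file per contiguous run of the same label. The function name split_by_label, the four-digit counter used as the unique ID, and the output folder are all assumptions made for illustration. The label column is dropped from the output, since the label then lives in the filename.

```python
import csv
from itertools import groupby
from pathlib import Path


def split_by_label(src, out_dir, label_col="label"):
    """Write each contiguous run of identical labels to <label>.<index>.csv.

    Hypothetical helper: the name, the zero-padded index used as the
    unique ID, and the default column name are illustrative choices.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with open(src, newline="") as f:
        reader = csv.DictReader(f)
        # Keep every column except the label column in the output files.
        fields = [c for c in reader.fieldnames if c != label_col]
        rows = list(reader)

    written = []
    # groupby only merges neighbouring rows, so noise -> tip_act -> noise
    # yields three separate segments. strip() tolerates trailing spaces.
    runs = groupby(rows, key=lambda r: r[label_col].strip())
    for index, (label, run) in enumerate(runs, start=1):
        path = out_dir / f"{label}.{index:04d}.csv"
        with open(path, "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(run)
        written.append(path.name)
    return written
```

Calling split_by_label("recording.csv", "windows") on a file with a noise block, a tip_act block, and another noise block would produce three files, one per block. Segments will differ in length, so you may still want to trim or pad them before uploading.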

Hello @shawn_edgeimpulse . Thank you very much for your response and help. As I understand it, I have been using the incorrect method to label my data.
Currently, I have two non-repetitive events that I want to see whether I can recognize. I recorded 100 samples each of a pick-up event, capturing the movement when the object was picked up, and a put-down event. I automatically annotated them and stored them in a file containing both values.

However, I have followed the advice from a forum thread (Annotating time-series data with individual events: How to handle specific movements within the data?) and split the recordings into individual files, removing the timestamp and adding a “label” column to label the rest as “noise”.

Based on your response, it would be better to remove the “noise” data completely and just upload the CSVs containing the data that interests me. I want to make sure that I understand correctly, so could you please confirm if this is the correct approach? Thank you again for your support.

Hi @nerubjen,

In that post, @MMarcial is suggesting the same thing that I suggested: divide your samples into different files. For example, let’s say you make 100 recordings of the “pick up” event, 100 recordings of the “put down” event, and 100 recordings of “noise” (or background motion). You would then create 100 files for each class:

  • 100 files with the names pick_up.001.csv, pick_up.002.csv, pick_up.003.csv, and so on
  • 100 files with the names put_down.001.csv, put_down.002.csv, put_down.003.csv, and so on
  • 100 files with the names noise.001.csv, noise.002.csv, noise.003.csv, and so on

When you upload those files to Studio, it should divide them up into their respective classes. That should give you a fairly balanced dataset among your 3 classes: pick_up, put_down, noise. Hope that helps!
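
If you want to check the balance before uploading, a small sketch like the following counts files per class from their names. It assumes the label is everything before the first dot in the filename, which matches the <label>.<unique-id>.csv pattern used in this thread; label_from_filename and count_labels are hypothetical helper names, not part of any Edge Impulse tooling.

```python
from collections import Counter
from pathlib import Path


def label_from_filename(name):
    """Return the part of the filename before the first dot.

    Assumption: this mirrors the <label>.<unique-id>.csv convention.
    """
    return Path(name).name.split(".", 1)[0]


def count_labels(filenames):
    """Count how many files belong to each label."""
    return Counter(label_from_filename(n) for n in filenames)


if __name__ == "__main__":
    # Count the CSV files in the current folder, grouped by label.
    counts = count_labels(p.name for p in Path(".").glob("*.csv"))
    for label, n in sorted(counts.items()):
        print(f"{label}: {n}")
```

Roughly equal counts for pick_up, put_down, and noise suggest the dataset is balanced by file count, though not necessarily by total recording length.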

Thank you for clarifying. I had misunderstood it earlier.
