We are looking at a data set of ~50,000 measurements from a metal waste sorting machine. Each measurement is taken with an electromagnetic sensor whose signal is distorted depending on the type of metal passing the sensor. Each measurement produces a line of 40 measuring points. There are typically 8 different classes to be distinguished.
So we are essentially looking for an ML model that can correctly distinguish between the classes based on the 40-point measurement per item.
All of this data is stored in CSV files.
- How can I import the CSV files into the studio?
- What is the best way to check whether we can produce a sufficiently precise model?
- Is the usual 60% (train) / 20% (validation) / 20% (test) split applied automatically to the data set, or do we need to split the files into the respective subsets ourselves?
This sounds like a very interesting project!
- You’ll first need to convert from CSV to our JSON acquisition format; see some example code here: Upload dataset in CSV file to Edge Impulse. You can generate a JSON file for each 40-point line, or aggregate several of them and use the window size in the Studio to re-split them into 40-point samples.
- As your dataset is unbalanced, I would suggest first trying 250 samples for each class. If you don’t need to run any signal processing on the electromagnetic measurements, just use a “Raw Block” and then a Neural Network with default parameters and check the accuracy. If you don’t see clear clusters when generating the raw block features, we can look in more detail at whether some DSP could help. If you wish to use your full dataset, you can check our tips on class imbalance.
- When using the Uploader, we automatically split the data 80% training / 20% test. Then, in our NN block, we do an additional split of the training set (80% training, 20% validation).
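To make the first bullet concrete, here is a minimal sketch of wrapping each 40-point CSV row in a signed data-acquisition JSON sample. The device name, sensor name, units, interval, and HMAC key are placeholders/assumptions — check the field names and the signing procedure against the ingestion documentation before relying on them:

```python
import csv
import hashlib
import hmac
import json

HMAC_KEY = "your-project-hmac-key"  # hypothetical placeholder


def row_to_sample(values_row, interval_ms=10):
    """Wrap one 40-point CSV row in a data-acquisition JSON structure."""
    payload = {
        "device_name": "em-sorter-01",   # hypothetical device id
        "device_type": "EM_SENSOR",      # hypothetical device type
        "interval_ms": interval_ms,      # assumed sampling interval between points
        "sensors": [{"name": "em", "units": "V"}],  # assumed sensor name/units
        "values": [[float(v)] for v in values_row],
    }
    data = {
        "protected": {"ver": "v1", "alg": "HS256", "iat": 0},
        "signature": "0" * 64,           # placeholder, replaced below
        "payload": payload,
    }
    # Sign the encoded message with the project HMAC key.
    encoded = json.dumps(data)
    data["signature"] = hmac.new(
        HMAC_KEY.encode(), encoded.encode(), hashlib.sha256
    ).hexdigest()
    return data


def convert_csv(path, label, interval_ms=10):
    """Yield (filename, json_text) per CSV line; the label is encoded
    in the filename prefix (label.index.json)."""
    with open(path, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            sample = row_to_sample(row, interval_ms)
            yield f"{label}.{i}.json", json.dumps(sample)
```

Each generated file can then be uploaded with the Uploader; one file per item keeps labeling simple.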
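The “250 samples per class” suggestion from the second bullet can be sketched as a simple per-class subsample before conversion (plain Python, no assumptions about your file layout):

```python
import random
from collections import defaultdict


def subsample_per_class(rows, labels, n_per_class=250, seed=0):
    """Pick at most n_per_class random rows for each label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    balanced = {}
    for label, items in by_class.items():
        rng.shuffle(items)
        balanced[label] = items[:n_per_class]
    return balanced
```

Classes with fewer than 250 samples simply keep everything they have; the shortfall is where the class-imbalance tips become relevant.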
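For intuition on the last bullet: the Studio performs these splits for you, but the proportions work out as sketched below (roughly 64% train / 16% validation / 20% test overall):

```python
import random


def split(samples, train_frac=0.8, seed=42):
    """Shuffle a list and split it into (kept, held_out) portions."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


data = list(range(1000))
train, test = split(data)           # 80/20: 800 train, 200 test
train, validation = split(train)    # 80/20 of train: 640 train, 160 validation
```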