Traditional image processing

Hello, is there a good way to distinguish the following labels? I would like to determine whether they are placed correctly based on how the labels appear.

Hi @caifan,

Can you guarantee that the object (that the labels are going on) will be placed in the exact same position each time in the frame? If so, then you can likely train a model that will look for proper placement of the label and give you a confidence score based on whether new images look like ones in the training set.

@shawn_edgeimpulse Hello, the position of each placement can be guaranteed to be consistent. It is divided into a left side and a right side. At its core this is a classification problem. After I used my own classification model with transfer learning, the results were not ideal.

By training a model to find the correct location of a tag, do you mean an object detection algorithm?

Hi @caifan,

My thought would be to use image classification if you can guarantee that the placement of the object in the frame is the same every time. I find that transfer learning works well on common objects (people, animals, plants, cars, planes, etc.) or things close in appearance to those objects, as those objects are in the original training dataset for the model.

For this project, my suggestion is to collect a bunch of sample images with the label in the correct spot (say, 50-100 images) and a bunch of images with the label in the incorrect spot (50-100 images). Train a basic CNN (do not use transfer learning) to see how well that performs on your test set.

I also recommend looking into data augmentation. You can click the “data augmentation” button in the Studio to see if that helps. Alternatively, you can see an example of manually creating augmented images in this Colab that I put together for my vision course.
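As a rough illustration of what manual augmentation can look like (this is not the Colab code or what the Studio does internally), here is a minimal pure-Python sketch. It assumes a grayscale image stored as a list of rows of 0-255 integers; the function names, shift range, and brightness range are hypothetical choices for the example.

```python
import random

def shift(image, dx, dy, fill=0):
    """Translate a grayscale image (list of rows) by dx, dy pixels, padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # source pixel that lands on (x, y)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out

def adjust_brightness(image, factor):
    """Scale every pixel by `factor`, clamping to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]

def augment(image, rng, max_shift=2, brightness=(0.8, 1.2)):
    """Apply one random small shift and one random brightness change."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    factor = rng.uniform(*brightness)
    return adjust_brightness(shift(image, dx, dy), factor)

# Hypothetical 4x4 test image; real images would come from your dataset.
rng = random.Random(0)
sample = [[10 * (x + y) for x in range(4)] for y in range(4)]
augmented = [augment(sample, rng) for _ in range(5)]
```

One caution for this particular project: the class depends on where the label sits, so the shifts should probably stay small enough that a correctly placed label cannot end up looking like an incorrectly placed one.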

@shawn_edgeimpulse Hello, I can assure you the placement is the same every time, but there are several types of glass. If I select only 50-100 images and then use data augmentation to expand the dataset, the results on the test set are very good, but real-time detection after deploying to the OpenMV development board is not as good, and the probability of misjudgment (correct placements judged as incorrect, and incorrect ones judged as correct) is very high. Previously, I also compared an ordinary network with and without transfer learning and found that transfer learning performed better. The text on the label also interferes with classification to some extent. This time, when I showed the model data that exists in the training set, it classified it correctly, but with a similar slide it got the answer wrong.

@shawn_edgeimpulse That is, the difference between correct and incorrect placement is very small. The test results are very good on the platform, but the results are unsatisfactory once deployed, and the misjudgment rate is very high.
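To put numbers on the two kinds of misjudgment described above, one option is to log the true and predicted classes for images captured on the deployed board and count each error type separately. The sketch below is only an illustration: the label names ("correct", "incorrect") and the sample lists are made up, not taken from the project.

```python
def error_rates(y_true, y_pred, positive="correct"):
    """Return (false_reject_rate, false_accept_rate).

    false reject: a correctly placed label that the model judged as wrong
    false accept: a wrongly placed label that the model judged as right
    """
    pairs = list(zip(y_true, y_pred))
    false_rejects = sum(1 for t, p in pairs if t == positive and p != positive)
    false_accepts = sum(1 for t, p in pairs if t != positive and p == positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    fr_rate = false_rejects / n_pos if n_pos else 0.0
    fa_rate = false_accepts / n_neg if n_neg else 0.0
    return fr_rate, fa_rate

# Hypothetical results logged from a deployed device.
y_true = ["correct", "correct", "correct", "correct", "incorrect", "incorrect"]
y_pred = ["correct", "incorrect", "correct", "correct", "correct", "incorrect"]
fr_rate, fa_rate = error_rates(y_true, y_pred)
print(f"false reject rate: {fr_rate:.2f}, false accept rate: {fa_rate:.2f}")
```

Comparing these two rates on the platform's test set and on device-captured images would show whether the drop after deployment affects both error types or mostly one of them.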

@shawn_edgeimpulse Or if you have time, can you try something on my data set? My project ID is 92434. Thank you very much!

Hi @caifan,

It looks like you’re getting 96+% accuracy on validation and test sets, which is really good. You’ll likely need more data and to tweak the model hyperparameters to get much higher than that.

It does look like the objects can shift position and orientation in the different images. This will make classification more difficult.

You mentioned that you have trouble getting it to work when deployed. This is, unfortunately, a common problem with vision projects. There are many, many things that you have to account for when deploying a vision project. To name a few:

  • Lighting conditions may change between dataset capture and deployment
  • Camera position and angle may change
  • Camera distance from subject may change
  • Lens may change

Make sure you are controlling all of those when you go to deploy your project. Additionally, you mentioned that you are using the OpenMV. I recommend looking carefully at the code: are you cropping to the 96x96 resolution (given by the “fit longest” setting in your impulse) and converting to grayscale before feeding the image to the .tflite model? The OpenMV example code from Edge Impulse does not perform scaling, and OpenMV does not have a good scaling method out of the box.
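To make the preprocessing steps concrete, here is a minimal plain-Python sketch of grayscale conversion followed by a resize to 96x96. It is not OpenMV or Edge Impulse code, and it rests on assumptions: that "fit longest" means scaling so the longer side becomes 96 and padding the remainder (worth checking against your impulse settings, since a crop would behave differently), that the common 0.299/0.587/0.114 luma weights are appropriate, and that nearest-neighbor sampling is acceptable. All function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray values (assumed luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def fit_longest(gray, size=96, fill=0):
    """Scale so the longer side equals `size`, then center on a size x size canvas."""
    h, w = len(gray), len(gray[0])
    longest = max(h, w)
    new_h = max(1, round(h * size / longest))
    new_w = max(1, round(w * size / longest))
    # Nearest-neighbor resampling using integer arithmetic.
    resized = [[gray[y * h // new_h][x * w // new_w] for x in range(new_w)]
               for y in range(new_h)]
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    out = [[fill] * size for _ in range(size)]
    for y in range(new_h):
        out[top + y][left:left + new_w] = resized[y]
    return out

def preprocess(rgb_frame, size=96):
    """Grayscale, then fit to the model's input resolution."""
    return fit_longest(to_grayscale(rgb_frame), size)

# Hypothetical 320x240 all-white camera frame.
frame = [[(255, 255, 255)] * 320 for _ in range(240)]
model_input = preprocess(frame)
print(len(model_input), len(model_input[0]))
```

Whatever the device code does, the goal is that it reproduces the same steps, in the same order, as the preprocessing used during training.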

MobileNetV2 is interesting in that it “just works” regardless of the input resolution (down to some minimum size; I think it’s 96x96), as it divides images based on its bottleneck layers rather than assuming some given input size. So, you can train MobileNetV2 on 96x96 images and then feed it 240x240 images for inference, and it will work…but you’ll likely see a decrease in accuracy. Your best bet is to train and perform inference with images of the same resolution.