I have trained a model using images at 1200x1200, and the precision and recall reported on the EI website are acceptable. The model input size was set to 120x120, and when the model runs on the target device using the Raw Features of one of the training images it produces the expected predictions.
However, when one of the training images is resized externally (using IrfanView) to 240x240 and that image is then downsized to 120x120 by the application firmware, the model either fails to detect objects in it or the probability score is greatly reduced.
Does the method used to downsize images affect prediction accuracy? Should I always use the EI function for this? Would there be an advantage in downsizing by a power of 2 (e.g. 8x, to 150x150) and retraining the model with a 150x150 input size?
Alternatively, is it more likely there is some other reason for the poor results?
You should try to stick with our resizing. It happens before the model is trained, and that data is what the model is built from, so if you resize again on-device you will need to match our resizing. I'm not sure how IrfanView handles things, but it could be adding issues of its own on resizing; see if you can find out what IrfanView is doing, or remove that step.
You can see exactly how it’s done in our open-source processing block here:
Resizing an image multiple times (e.g. 1200 > 240 > 120) also compounds the loss of fine detail in the image. So even if both tools use the same method, the extra step introduces additional loss.
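If it helps to see why, here is a toy sketch (plain box-filter averaging on a synthetic image, not our actual processing block) that counts how many pixels come out different when the same downscale is done in one step versus two. The intermediate rounding to 8 bits in the two-step path is where precision goes missing:

// Toy demo: one-step vs. two-step downscaling with box-filter averaging.
// NOT the Edge Impulse processing block, just an illustration of how the
// intermediate 8-bit rounding in a staged resize can change the result.
#include <cstdint>
#include <cstdio>
#include <vector>

// Downscale a square grayscale image by an integer factor, averaging
// each factor x factor block into one output pixel.
static std::vector<uint8_t> downscale_box(const std::vector<uint8_t> &src,
                                          int src_dim, int factor) {
    int dst_dim = src_dim / factor;
    std::vector<uint8_t> dst(dst_dim * dst_dim);
    for (int y = 0; y < dst_dim; y++) {
        for (int x = 0; x < dst_dim; x++) {
            uint32_t sum = 0;
            for (int dy = 0; dy < factor; dy++)
                for (int dx = 0; dx < factor; dx++)
                    sum += src[(y * factor + dy) * src_dim + (x * factor + dx)];
            // truncating back to 8 bits here is where precision is lost
            dst[y * dst_dim + x] = (uint8_t)(sum / (factor * factor));
        }
    }
    return dst;
}

int main() {
    // synthetic 1200x1200 image with lots of fine detail
    std::vector<uint8_t> img(1200 * 1200);
    for (size_t i = 0; i < img.size(); i++) img[i] = (uint8_t)((i * 37) % 255);

    std::vector<uint8_t> one_step = downscale_box(img, 1200, 10);          // 1200 -> 120
    std::vector<uint8_t> two_step =
        downscale_box(downscale_box(img, 1200, 5), 240, 2);                // 1200 -> 240 -> 120

    int diff = 0;
    for (int i = 0; i < 120 * 120; i++)
        if (one_step[i] != two_step[i]) diff++;
    printf("%d of %d pixels differ between one-step and two-step resize\n",
           diff, 120 * 120);
    return 0;
}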
I checked out your project quickly and noticed it was uploaded from S3, so hopefully you can use some of this information to improve your workflow. Let me know if you need any additional info around your hardware and workflow; we have a bunch of hardware-specific guides.
See also our model optimization guidelines in the docs; these sections should help:
Thanks Eoin, that’s helpful advice. I will review those docs tomorrow.
NB: using IrfanView was just for testing and is not part of the final architecture. I just needed to scale down by a factor of 5 for the test, but that may be where errors were introduced.
Am I right in thinking that I can pass images of any size into run_classifier and it will resize as necessary internally? I need to grab a 1200x1200 JPG from the camera anyway, so rather than resizing it myself I think I could convert it to RGB888 (I will have just enough memory) and pass it to run_classifier as the signal. Presumably I need to set signal.total_length to 1440000 (1200x1200) rather than 14400 (120x120).
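In code terms, this is roughly what I'm picturing (just a sketch, untested; I'm assuming the packed-pixel get_data callback from the esp32_camera example carries over unchanged, with snapshot_buf holding the full 1200x1200 RGB888 frame in PSRAM):

#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

static uint8_t *snapshot_buf; // 1200 * 1200 * 3 bytes, allocated in PSRAM

static int ei_camera_get_data(size_t offset, size_t length, float *out_ptr) {
    // offset and length are in pixels; each output float packs 0x00RRGGBB
    size_t pixel_ix = offset * 3;
    for (size_t i = 0; i < length; i++) {
        out_ptr[i] = (float)((snapshot_buf[pixel_ix + 2] << 16) |
                             (snapshot_buf[pixel_ix + 1] << 8) |
                              snapshot_buf[pixel_ix]);
        pixel_ix += 3;
    }
    return 0;
}

void classify_full_frame(void) {
    ei::signal_t signal;
    signal.total_length = 1200 * 1200; // pixels, not bytes: 1,440,000
    signal.get_data = &ei_camera_get_data;

    ei_impulse_result_t result = { 0 };
    EI_IMPULSE_ERROR err = run_classifier(&signal, &result, false /* debug */);
    if (err != EI_IMPULSE_OK) {
        ei_printf("run_classifier failed (%d)\n", (int)err);
    }
}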
For better efficiency, perhaps I should retrain the model at 150x150 so the scaling factor is 8x rather than 10x.
8x should work better, but the other steps I shared in your other post, like addressing class imbalance etc., will be the first port of call.
Resizing will happen when you pass the image to run_classifier, but a 1200x1200 image still has to be buffered and processed on the device (at 3 bytes per pixel for RGB888 that is 1200 x 1200 x 3 = 4,320,000 bytes, about 4.3 MB), so be aware of that on the application side.
Re: Class imbalance
The principal classes we see in images are European and Asian Hornets. Smaller insects, such as flies and wasps, are mostly ignored by a pre-screening process that only triggers on new blobs of sufficient size; they do appear in some images, but usually only in conjunction with the larger insects.
Should we:
1. actively seek out further images of smaller insects from our collected data, to boost their representation in the training data so they can be correctly identified as distinct from the target insects;
2. remove them entirely as a recognised class (i.e. remove the bounding boxes where they do occur) so they are treated as background; or
3. boost their significance using Auto-Weight?
Generalising this for others: is it better to train a FOMO model with all the classes it might encounter in an image, or can less common classes of no interest be ignored and handled as background?
I have spent several days trying to get a model with a 300x300 input size to run on my ESP32S3 target processor. I cannot get it to work at all: every image presented results in no inferences, and no error or warning messages are produced.
Steps to reproduce:
To test, I modified the Edge Impulse Arduino examples > esp32 > esp32_camera sketch to receive Raw Features copied from one of the training sample images (labelled 175943). These are loaded into an array in PSRAM and injected as floats into ei_camera_get_data(). For the 300x300 model there are 90000 Raw Features. The result contains 0 bounding_boxes.
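For reference, the injection harness looks roughly like this (paraphrased from memory, the names are mine; in the actual sketch the copy replaces the camera read inside ei_camera_get_data(), but the shape is the same):

#include <string.h>
#include "esp_heap_caps.h"

// Raw Features copied from Studio for image 175943.
// 300x300 = 90000 features; as floats that is 360 KB, so the buffer
// has to live in PSRAM rather than internal SRAM.
static float *features;

void alloc_features(void) {
    features = (float *)heap_caps_malloc(90000 * sizeof(float),
                                         MALLOC_CAP_SPIRAM);
}

// get_data callback handed to run_classifier in place of the camera read
static int raw_feature_get_data(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, features + offset, length * sizeof(float));
    return 0;
}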
To confirm the logic, I ran exactly the same code with a model library trained on 120x120 images and Raw Features from the same 175943 image (now 14400 features). This worked: it detected a single bounding_box with the same score as the training website.
I also ran it with a model trained on 150x150 images (the 175943 training image now yields 22500 Raw Features). This also worked and produced the expected score.
Are there soft limits, e.g. on classification time or memory use, that stop large models from running? What would stop run_classifier from detecting an object in one of its own training images when the same image works on the training website?
Development environment:
Xiao ESP32S3 Sense with 8 MB PSRAM
PlatformIO running on Windows 10
Model deployed as a C++ library (int8)
I prefer:
uint32_t r = snapshot_buf[pixel_ix + 2]; // red byte (red sits at offset +2 in snapshot_buf)
uint32_t g = snapshot_buf[pixel_ix + 1]; // green byte
uint32_t b = snapshot_buf[pixel_ix];     // blue byte
uint32_t feature = (r << 16) | (g << 8) | b; // pack one pixel as 0x00RRGGBB
out_ptr[out_ptr_ix] = (float)feature;    // the image signal expects packed floats
With this change I can resize images using the standard Espressif image_converters from 1200x1200 to 150x150 before passing them to run_classifier, and the inferences match the ones predicted in Studio.
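For anyone else doing this, the scale-during-decode step looks something like the sketch below. I'm going from memory, so check img_converters.h for the exact signature; note that jpg2rgb565 outputs RGB565, which still needs unpacking into the packed floats shown above before classification:

#include "img_converters.h"

// Decode the 1200x1200 camera JPEG straight to 150x150 by letting the
// JPEG decoder scale by 8x during decode, avoiding a separate resize pass.
bool decode_scaled(const uint8_t *jpg, size_t jpg_len, uint8_t *out_rgb565) {
    // out_rgb565 must hold 150 * 150 * 2 bytes
    return jpg2rgb565(jpg, jpg_len, out_rgb565, JPG_SCALE_8X);
}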
I would still like to know why 300x300 images don't seem to work, but the accuracy at 150x150 is good enough for my purposes for now.