It is my understanding that FOMO models work best when all labeled objects in the training set are roughly the same size. Is there a training method that can produce a FOMO model capable of detecting people both close up and far away (up to 50 feet, or even 100 feet if possible)?
For our use case, we need to count the number of people in a relatively large pool area. The count only needs to be reported about once a minute, so we can sacrifice a lot of frame rate in exchange for model accuracy.
We are using three or more OpenMV H7 Plus cameras, placed so that each camera counts within its own distinct region of interest. The per-camera counts are then summed to get the full count for the pool area. I have trained a model on a dataset that depicts people at varying distances from the camera, but I have not been able to achieve very high accuracy.
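To make the counting scheme concrete, here is a minimal sketch in plain Python of per-camera region-of-interest counting followed by a sum. It assumes each camera reports detections as `(x, y)` centroids, which is how FOMO output is usually consumed, and that each region of interest is a simple `(x, y, w, h)` rectangle. The camera names, rectangles, and centroid values are all made up for illustration and are not taken from the actual setup.

```python
# Illustrative sketch only: camera names, ROI rectangles, and centroids
# below are hypothetical placeholders, not real measurements.

def count_in_roi(centroids, roi):
    """Count (x, y) centroids that fall inside roi = (x, y, w, h)."""
    rx, ry, rw, rh = roi
    return sum(
        1
        for (cx, cy) in centroids
        if rx <= cx < rx + rw and ry <= cy < ry + rh
    )


def total_count(detections_by_camera, roi_by_camera):
    """Sum the in-ROI counts over every camera that has an ROI defined."""
    return sum(
        count_in_roi(detections_by_camera.get(cam, []), roi)
        for cam, roi in roi_by_camera.items()
    )


# Hypothetical, non-overlapping regions assigned to three cameras.
roi_by_camera = {
    "cam_a": (0, 0, 160, 240),
    "cam_b": (40, 0, 200, 240),
    "cam_c": (0, 60, 240, 180),
}

# Hypothetical centroids for one frame from each camera.
detections_by_camera = {
    "cam_a": [(20, 30), (150, 200), (170, 50)],  # last one is outside the ROI
    "cam_b": [(50, 10), (10, 10)],               # second one is outside the ROI
    "cam_c": [(100, 100)],
}

print(total_count(detections_by_camera, roi_by_camera))
```

Filtering by region before summing is what keeps a person visible to two cameras from being counted twice, provided the regions do not overlap in the real scene.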
This is an example of the type of detection we hope the camera can achieve:
I have tried training on many custom datasets, but they all produce models with low accuracy and a significant number of false positives when run on the cameras. I suspect this is caused either by the wide range of subject distances (and therefore apparent sizes) mentioned above, or by the images having too much variability.
Any help with dataset collection, labeling strategies, or other methods would be greatly appreciated!