I’m working on a project that involves using a FOMO model to detect cars on the road.
In example FOMO projects I’ve seen, it looks like the training images have always contained at least one object of interest that gets labeled.
For example, the Car Parking Occupancy Detection tutorial has training images that all contain examples of cars. And the beer bottle tutorial uses a dataset of images that always contain a bottle.
My question is - should training images always contain at least one example of an object you’re trying to detect?
Or is there any benefit to including training images that contain only background? (For example, just an empty parking lot with no cars for the Car Parking tutorial above.)
It does not matter whether you include background-only images or not. FOMO looks at every receptive-field grid cell in the image and classifies each one as one of your classes or as “background.” This background class is automatic, so you should not need to add it as a separate class.
For example, if you have a 12x12 grid of representations in FOMO (which comes from e.g. a 96x96 pixel image) and 2 of those grid cells contain your object (e.g. a car), the other 142 cells will be labeled as “background.” With 100 image samples (let’s say each has labeled cars), that’s 14,200 instances of “background” for training, which is quite a lot! As a result, you generally do not need to include background-only images (with no objects) when training such object detection models.
That being said, if your object always covers some important part of the background in every image, it might be important to include some images where that part of the background is visible, so it appears in the training data.
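To make the arithmetic above concrete, here is a minimal Python sketch. The function name and parameters are hypothetical, and the downsampling factor of 8 is an assumption inferred from the 96x96 image to 12x12 grid example; your model may use a different factor.

```python
def background_cell_count(image_size=96, downsample=8,
                          object_cells_per_image=2, num_images=100):
    """Estimate how many grid cells end up labeled as background.

    All names and defaults are illustrative; downsample=8 is an
    assumption taken from the 96x96 -> 12x12 example.
    """
    grid = image_size // downsample           # cells per side, e.g. 12
    total_cells = grid * grid                 # e.g. 144 cells per image
    background_per_image = total_cells - object_cells_per_image
    background_total = background_per_image * num_images
    return grid, background_per_image, background_total


grid, per_image, total = background_cell_count()
print(f"{grid}x{grid} grid, {per_image} background cells per image, "
      f"{total} across the dataset")
```

With the defaults this reproduces the numbers from the example: 142 background cells per image and 14,200 across 100 images. Changing `object_cells_per_image` lets you check how much background remains when objects fill more of the frame.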
Thanks for the thorough explanation! This definitely answers my question