Background or no Background for Object Detection?

Hello all!

I have a general question related to what the best practice is for training an Object Detection model, especially when using Edge Impulse on small boards.

I’m currently working on a project that aims to identify multiple birds in a single image. I’ve watched the Edge Impulse tutorial on Object Detection, where a coffee cup and a lamp are identified, and noticed that the room background is present in all of these images.

What I’d like to know is: are there any specific advantages to removing the background that surrounds the object you are trying to detect? Would it give a better or worse result?

I realise with Image Classification that it can sometimes be important to get a varied set of images with different backgrounds on them in order to make a model suitable for various different types of location. Is this the case with Object Detection too?

For example, if I wanted my Object Detection model to understand what a blackbird was, or a robin, could I take a photo of either of those birds, remove the natural background (trees, sky, etc.) and then use those modified images as training data for a better Object Detection model?

Thanks in advance for any help that is offered. I’m trying to determine how best to collect data for my project. If a background is not needed, then I may be able to acquire data more easily by creating a custom capture rig. If a background is needed, then that makes things a little more difficult!

I’m not entirely sure why (@dansitu can probably explain), but keeping the background in is better, and preferably keep it varied too, so the network can properly learn to distinguish the object from other things. You probably also want some images without the object present (this is not in the tutorial, but should have been).


Thanks for the response @janjongboom!

Ok, that’s what I was thinking. I was thinking of setting up a rig that would allow me to take photos of a bird against a solid colour background, then chroma key the background out and replace it with various different backgrounds, moving and rotating the bird to a new position in the image too. That way I could create a lot of synthetic data.
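
As a rough illustration of the chroma-key idea described above, here is a minimal Python sketch. It assumes a pure green backdrop, treats an image as a plain list of rows of (r, g, b) tuples, and skips rotation entirely. The names `is_key_colour` and `composite`, the key colour and the tolerance value are all made up for this example; a real pipeline would more likely use an image library and handle soft edges around the bird.

```python
import random

# An image here is a list of rows; each pixel is an (r, g, b) tuple of 0-255 ints.
# This representation is an assumption chosen to keep the sketch dependency-free.


def is_key_colour(pixel, key=(0, 255, 0), tolerance=60):
    """Return True if the pixel is close enough to the backdrop (key) colour."""
    return sum(abs(a - b) for a, b in zip(pixel, key)) <= tolerance


def composite(bird, background, key=(0, 255, 0), tolerance=60, rng=random):
    """Paste the non-key pixels of `bird` at a random position on a copy of `background`.

    Returns (image, box), where box is (x_min, y_min, x_max, y_max), inclusive,
    around the pasted pixels, or None if every bird pixel matched the key colour.
    """
    bird_h, bird_w = len(bird), len(bird[0])
    bg_h, bg_w = len(background), len(background[0])
    if bird_h > bg_h or bird_w > bg_w:
        raise ValueError("bird image must fit inside the background")

    # Pick a random top-left corner that keeps the whole cut-out in frame.
    off_x = rng.randint(0, bg_w - bird_w)
    off_y = rng.randint(0, bg_h - bird_h)

    out = [list(row) for row in background]  # copy, so the background is reusable
    xs, ys = [], []
    for y, row in enumerate(bird):
        for x, pixel in enumerate(row):
            if not is_key_colour(pixel, key, tolerance):
                out[off_y + y][off_x + x] = pixel
                xs.append(off_x + x)
                ys.append(off_y + y)

    box = (min(xs), min(ys), max(xs), max(ys)) if xs else None
    return out, box


# Tiny made-up example: a 3x2 "bird" on green, pasted onto an 8x6 sky.
GREEN, BROWN, SKY = (0, 255, 0), (120, 80, 40), (135, 206, 235)
bird = [[GREEN, BROWN, GREEN],
        [BROWN, BROWN, BROWN]]
sky = [[SKY] * 8 for _ in range(6)]
image, box = composite(bird, sky, rng=random.Random(0))
print(box)
```

Returning the bounding box alongside the image means each synthetic image comes with its label already attached, which is the main appeal of generating data this way.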

I’m taking it that this would also be a good idea? If I used a photo of a bird in a certain pose 10 times, but with that bird rotated differently each time and placed in a different position against a different background, would that work? Or would the machine learning process just see it as the same bird as a previous one, despite the new position/rotation/background?

Would be good to get the feedback/advice of @dansitu too if he’s the go to guy on this!

The point you mention re: Image Classification holds true for Object Detection too; the main theme is that you need to collect training data that is representative of how you intend to use the model in the future. E.g. if you never intend to run it overnight then don’t bother including training data from when it’s dark; or if you want it to work on rainy days, make sure that your data isn’t just from clear sunny days.

In the case that the background doesn’t have much variation (e.g. fixed camera position, no occlusions / shadows, consistent lighting etc.) there is not much benefit to background removal; the network will learn to ignore it. On the other hand, if you could very cleanly remove the background that would help, but, sadly, often that problem is just as hard as the actual object detection in the first place :slight_smile:

I think for a v1 it’s ok to do a minimal setup: capture in the place you intend to use it in the future, capture as much data as you easily can, train a model, and then we can do some analysis. Often a simple pass like this can give us more information about which direction to take the model.


Oh yes, and regarding synthetic data, it’s not very often that it works, sorry. We might use synthetic data for a boost in the data set down the track, but probably not for a first version. The idea you mention of modifying the image is something we already do during training; image augmentation with random adjustments is best practice.
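
To make "image augmentation with random adjustments" a bit more concrete, here is a minimal Python sketch of two common adjustments for object detection: a random brightness shift and a random horizontal flip, with the bounding box updated to match the flip. This is only an illustration and not a description of what any particular training pipeline does internally; the function name `augment`, its parameters and the list-of-tuples image format are assumptions made for this example.

```python
import random


def augment(image, box, rng=random, max_brightness_shift=40, flip_probability=0.5):
    """Return a randomly adjusted copy of `image` and the matching bounding box.

    image: list of rows, each pixel an (r, g, b) tuple of 0-255 ints (assumed format).
    box:   (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
    """
    width = len(image[0])

    # Random brightness: add the same offset to every channel, clamped to 0-255.
    shift = rng.randint(-max_brightness_shift, max_brightness_shift)
    out = [
        [tuple(min(255, max(0, channel + shift)) for channel in pixel) for pixel in row]
        for row in image
    ]

    # Random horizontal flip: mirror each row and mirror the box with it,
    # otherwise the label would no longer cover the object.
    x_min, y_min, x_max, y_max = box
    if rng.random() < flip_probability:
        out = [row[::-1] for row in out]
        x_min, x_max = width - 1 - x_max, width - 1 - x_min

    return out, (x_min, y_min, x_max, y_max)


# Made-up 4x1 example image with a box over the two left-most pixels.
example = [[(10, 10, 10), (20, 20, 20), (30, 30, 30), (40, 40, 40)]]
augmented, new_box = augment(example, (0, 0, 1, 0), rng=random.Random(0))
print(new_box)
```

Because a fresh random adjustment is drawn every time an image is used, the network rarely sees exactly the same pixels twice, even when the underlying photo is the same.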

@matkelcey Hey Mat! Thanks for those replies :slight_smile:

What I’m creating might be used all over the place eventually, not just for one bird species but for many. That being said, different birds like different habitats, so I guess you’d never see a Swan up a tree, or a Woodpecker taking a swim, so I’m guessing using synthetic data that represented either of these two scenarios would be a bad idea?

In the second paragraph of your first reply, you say that there’s not much benefit to background removal if I’m using a fixed camera position etc. But when you say “on the other hand” about cleanly removing the background from an image and that being of help, is this for the fixed camera position scenario or the scenario where there is no fixed camera position? Or both?

I’ll ask my collaborator to send over some of the image/video footage he has to me and I’ll see if I can get that “v1” pass you mention done :wink:

And yeah, I’m beginning to realise synthetic data doesn’t really help, or at least not the kind I mentioned above. I created a bunch using some of the photography that I have against random backgrounds, but it only marginally improved the accuracy.

I might, at some point (when I have the time/money to do so), build some fake environments in 3D and put some birds in them, as I come from an apps/games background and that would simulate the real world well enough. But I’m going to need funding I think before I can do that :sweat_smile:

I’ve only seen background removal work well for a static background, and even then it turned out to be minimal gain for the work that was required. You might like to have a look at BlenderProc, which is great for synthetic data generation. But nothing beats collecting real world data.

Hmm, so background removal is only beneficial when you’ll be recognising objects against a static background? I was thinking of building a mini bird photography studio, chroma keying the background out and then adding in loads of fake backgrounds, but if that’s not going to result in better accuracy then I’ll stop doing the CAD work on it.

I hear you on the real data thing, it’s just tricky getting the data, right? I essentially need an army of photographers to get out and take any photo they can of my target bird species. I’m guessing that if an object, in my case a bird, is partially obscured, it should still be labelled?

I’m not saying synthetic data from a studio isn’t useful, it’s just often hard to get the sim-to-real domain transfer to work well. I wonder if it’s better to first try curating an existing open dataset? E.g. section 6 of “Building a bird recognition app and large scale dataset with citizen scientists” has a short review of the suitability of some existing open datasets for birds in images. It could be supplemented later with your studio data.


Yeah I get what you’re saying. I’ll have a look around for existing datasets, I’m certainly not the only one attempting to do this kind of work, it just depends on whether there are that many open datasets for UK birds. I’ve found an iNaturalist one that the BBC mention they are using for their work.

Cheers for the assistance on this, I know I asked a lot of questions haha! I like trying to find shortcuts with things and/or optimising processes, but I’m not sure I can do that here by myself.
