Advice needed: Classifying Object Detection Bounding Boxes

Hi again all! You’ve been super helpful with my other couple of enquiries so far, so I thought I’d ask for more help, hope that’s OK! I’m thinking at this point maybe I need to take a course somewhere…

I’ve been looking at Object Detection and, perhaps naively, was hoping I could use it to classify different birds. Obviously Image Classification is what I really need for that, but I thought it was worth a try. So my current thinking is that I need Object Detection to find where the birds are in an image, then run each of those birds through an Image Classifier.

With that in mind, could I please ask what you Edge Impulse folks would suggest as the best way of using Image Classification to classify the contents of the bounding boxes (in my case, birds) provided by Object Detection?

Would I need to feed an array of pixels representing a bird into an Image Classifier for each bounding box found by Object Detection? Or would I need to save out a new image for every bounding box found and then feed that into an Image Classifier?

My target device is a Raspberry Pi 4. Jan gave me a bit of sample code for a previous query of mine, where I can see bounding boxes being detected:

    # Image classification result: print the score for each label
    if "classification" in res["result"].keys():
        print('Result (%d ms.) ' % (res['timing']['dsp'] + res['timing']['classification']), end='')
        for label in labels:
            score = res['result']['classification'][label]
            print('%s: %.2f\t' % (label, score), end='')
        print('', flush=True)

        # Optionally show the current frame; press 'q' to quit
        if (show_camera):
            cv2.imshow('edgeimpulse', img)
            if cv2.waitKey(1) == ord('q'):
                return

    # Object detection result: print each box's label, score and coordinates
    elif "bounding_boxes" in res["result"].keys():
        print('Found %d bounding boxes (%d ms.)' % (len(res["result"]["bounding_boxes"]), res['timing']['dsp'] + res['timing']['classification']))
        for bb in res["result"]["bounding_boxes"]:
            print('\t%s (%.2f): x=%d y=%d w=%d h=%d' % (bb['label'], bb['value'], bb['x'], bb['y'], bb['width'], bb['height']))

Any help/advice you can provide would be greatly appreciated, as always! I definitely owe you all at least one cup of coffee at this point I think. Thanks :grin:


Hey @TechDevTom, the label (aka the type of bird) is already provided per box (in bb['label']), so there's no need for an additional image classification model.

@janjongboom That’s if I set the model to look for a blackbird and a robin, as different birds, rather than a bird and a cat, as different species/“objects”, right?

I’ve found that, in terms of differentiating between two different birds, in my case a Turtledove and a Common Linnet, Image Classification yields more accurate results than Object Detection when using the same data. I was thinking that by making the Object Detection side just detect a “bird” rather than a particular bird, then using Image Classification afterwards to classify the results, I’d get more accurate classifications. Is this not so?

@TechDevTom Ah check, yeah, you can do that. You can:

  1. Crop the bounding box out of the main image, e.g. via: https://stackoverflow.com/questions/15589517/how-to-crop-an-image-in-opencv-using-python (you have the full image in the img variable, and you have the coordinates).
  2. Then use the sample code from the Raspberry Pi 4 Object Detection - A Camera Question topic to classify that cropped image (with the second model).
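Putting step 1 into code, a minimal sketch might look like this (the `crop_bounding_box` name is just illustrative; the `bb` dict has the same shape as in the sample code above, and since OpenCV images are NumPy arrays the crop is plain array slicing):

```python
import numpy as np

def crop_bounding_box(img, bb):
    """Crop one detected bounding box out of the full frame.

    `img` is the frame as a NumPy array (what OpenCV hands you) and
    `bb` is one entry from res['result']['bounding_boxes'].
    """
    x, y = bb['x'], bb['y']
    w, h = bb['width'], bb['height']
    # NumPy slicing is [rows, cols], i.e. [y, x]
    return img[y:y + h, x:x + w]

# Toy 100x100 RGB frame and a fake detection, just to show the shapes
frame = np.zeros((100, 100, 3), dtype=np.uint8)
bb = {'label': 'bird', 'value': 0.9, 'x': 10, 'y': 20, 'width': 30, 'height': 40}
crop = crop_bounding_box(frame, bb)
print(crop.shape)  # (40, 30, 3)
```

The resulting `crop` array can then be resized to whatever input size the second (classification) model expects.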

Ah that’s what I was after @janjongboom, thank you :grin:

Is there any reason why the Image Classification model type is better at classifying than Object Detection? I know the name suggests it, but I would have thought that, since the Object Detection code does classification too, the results would be similar?

I’ll go experiment with that cropping code and see how I get on. Thanks again!

Good question, @dansitu?

This is an interesting question! The original paper on Single Shot Detection (the model architecture we use) specifically mentions that the model has trouble differentiating between animals in its test dataset.

It also works much better for larger objects: if the object takes up more space in the frame, it’s more likely to be correctly classified.

With these limitations in mind, I like your approach of using the object detection model to locate birds and then a more fine-grained classifier to identify the species. Since wildlife is a common use case for us, we’ll also look at what we can do to improve performance on these types of datasets.

Thanks for your interesting thoughts!


Interesting note on how much space an object takes up on screen; in my use case that’s not always going to be guaranteed. Chances are that will actually be a rare occasion bird-wise, so training my model to detect birds from a distance is important.

Part of my work on this, @dansitu, is to make sure that no images without birds/mammals in them get stored, to save space on the device/SD card as it records data, so mammals might end up being bigger on screen when using Object Detection. Are there any open-source pretrained models that already do this, Edge Impulse/TinyML-wise, that you know of? I’d rather not reinvent the wheel if I can use something that already exists.
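For the storage-saving side, what I have in mind is roughly this (just a sketch; the `should_save_frame` name and the 0.5 threshold are arbitrary, and the `res` dict mirrors the shape in Jan’s sample code above):

```python
def should_save_frame(res, min_confidence=0.5):
    """Decide whether a frame is worth writing to the SD card.

    `res` is the result dict from the object detection model (the same
    shape as in the sample code above). A frame is kept only if at
    least one bounding box clears the confidence threshold.
    """
    boxes = res.get('result', {}).get('bounding_boxes', [])
    return any(bb['value'] >= min_confidence for bb in boxes)

# A frame with one confident 'bird' detection is kept...
res_hit = {'result': {'bounding_boxes': [
    {'label': 'bird', 'value': 0.8, 'x': 0, 'y': 0, 'width': 10, 'height': 10}]}}
# ...while an empty frame is skipped.
res_miss = {'result': {'bounding_boxes': []}}
print(should_save_frame(res_hit), should_save_frame(res_miss))  # True False
```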

I’m glad that what I’m thinking of makes sense! Can you foresee any issues running two different types of ML models on one device? If there are any improvements to be made then that would be great, what I’ve got to work with now should work as a prototype though!

You’re welcome, thanks to you guys for putting Edge Impulse together, it’s led to me finally being able to do some interesting tech conservation volunteer work with the RSPB! Any assistance on the above two questions would be much appreciated :pray:

> I’m glad that what I’m thinking of makes sense! Can you foresee any issues running two different types of ML models on one device? If there are any improvements to be made then that would be great, what I’ve got to work with now should work as a prototype though!

Nope, on Linux you can run many models at the same time :slight_smile:

> You’re welcome, thanks to you guys for putting Edge Impulse together, it’s led to me finally being able to do some interesting tech conservation volunteer work with the RSPB! Any assistance on the above two questions would be much appreciated :pray:

Awesome to hear, especially given that this is conservation which we care about a lot as well!


So it’s outside of Edge Impulse, but there’s this general-purpose animal detector model from Microsoft (MegaDetector) that I keep hearing great things about:

Definitely worth a look!


Ah yes, I’ve already looked into this. It is very good, but running it on a Raspberry Pi isn’t an option from what I’ve read; there’s too much for the model to do, and it’s not meant for edge devices apparently. I guess there would be no harm in trying it, but if I remember correctly one person said it took a minute to classify just one image on an RPi 4 :sweat_smile: Not exactly speedy classifying, haha!

I’ll just need to keep pouring image data into my Object Detection model and see where I get with it. I’ve got a 49.2% accuracy rating at the testing stage, recognizing birds vs birds in flight vs mammals right now. During the training stage I’m getting 47.7% accuracy with the unoptimized model, and 32.4% accuracy with the quantized model.

I’ve input about 100 images each of cats, rabbits, dogs and foxes as mammals, and various amounts of different shaped birds like parrots, buzzards, yellowhammers, linnets and turtledoves. I’ll have a think on what else I can feed into the model to make it more accurate. I’ll see where maybe another 500-1000 images gets me!

Interesting! Maybe you could just try and train the object detection model to recognize any bird or mammal (only 1 category) and then use a classifier to determine the type?

Ah are you saying disregard whether it’s a bird or mammal @dansitu , and just identify the fact that something is in the image with the one label?

My labels right now are either “Bird”, “Bird_Flight” or “Mammal”, but I could go through and say “Something” and go from there!
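In code terms I guess collapsing my current labels down to one would just be something like this (a sketch; the mapping and the `collapse_labels` name are mine, and the bounding-box dicts mirror the result format in the sample code above):

```python
# Hypothetical mapping from my current labels to a single merged category
LABEL_MAP = {'Bird': 'Something', 'Bird_Flight': 'Something', 'Mammal': 'Something'}

def collapse_labels(bounding_boxes):
    """Rewrite each box's label to the single merged category,
    leaving any unknown labels untouched."""
    return [{**bb, 'label': LABEL_MAP.get(bb['label'], bb['label'])}
            for bb in bounding_boxes]

boxes = [{'label': 'Bird', 'value': 0.7, 'x': 1, 'y': 2, 'width': 3, 'height': 4}]
print(collapse_labels(boxes)[0]['label'])  # Something
```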

It might be worth a try, given there’s a documented issue with discerning between similar object categories for animals. Let me know if it seems to improve your object detection accuracy!


Will do, going to try uploading some more bird photos and mammal photos and separating them out for now, but if that doesn’t improve accuracy rating, I’ll give the one label thing a go!

Then I guess I could potentially train it to recognise “cat”, “dog”, “fox” etc. with Image Classification, and for birds get down to actual types of birds like “Turtle Dove” and “Linnet”.

@janjongboom I’m getting a bit of a time out issue with my model training, it’s saying that it’s taking too long to train :open_mouth: Is there anything I can do to improve on this?

Yeah what’s the project ID you’re using?

The project ID is “RSPB_MLCam_OD_Prototype”. Let me know if you need any more info :+1: