How would I fix my terribly inaccurate model?

I am working on a project in which I need to be able to correctly identify a pet. I don’t care about any other objects, just a pet. I made a model using datasets I found online. The pet dataset contains dogs and cats, and I uploaded around 10,000 dog and cat images in total. Unfortunately, I have to have two labels, so I created an “Unknown” label and uploaded a couple of thousand images of random indoor living spaces, since my project will primarily run indoors. This gave terrible accuracy whenever a human was in the picture, with the camera thinking the human was a pet. So I added a new dataset of a couple of thousand pictures of humans. Now when I run it, there is very little rhyme or reason to its predictions; it’s terribly inaccurate. I’m running this on an ESP32-CAM, but I don’t think that is the problem. I just want to have two labels: one for “pets” and one for “everything else”. I think my problem might be my datasets, since there are some humans in the pictures with the dogs in that dataset. In reality, I just don’t know where or how to get my dataset. If you know a way to fix my issue, I would greatly appreciate it. Thanks!

One of the cons of using an existing dataset is that it probably has a lot of variation that you don’t care about. Do you intend to set this up in a fixed location, i.e. with not much variation in the background? If so, you’ll possibly get a better result by first collecting data specific to your setup. The main thing to look out for is variation in things like lighting; e.g. if you intend for this to work in the evenings when the lights are on, make sure you don’t just collect data in the morning with natural lighting. Let us know and we can help iterate a bit. :slight_smile:


I’m actually working on a commercial product that’s going to be in all sorts of places. Is there a way to not have a second label? Like, just have it look for one thing, and that one thing is a pet?

Yes, you can frame the problem as a single binary output, P(pet), but the maths ends up being almost exactly the same as two-way classification over {PET, NOT_PET}. You can squeeze a bit more out of it, but unfortunately, if the two-way classification isn’t working for you now, it’s unlikely you’ll get any boost from just framing it as a single class, sorry.
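To illustrate why the two framings are so close, here is a minimal sketch (plain Python, no ML library). It assumes the two-way classifier ends in a softmax over two logits and the single-output version ends in a sigmoid; the function names and the example logit values are made up for illustration. Under those assumptions, the softmax probability for PET equals a sigmoid applied to the difference of the two logits.

```python
import math

def sigmoid(z):
    """Single-output framing: P(pet) from one logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z_pet, z_not_pet):
    """Two-way framing: probabilities for (PET, NOT_PET) from two logits."""
    m = max(z_pet, z_not_pet)          # subtract the max for numerical stability
    e_pet = math.exp(z_pet - m)
    e_not = math.exp(z_not_pet - m)
    total = e_pet + e_not
    return e_pet / total, e_not / total

# Hypothetical logits, chosen only for the demonstration.
z_pet, z_not_pet = 2.0, -1.0

p_two_way, _ = softmax2(z_pet, z_not_pet)
p_single = sigmoid(z_pet - z_not_pet)

# The two values agree up to floating-point rounding.
print(abs(p_two_way - p_single) < 1e-12)
```

So the single-output model is effectively learning the difference of the two logits directly, which is why switching framings alone is unlikely to rescue a model that is struggling because of its training data.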

Hello @shawnm1, you may be able to improve detection by framing your problem as object detection instead of image classification. This will put you in the best position to ignore anything that you don’t want identified for any given frame, while optimizing for exactly the items you are actually looking for. In this way, the algorithm will also be able to learn the parts of the image that do NOT have a pet (e.g., a human or furniture) even when a pet is in fact present. This does mean however that you have to find datasets that include bounding boxes in a compatible format.
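On the point about bounding boxes in a compatible format, here is a rough sketch of the kind of conversion that is often needed. It assumes the source annotations give corner coordinates (xmin, ymin, xmax, ymax) with labels such as "dog" and "cat", and that the target wants x, y, width, height with a single "pet" label. Both layouts and all names here are hypothetical; check the exact import format your tooling expects before relying on this.

```python
# Assumed source labels that should be merged into one "pet" class.
PET_CLASSES = {"dog", "cat"}

def corners_to_xywh(xmin, ymin, xmax, ymax):
    """Convert corner coordinates to top-left x/y plus width/height."""
    return {"x": xmin, "y": ymin, "width": xmax - xmin, "height": ymax - ymin}

def to_pet_boxes(annotations):
    """annotations: list of (label, xmin, ymin, xmax, ymax) tuples for one image.

    Keeps only pet boxes and relabels them as "pet". Everything else
    (people, furniture, ...) is dropped and so becomes background.
    """
    boxes = []
    for label, xmin, ymin, xmax, ymax in annotations:
        if label not in PET_CLASSES:
            continue
        box = corners_to_xywh(xmin, ymin, xmax, ymax)
        box["label"] = "pet"
        boxes.append(box)
    return boxes

# Made-up annotations for a single image containing a dog and a person.
example = [("dog", 10, 20, 50, 80), ("person", 0, 0, 30, 90)]
print(to_pet_boxes(example))
```

Dropping the non-pet boxes rather than labelling them is what lets a human in the frame count as background instead of competing with the pet class.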

Since you are working on a commercial product, do feel free to reach out to me directly. I would love to work with you to get your project off the ground.


Object detection would be great, but doesn’t Edge Impulse only support object detection on Linux? I’m using an ESP32-CAM microcontroller, which of course does not run Linux. Is there a way to make it work on a microcontroller? Also, I really could use the help getting my project off the ground, but it appears your messages are disabled. Is there another way that you would like me to contact you?

You can also run object detection on MCUs… let me check what your options are for ESP32 and will post back here. I have also reached out to you directly to discuss your project in more detail.


Hello @shawnm1,

I managed to run object detection models using FOMO on the ESP32. Note that it will be a bit slow and you’ll have to use small images, but it works.
Please check out our FOMO documentation:




Cool! Thanks! I’ll check it out

About the bounding box part: do you know of anywhere I could get a dataset where that has already been done? I will label them myself if I have to, but I would prefer not to have to label 10,000+ images.