Object Detection: Count Cucumber

Hi everyone,

I started a project to count cucumbers on a belt from a harvesting machine - see picture.
Ideally, the size of each cucumber should be determined at the same time.

Can this be done with an Object Detection model?
If so, what is the best course of action?

  • Should the model be trained with pictures of individual cucumbers?
  • Should I only train the model with one “cucumber” class or with two “cucumber” and “other” classes?
  • How can the model be trained to recognize the transitions of several cucumbers when they are close together?
  • Should the same camera (Raspberry Pi) always be used, including for creating the training and test images?

If not, are there alternatives that can be implemented with a Raspberry Pi on the harvesting machine?

Thank you for your support and feedback!

Best regards,


From personal experience, I think this would be difficult to pull off from this “bird’s eye view”, since the MobileNetV2 SSD FPN-Lite object detection model tends to struggle with smaller objects, and there are quite a few of them in this frame. One approach would be to define a smaller “region of interest”, count the cucumbers within that region, and then iterate across the larger frame. You could potentially use something like OpenCV for this, but it is handled at the application level rather than in the model.
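To make the region-of-interest idea a bit more concrete, here is a minimal sketch in plain Python. It is only one possible way to do it, not a tested recipe for this setup. The names `iter_tiles`, `count_objects` and `make_fake_detector`, as well as the `core` and `margin` values, are made up for illustration. The `detect` callable is a placeholder for whatever crops the frame (for example with OpenCV) and runs the model on that crop. To reduce double counting at tile borders, each tile is cropped with some extra margin, but a detection is only counted by the tile whose core area contains the center of its bounding box.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def iter_tiles(frame_w: int, frame_h: int, core: int, margin: int) -> Iterator[Tuple[Box, Box]]:
    """Yield (core_box, crop_box) pairs that together cover the frame.

    core_box: the area this tile is responsible for counting.
    crop_box: core_box grown by `margin` pixels (clamped to the frame);
              this is the region that would be passed to the detector.
    """
    for y in range(0, frame_h, core):
        for x in range(0, frame_w, core):
            core_box = (x, y, min(x + core, frame_w), min(y + core, frame_h))
            crop_box = (max(x - margin, 0), max(y - margin, 0),
                        min(x + core + margin, frame_w),
                        min(y + core + margin, frame_h))
            yield core_box, crop_box


def count_objects(frame_w: int, frame_h: int,
                  detect: Callable[[Box], List[Box]],
                  core: int = 320, margin: int = 64) -> int:
    """Count detections over all tiles.

    `detect` is a hypothetical callable: given a crop_box, it returns
    bounding boxes in coordinates local to that crop.
    """
    total = 0
    for core_box, crop_box in iter_tiles(frame_w, frame_h, core, margin):
        for x0, y0, x1, y1 in detect(crop_box):
            # Box center, converted from crop-local to full-frame coordinates.
            cx = crop_box[0] + (x0 + x1) / 2
            cy = crop_box[1] + (y0 + y1) / 2
            # Only the tile whose core contains the center counts the object.
            if core_box[0] <= cx < core_box[2] and core_box[1] <= cy < core_box[3]:
                total += 1
    return total


def make_fake_detector(truth: List[Box]) -> Callable[[Box], List[Box]]:
    """Stand-in for a real model: 'detects' every known box fully inside the crop."""
    def detect(crop: Box) -> List[Box]:
        cx0, cy0, cx1, cy1 = crop
        return [(x0 - cx0, y0 - cy0, x1 - cx0, y1 - cy0)
                for x0, y0, x1, y1 in truth
                if x0 >= cx0 and y0 >= cy0 and x1 <= cx1 and y1 <= cy1]
    return detect


# Three made-up cucumbers in a 640x480 frame; the second one straddles a tile border.
cucumbers = [(10, 10, 50, 30), (300, 100, 340, 120), (600, 400, 630, 470)]
print(count_objects(640, 480, make_fake_detector(cucumbers)))  # prints 3
```

In this sketch the margin should be at least about half the length of the longest cucumber you expect, so that an object centered in a tile's core is fully visible in that tile's crop. A real detector will also miss or duplicate some objects, so treat the count as an estimate.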

That being said, for this application I would generally start with two classes, “cucumber” and an “other” bucket, so that the model can learn to tell a real cucumber apart from something that is green and roughly cucumber-shaped. Otherwise, everything green and narrow may end up being identified as a cucumber. You would also want training images that contain multiple cucumbers, not just individual ones, and probably some with “other” objects in the shot as well, to teach the model that “others” can appear even alongside real cucumbers.

When labeling the images, it is OK for the bounding boxes to overlap, but because the objects here look so similar, it would be interesting to see how this performs on the device. It is not necessary to use the same camera, since the images will be resized to 320x320 resolution anyway. What is important is to capture a variety of cucumber orientations in the region of interest. Lighting will also be a factor depending on the time of day, so you will want to take the same shots at different times of day to account for this.
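As a rough back-of-the-envelope check on the 320x320 point, the snippet below estimates how many pixels a cucumber keeps after the resize. The frame size, crop size and cucumber size are made-up numbers, and the helper `size_after_resize` is hypothetical. It assumes the image is simply stretched to 320x320; if the pipeline crops or pads instead, the numbers will differ.

```python
def size_after_resize(obj_w, obj_h, src_w, src_h, target=320):
    """Object size in pixels after the source image is stretched to target x target.

    Assumes a plain stretch (no cropping or padding), which may not match
    what your training pipeline actually does.
    """
    return obj_w * target / src_w, obj_h * target / src_h


# Hypothetical numbers: a 150x40 px cucumber in a full 1920x1080 frame.
full = size_after_resize(150, 40, 1920, 1080)

# The same cucumber when only a 640x640 region of interest is resized.
roi = size_after_resize(150, 40, 640, 640)

print(full)  # roughly (25.0, 11.9)
print(roi)   # (75.0, 20.0)
```

With these example numbers the cucumber is about three times longer in pixels when only the region of interest is resized, which is one reason a smaller region can help with small objects.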

I hope some of this gives you some perspective.

Also, I’d suggest taking a look at https://matpalm.com/blog/counting_bees/ (by @matkelcey, one of our ML engineers).