From personal experience, I think this would be difficult to pull off from this “bird’s eye view” level, since the MobileNetV2 SSD FPN-Lite object detection model tends to struggle with smaller objects, and in this frame there are quite a few of them. One approach would be to define a smaller “region of interest”, count the number of cucumbers within that smaller region, and then iterate across the larger frame. This is something you could potentially use a library like OpenCV for, but that is more at the application level.
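To make the “iterate across the larger frame” idea concrete, here is a minimal sketch of tiling a frame into overlapping regions of interest. The function name, tile size, and overlap are illustrative assumptions, not anything from a specific library; in practice you would crop each tile (e.g. `frame[y:y+h, x:x+w]` with OpenCV/NumPy), run the detector on it, and then de-duplicate detections that appear in more than one tile:

```python
def roi_tiles(frame_w, frame_h, tile=320, overlap=0.2):
    """Yield (x, y, w, h) windows that together cover the frame.

    Overlapping tiles reduce the chance of a cucumber sitting on a
    tile boundary and being missed in both neighboring regions.
    """
    step = max(1, int(tile * (1 - overlap)))
    xs = list(range(0, max(frame_w - tile, 0) + 1, step))
    ys = list(range(0, max(frame_h - tile, 0) + 1, step))
    # Make sure the right and bottom edges are always covered.
    if xs[-1] + tile < frame_w:
        xs.append(frame_w - tile)
    if ys[-1] + tile < frame_h:
        ys.append(frame_h - tile)
    for y in ys:
        for x in xs:
            yield (x, y, tile, tile)
```

For example, a 1280x720 frame with 320-pixel tiles and 20% overlap produces a 5x3 grid of regions you can loop over and feed to the model one at a time.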
That being said, for this application I would generally start off with two classes: one for cucumbers, and an ‘other’ bucket, so that if something is green and somewhat shaped like a cucumber the model is able to differentiate it. Otherwise, everything green and narrow may end up being identified as a cucumber. You would also want images that contain multiple cucumbers, not just cucumbers by themselves, and ideally with ‘other’ objects in the shot as well, to teach the model that even in the presence of real cucumbers there can be ‘others’.
When labeling the images it is OK for the bounding boxes to overlap, but because of the similarity between the objects here it would be interesting to see how this performs on the device. It is not necessary to use the same camera, since the images will be transformed to 320x320 resolution anyway. What is important is to capture a variety of orientations of the cucumbers in the region of interest. Lighting will also play a factor depending on the time of day, so you will want to take the same shots at different times of day to account for this.
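One practical consequence of the 320x320 transform is that your labeled bounding boxes get scaled along with the image, so already-small cucumbers become even smaller at inference resolution. A minimal sketch of that mapping, assuming a simple stretch resize to 320x320 (the function name is illustrative):

```python
def scale_box(box, src_w, src_h, dst=320):
    """Map an (x, y, w, h) box from the source image into dst x dst pixels,
    assuming the image is simply stretched to the target resolution."""
    x, y, w, h = box
    sx = dst / src_w
    sy = dst / src_h
    return (x * sx, y * sy, w * sx, h * sy)
```

For instance, a 100x50-pixel cucumber labeled in a 1280x720 photo ends up roughly 25x22 pixels at 320x320, which is part of why a wide overview shot is hard for the model and a tighter region of interest helps.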