Why do the centroids have the same size even though a cutpoint of 2 was chosen for FOMO?

I understood FOMO to be a classification of regions for object detection, where the regions are used as bounding boxes. If I set the cutpoint to 2, the input image should be reduced by a factor of two and the regions/receptive fields should be 2x2 in size. So why are the drawn bounding boxes larger than 2x2? I noticed the larger bounding boxes in the live classification function.
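To make the arithmetic behind this expectation concrete, here is a minimal Python sketch. It only illustrates the assumption stated above (a reduction factor of 2 gives grid cells that each cover 2x2 input pixels); the function names and the 96x96 input size are hypothetical and not taken from FOMO or Edge Impulse.

```python
def grid_geometry(input_size: int, reduction: int) -> tuple[int, int]:
    """Return (cells per side, pixels per cell side) for a square input.

    Assumption: the model output is the input downsampled by `reduction`.
    """
    if reduction <= 0 or input_size % reduction != 0:
        raise ValueError("input_size must be a positive multiple of reduction")
    return input_size // reduction, reduction


def cell_to_pixel_box(row: int, col: int, reduction: int) -> tuple[int, int, int, int]:
    """Map one grid cell to its (x, y, width, height) box in input pixels."""
    return (col * reduction, row * reduction, reduction, reduction)


# Hypothetical 96x96 input with a reduction factor of 2.
cells_per_side, cell_pixels = grid_geometry(96, 2)
single_cell_box = cell_to_pixel_box(row=3, col=5, reduction=2)
```

Under this assumption a single activated cell would map to a 2x2 pixel box, which is the size I expected to see drawn.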

Context/Use case: I am trying to classify small objects that are very close together. I hope that by using small receptive fields, I can solve the issue of tightly packed objects.

When bounding boxes (BBs) are adjacent to each other, they get fused into one BB. See this also.
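A rough Python sketch of what such a fusion step could look like is shown here. This is not Edge Impulse's actual post-processing code; it assumes a binary grid of activated cells for one class and treats cells as adjacent when they share an edge (4-connectivity), both of which are assumptions made for illustration. The function name `fuse_adjacent_cells` is hypothetical.

```python
from collections import deque


def fuse_adjacent_cells(grid: list[list[int]]) -> list[tuple[int, int, int, int]]:
    """Fuse edge-adjacent activated cells into one box per connected group.

    `grid` holds 1 for an activated cell and 0 otherwise.
    Returns (x, y, width, height) boxes in grid-cell units.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the connected group starting here.
            queue = deque([(r, c)])
            seen[r][c] = True
            min_r = max_r = r
            min_c = max_c = c
            while queue:
                cr, cc = queue.popleft()
                min_r, max_r = min(min_r, cr), max(max_r, cr)
                min_c, max_c = min(min_c, cc), max(max_c, cc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            boxes.append((min_c, min_r, max_c - min_c + 1, max_r - min_r + 1))
    return boxes


# Three touching cells become one 2x2 box; the isolated cell stays 1x1.
example = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
fused = fuse_adjacent_cells(example)
```

With a step like this, tightly packed objects whose cells touch end up in a single box that is larger than one cell, which would match the larger boxes seen in the live classification.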

Thank you for the prompt response. The fusion of bounding boxes sounds plausible. In a YouTube video about FOMO, it was mentioned that adjacent regions that classify the same object are discarded. That is why I wondered why the centroids are not 2x2 in size.