Just ignore me if I am making too many feature requests.
In the notes here I read:
"In short, if we make the following two constraining assumptions:
1. All bounding boxes are square and have a fixed size
2. The objects occupy a grid over the input
Then we can vastly reduce the complexity, and hence the size and speed of our model. In that case, FOMO will work in its optimal condition."
I suggest that, for setting FOMO bounding boxes, we overlay a grid on the image and simply select the grid squares that contain the objects. Bonus: the ability to shift the image up, down, left, or right to better line up the objects with the grid before selecting their grid locations.
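To make the idea concrete, here is a rough sketch of how selected grid squares could be turned into fixed-size square boxes. Everything in it is my own assumption, not an existing API: the function name `cells_to_boxes`, the `cell_size` and `shift_x` / `shift_y` parameters, and the `x` / `y` / `width` / `height` box layout are all hypothetical.

```python
def cells_to_boxes(cells, cell_size, shift_x=0, shift_y=0):
    """Convert selected (row, col) grid cells into square boxes.

    Hypothetical helper: the grid stays fixed on screen and the image is
    moved underneath it by (shift_x, shift_y) pixels (positive = right/down).
    Boxes are returned in the coordinates of the original, unshifted image.
    """
    boxes = []
    for row, col in cells:
        boxes.append({
            # Undo the image shift to get back to image coordinates.
            "x": col * cell_size - shift_x,
            "y": row * cell_size - shift_y,
            # Every box is square and has the same fixed size.
            "width": cell_size,
            "height": cell_size,
        })
    return boxes


# Example: two cells picked on a 12-pixel grid, image nudged 4 px right, 2 px down.
selected = [(0, 1), (1, 2)]
boxes = cells_to_boxes(selected, cell_size=12, shift_x=4, shift_y=2)
```

Boxes near the edge can fall partly outside the image after a shift, so a real implementation would probably need to clip or reject those.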