I am using the Beers vs. Cans FOMO example with RGB images (96, 96, 3) and downloaded the TensorFlow SavedModel to use from a Python script. As I understand it, the last (softmax) layer's output shape is (12, 12, 3), i.e. a 12x12 grid representation of the input image where each grid cell holds the probabilities of the 3 classes. How can the centroid location (x, y) of a detected object be extracted? And what about objects that span 3 or 4 grid cells? Where is this implemented in the EI SDK?
I found the relevant functions in the SDK: inferencing-sdk-cpp/classifier/ei_fill_result_struct.h.
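For anyone who wants to do the same thing on the SavedModel output directly in Python, here is a minimal sketch of the idea behind that SDK code: threshold each object-class channel, group adjacent above-threshold cells into blobs (this is what handles objects spanning 3 or 4 cells), and take each blob's confidence-weighted centroid, scaled back to input-image pixels. The function name, the 0.5 threshold, the 4-connectivity, and the assumption that channel 0 is the background class are my own choices for illustration, not the SDK's actual API.

```python
import numpy as np

def extract_centroids(output, threshold=0.5, cell_px=96 // 12):
    """Extract object centroids from a FOMO-style (12, 12, 3) grid output.

    Assumes channel 0 is background; channels 1..N-1 are object classes
    (an assumption, check your model's label order). Cells at or above
    `threshold` are grouped into 4-connected blobs; each blob's
    confidence-weighted centroid is scaled back to pixel coordinates.
    """
    grid_h, grid_w, n_classes = output.shape
    detections = []
    for cls in range(1, n_classes):  # skip assumed background channel 0
        mask = output[:, :, cls] >= threshold
        visited = np.zeros_like(mask, dtype=bool)
        for r in range(grid_h):
            for c in range(grid_w):
                if not mask[r, c] or visited[r, c]:
                    continue
                # Flood-fill over 4-connected neighbours: one blob is one
                # object, even when it spans several grid cells.
                blob, stack = [], [(r, c)]
                visited[r, c] = True
                while stack:
                    y, x = stack.pop()
                    blob.append((y, x, float(output[y, x, cls])))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < grid_h and 0 <= nx < grid_w
                                and mask[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                total = sum(p for _, _, p in blob)
                # Cell centers sit at (index + 0.5); weight by confidence.
                cy = sum((y + 0.5) * p for y, _, p in blob) / total
                cx = sum((x + 0.5) * p for _, x, p in blob) / total
                detections.append({
                    "class": cls,
                    "x": cx * cell_px,  # centroid in input-image pixels
                    "y": cy * cell_px,
                    "confidence": total / len(blob),
                })
    return detections
```

For a 96x96 input and a 12x12 grid, each cell covers 8x8 pixels, so an object detected in cells (5, 5) and (5, 6) of channel 1 comes back as a single detection with its centroid somewhere between those two cells, which matches the behaviour you see from the functions in ei_fill_result_struct.h.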