FOMO project and Android support

Hi all,

I would like to use the ultra-wide camera of a mobile phone to track sport players and the ball using FOMO. I believe it is a good candidate, but please confirm. The camera is going to produce images in the order of 8 to 10 MP, which is quite big.

First, the centroid idea is fine for me; I don’t mind getting only the centroid and not the bounding box. It is more of a “counting” scenario than real analytics, so I don’t need precise tracking in the sense of keeping the right player ID, just to detect both categories.

The first doubt is about occlusion, as there will be quite a bit of it. I understand that if the grid is not fine enough, FOMO will just collapse both centroids into one cell, since it gives one “detection” per grid cell, right?

Second, the documentation suggests objects should be of similar size, which is true for one of my categories (players) but not for the other (ball). I understand this is not an issue, as the requirement is for similar size within the same category, right?

I believe FOMO also supports tracking many objects. While the number of players is limited, say 20-something at most, the image might contain other people who I guess will be detected as players too, so something like 50 or maybe even up to 100 detections (spectators) could happen. I have seen that this is doable with some modification, and I guess the mobile phone hardware will be able to cover it, considering it is much more powerful than standard embedded targets.

Now, let’s say I have the model trained: how do I deploy it on an Android phone, not via WebAssembly but from a native Android app (say Java or Kotlin)? I see there are libraries to port it to Python, Go, or C++, but what is the best way to do it for Android?

Thank you very much

Hi Jnebrera, some comments below…

If the full resolution is too slow to process, you have the option of downsampling the input. It really comes down to how small the items to detect are with respect to the full resolution. The sweet spot for FOMO is detecting items that fit in an output cell 1/8th of the input size; and it sounds like the things you are going for are smaller than that?
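If you do go down the downsampling route on the Android side, it can be as simple as scaling each frame before it reaches the model. A minimal sketch, assuming a hypothetical 320x320 model input (note this squashes the aspect ratio; you could also crop):

```kotlin
import android.graphics.Bitmap

// Minimal sketch: scale a full-resolution camera frame down to the model's
// input size before inference. The 320x320 input size is an assumption.
fun downsampleForModel(frame: Bitmap, inputSize: Int = 320): Bitmap =
    Bitmap.createScaledBitmap(frame, inputSize, inputSize, /* filter = */ true)
```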

There are a few different ways you can go, depending on a couple of things…

  • What’s the resolution of your input?
  • How big are the balls vs. the players?

Cheers, Mat

Hi Mat,

I could definitely downsample the original image; that would not be an issue if needed.

As for grid size, my understanding is that a mobile phone has much more power than your embedded targets, so even if it makes little sense in other environments, it could cope with the load on its CPU/GPU.

The ball is surely smaller than the 1/8th ratio, but at the same time, for the whole project I really don’t need the “exact” location of the element, just “more or less in this area”.

The goal is to decide where to “point” the main camera, like a cameraman would, and for that, as humans do, a general idea of where to point is more than enough.

The same happens with players: I just need to know the general area where I have more players, to try to keep as many of them as possible within the frame.

As for the ball, there is an additional advantage: we know only one ball is in the whole frame at any given moment.
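In practice I imagine I can exploit that by keeping only the single most confident ball detection and dropping the rest. A rough sketch (the Detection type here is just an illustration, not an Edge Impulse API):

```kotlin
// Illustrative detection type, not an Edge Impulse API; FOMO returns one
// centroid per occupied output cell.
data class Detection(val label: String, val x: Int, val y: Int, val confidence: Float)

// Since at most one ball can be in the frame, keep only the highest-confidence
// "ball" centroid and treat any others as false positives.
fun pickBall(detections: List<Detection>): Detection? =
    detections.filter { it.label == "ball" }.maxByOrNull { it.confidence }
```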

I understand I could validate whether it works by running the already available “web based” mobile client, but my main doubt is not only whether FOMO is valid (I’m pretty sure it is), but also whether, once done with the training, I can deploy the real model in a more standard “Android” way, not WebAssembly.

Thank you

Yes, this all makes sense. If you start a project with your data, including whatever resolution you want to try first, and add me to the project, we can try some things in expert mode. Re: the Android deployment, I’ll find someone on our team who’s in a better position to answer.
Cheers,
Mat

Hi Jnebrera,

To add answers to some of your SDK-side questions: the Linux SDKs for Python/Go/Node scaffold around a precompiled binary built from our C++ SDK.

As a result, I don’t think you’ll be able to make use of them in an Android app. What you’ll likely need to do is compile our C++ library for use with the Android NDK. Unfortunately we don’t have any existing examples of this at the moment, but our general C++ library deployment docs might be a good starting point.
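As a very rough illustration of the shape this could take (none of this is an existing Edge Impulse example; the library name and JNI entry point below are assumptions, and the native side would be a small JNI shim you write yourself around the C++ SDK):

```kotlin
// Hypothetical Kotlin-side wrapper around the Edge Impulse C++ SDK compiled
// with the Android NDK. The .so name and the JNI function are assumptions;
// the native shim would feed the features to the SDK and serialize the
// resulting centroids back to the JVM.
class EdgeImpulseClassifier {
    companion object {
        init {
            // Assumed library name produced by your NDK build
            System.loadLibrary("edgeimpulse")
        }
    }

    // Assumed JNI entry point: preprocessed pixel features in, a flat array
    // of (labelIndex, x, y, confidence) tuples out, one per detection.
    external fun runInference(features: FloatArray): FloatArray
}
```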

Best,
David

Sorry for my late reply,

I just checked, and while the ultra-wide panoramic camera has an 8 MP sensor, it captures video at standard 720p or 1080p, which significantly reduces the stress on the object detection process.

Second, I do have quite a bit of footage recorded as “files”. I understand I can use them to train the model (even more so now that I have noticed the above fact about resolution), but can I also use them to “test” the model on a mobile phone? I mean, is it mandatory to use the camera with the Android web client, or can I just use the files?
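For the native-app route I imagine something like pulling frames out of the recorded files and feeding them to the classifier. A sketch using Android’s MediaMetadataRetriever (runInference being the hypothetical native wrapper sketched above, not a real API):

```kotlin
import android.graphics.Bitmap
import android.media.MediaMetadataRetriever

// Sketch: run the model over frames extracted from a recorded video file
// instead of the live camera. The sampling interval and the inference call
// are placeholders.
fun testOnFile(path: String, everyMs: Long = 200) {
    val retriever = MediaMetadataRetriever()
    retriever.setDataSource(path)
    val durationMs = retriever
        .extractMetadata(MediaMetadataRetriever.METADATA_KEY_DURATION)!!
        .toLong()
    var t = 0L
    while (t < durationMs) {
        // getFrameAtTime expects microseconds
        val frame: Bitmap? =
            retriever.getFrameAtTime(t * 1000, MediaMetadataRetriever.OPTION_CLOSEST)
        frame?.let { /* downsample, convert to features, runInference(...) */ }
        t += everyMs
    }
    retriever.release()
}
```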

Finally, can I do the same on an iPhone? I understand you provide an Android client for testing, but since I would need to port the C++ code to Android myself, would the same approach apply if I decide to go for iOS?

Thank you very much

Sorry for my really late reply. I have been looking into how to work on this project.

Camera images are 1920x1080 (Full HD); people can range from 170x280 pixels to 180x550 pixels depending on how far they are from the camera on the field. There are many players on the field, and ideally I would like to distinguish most of them from each other, considering that at times they will be quite close to each other (occlusion). It doesn’t need to be perfect, but, as said, I would like to detect most of them. Also, considering that when they are very close to each other they are mostly on the same side of the field, this is not an issue.
The ball can range from around 60x60 pixels to 90x90 or so, and there is only one on the field. In this case, it is quite OK for the position of the ball to be approximate; the regions don’t need to be small.
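As a quick back-of-envelope check (assuming a 320x320 model input and FOMO’s default output heatmap at 1/8th of the input resolution, i.e. 8x8-pixel cells; both numbers are my assumptions):

```kotlin
// Rough scaling math for 1920x1080 frames squashed into a 320x320 input.
val scaleX = 320.0 / 1920.0   // ~0.17
val scaleY = 320.0 / 1080.0   // ~0.30
val ballW = 60 * scaleX       // ~10 px wide -> spans roughly one 8x8 cell
val personW = 170 * scaleX    // ~28 px wide -> ~3-4 cells
val personH = 280 * scaleY    // ~83 px tall -> ~10 cells
```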
Do you think it makes sense?
I just built up a dataset with around 1K images, so I’m about to upload it to Studio.
Thank you

Hi Mat,
As you requested, I have added you to the project.
I have uploaded and labeled 240 1920x1080 images with the person and Ball labels.
I have tried to create my first impulse, but it gives me an error in the “object detection” stage and I can’t continue.
Can you help me?
Thank you

OK, I just tried with a different configuration (320x320 image size, greyscale), and now it has been able to train, with the following results:


I guess I will have to include more images in the dataset to improve the ball side :smiley: At the moment, of the 240 images I uploaded, only 143 have the ball present (and only once per image).

Seeing that I was now able to train the model, I tried my original config, with RGB and 640x640 image size. These are the results after only 30 epochs:


At least now the ball starts to appear.