Identification is a major challenge in object detection. It is not just about detecting the presence or the class (player, ball…) of an object, but also about finding its identity. In soccer player detection, for example, the goal is also to recognize each individual player. This, however, requires a reference, a player catalog, to determine which identity corresponds to which image.
Identification with a catalog of players
Unfortunately, such a catalog does not always exist. In this case, we can use a system that compares two images of an object and determines whether they share the same identity. We can thus distinguish all the players during a match without ever determining who they actually are. This approach is called "re-identification".
Re-identification of 4 different players
Re-identification is essential for tracking moving objects through image sequences. It makes it possible to associate detections with the same identity from one image to the next, and thus to follow the movement of a player on the field. It is generally used in addition to classic tracking algorithms that model the dynamics of objects. While dynamics-based tracking works well from frame to frame, re-identification can recover objects that were not visible for several consecutive images, which a dynamics model cannot do.
Object detection & tracking of players
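As an illustration of how re-identification can recover a player who disappeared for a few frames, here is a minimal Python sketch. It assumes each lost track keeps the last embedding vector of its player; the function name `reidentify`, the track IDs, and the distance threshold are hypothetical and chosen only for this example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def reidentify(detection_embedding, lost_tracks, max_distance):
    """Return the ID of the closest lost track, or None if none is close enough.

    `lost_tracks` maps a track ID to the last embedding seen for that track
    (an assumption of this sketch, not a description of a specific tracker).
    """
    best_id, best_distance = None, max_distance
    for track_id, track_embedding in lost_tracks.items():
        distance = euclidean(detection_embedding, track_embedding)
        if distance < best_distance:
            best_id, best_distance = track_id, distance
    return best_id

# Two players were lost by the dynamics-based tracker a few frames ago.
lost_tracks = {7: [0.0, 1.0], 10: [1.0, 0.0]}

# A new detection appears; its embedding is close to the one stored for track 7.
recovered = reidentify([0.1, 0.9], lost_tracks, max_distance=0.5)
unknown = reidentify([5.0, 5.0], lost_tracks, max_distance=0.5)
print(recovered, unknown)
```

When no stored embedding is close enough, the function returns `None`, which a tracker could interpret as a player seen for the first time.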
This article aims to show how to create a re-identification neural network for soccer player tracking using TensorFlow.
A re-identification algorithm using Triplet Loss
A re-identification system consists of a Siamese neural network that performs image "embedding". Embedding is the process of assigning a vector to an image so that similar images have "close" embedding vectors according to a chosen measure (Euclidean distance, cosine similarity, ...).
Re-identification system using an embedding neural network
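To make the notion of "close" embedding vectors concrete, here is a short Python sketch of the two measures mentioned above. The three-dimensional vectors are made up for readability; real embeddings are much longer and come from the neural network.

```python
import math

def euclidean_distance(u, v):
    """Euclidean (L2) distance: 0.0 for identical vectors, larger when they differ."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings of three image crops.
player_a_frame1 = [0.9, 0.1, 0.0]
player_a_frame2 = [0.8, 0.2, 0.0]
player_b_frame1 = [0.0, 0.1, 0.9]

same = euclidean_distance(player_a_frame1, player_a_frame2)
different = euclidean_distance(player_a_frame1, player_b_frame1)
print(same < different)  # the two crops of player A are closer to each other
```

Note that the two measures point in opposite directions: a small Euclidean distance and a high cosine similarity both indicate similar images.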
For this network we use a cost function called “Triplet Loss”, as described in the article “In Defense of the Triplet Loss for Person Re-Identification” (https://arxiv.org/abs/1703.07737). This function compares a triplet of embedding vectors: an anchor, a positive, and a negative. The anchor is the vector against which the positive and the negative are compared. The positive is an object with the same identity as the anchor, and the negative is an object with a different identity. Here is an example with a distance d and a triplet (a, p, n):
With a margin m = 0.5, we get the following result for our cost function:
Triplet of players with an anchor, a positive, and a negative
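Assuming the common margin-based form of the Triplet Loss, loss(a, p, n) = max(d(a, p) - d(a, n) + m, 0), the cost of a single triplet can be sketched in Python as follows. The distance values are invented for the example.

```python
def triplet_loss(d_ap, d_an, margin=0.5):
    """Margin-based triplet loss for one (anchor, positive, negative) triplet.

    d_ap: distance between the anchor and the positive
    d_an: distance between the anchor and the negative
    """
    return max(d_ap - d_an + margin, 0.0)

# Easy triplet: the negative is much farther than the positive, so no cost.
print(triplet_loss(d_ap=0.5, d_an=1.5))  # 0.0

# Hard triplet: the negative is closer to the anchor than the positive.
print(triplet_loss(d_ap=1.0, d_an=0.5))  # 1.0
```

The loss is zero as soon as the negative is farther from the anchor than the positive by at least the margin m.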
To compute triplets efficiently, the images are processed in batches. For example, with a set of 3 people (Alice, Bob, Eve):
Distance between embedded vectors of Alice, Bob and Eve
A naive approach to computing our cost function on this batch is to average the costs of all triplets. This is the so-called "batch-all" strategy.
Triplet Loss of a batch-all strategy (m=0.5)
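The batch-all strategy can be sketched in plain Python as follows. This is a simplified illustration, assuming the margin-based loss max(d(a, p) - d(a, n) + m, 0) and a precomputed distance matrix; the distances below are invented and do not reproduce the figure.

```python
def batch_all_triplet_loss(distances, labels, margin=0.5):
    """Average the triplet loss over every valid (anchor, positive, negative) triplet.

    distances[i][j] is the distance between the embeddings of images i and j,
    labels[i] is the identity shown in image i.
    """
    losses = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue  # a positive is a different image of the same identity
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue  # a negative must have a different identity
                losses.append(max(distances[a][p] - distances[a][neg] + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0

labels = ["Alice", "Alice", "Bob", "Eve"]
distances = [
    [0.0, 1.0, 0.5, 2.0],
    [1.0, 0.0, 2.0, 2.0],
    [0.5, 2.0, 0.0, 1.5],
    [2.0, 2.0, 1.5, 0.0],
]
batch_all = batch_all_triplet_loss(distances, labels)
print(batch_all)  # 0.25: one costly triplet out of four
```

Only one of the four triplets has a non-zero cost here, so the average dilutes the signal coming from the difficult case.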
However, according to the article, this strategy is not optimal. A better strategy is to select, for each anchor, only the hardest positive, which maximizes the distance d(a, p), and the hardest negative, which minimizes d(a, n). This is the "batch-hard" strategy.
Triplet Loss of a batch-hard strategy (m=0.5)
This strategy puts more weight on images that the embedding handles poorly: positives too far from the anchor, or negatives too close to it. This makes training more effective on difficult cases.
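As a minimal sketch of the batch-hard strategy, the following Python function keeps, for each anchor, only its farthest positive and its closest negative. It assumes the margin-based loss max(d(a, p) - d(a, n) + m, 0); the distance matrix is invented for the example, and anchors without any positive in the batch are simply skipped.

```python
def batch_hard_triplet_loss(distances, labels, margin=0.5):
    """Average, over anchors, the loss of the hardest triplet of each anchor."""
    losses = []
    n = len(labels)
    for a in range(n):
        positives = [distances[a][j] for j in range(n)
                     if j != a and labels[j] == labels[a]]
        negatives = [distances[a][j] for j in range(n)
                     if labels[j] != labels[a]]
        if not positives or not negatives:
            continue  # this anchor cannot form a triplet in this batch
        hardest_positive = max(positives)  # farthest image of the same identity
        hardest_negative = min(negatives)  # closest image of another identity
        losses.append(max(hardest_positive - hardest_negative + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0

labels = ["Alice", "Alice", "Bob", "Eve"]
distances = [
    [0.0, 1.0, 0.5, 2.0],
    [1.0, 0.0, 2.0, 2.0],
    [0.5, 2.0, 0.0, 1.5],
    [2.0, 2.0, 1.5, 0.0],
]
batch_hard = batch_hard_triplet_loss(distances, labels)
print(batch_hard)  # 0.5
```

With these distances, averaging over all valid triplets would give 0.25, whereas keeping only the hardest triplet per anchor gives 0.5: the difficult case weighs more in the cost.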
Training of the re-identification algorithm from a personalized database
To have an efficient re-identification algorithm, it may be useful to build a specific database of the objects that we want to detect (here, a database of bounding boxes of football players). We must then train our system with this database. The code used in this article comes from https://github.com/VisualComputingInstitute/triplet-reid.
The first step is to train the model with a training database of player images and a CSV file containing, for each image, the corresponding player ID, the path to the image, and the group ID of the object (the match ID):
pipenv run python train.py \
--train_set <training dataset csv> \
--image_root <root of images> \
--experiment_root <training output folder>
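As an illustration of the training CSV described above, the following Python sketch writes one row per image. The column order (player ID, image path, group ID), the absence of a header row, and all IDs and paths are assumptions made for this example; check the format expected by the training code before using it.

```python
import csv
import io

# Hypothetical annotations: (player ID, image path relative to the image root, match ID).
rows = [
    ("player_01", "match_A/player_01_0001.jpg", "match_A"),
    ("player_01", "match_A/player_01_0002.jpg", "match_A"),
    ("player_02", "match_A/player_02_0001.jpg", "match_A"),
    ("player_03", "match_B/player_03_0001.jpg", "match_B"),
]

# Write to an in-memory buffer; replace it with open("train.csv", "w", newline="")
# to produce a real file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```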
Once the training is done, we must then check the quality of the embedding neural network with the validation database. To do this, we embed the images from this validation database:
pipenv run python embed.py \
--experiment_root <training output folder> \
--dataset <validation dataset csv> \
--filename <embedding output filename>
Finally, we evaluate the embeddings of the images from the validation database. This evaluation takes images in pairs and checks whether the model designates them as the same player or not, using a measure (Euclidean distance or cosine similarity) between the embedding vectors and a similarity threshold. To limit the number of image pairs to compare, we split the validation database into a query database and a smaller gallery database, and we only compare query images with gallery images:
pipenv run python evaluate.py \
--query_dataset <query dataset csv> \
--query_embeddings <query embedded vectors> \
--gallery_dataset <gallery dataset csv> \
--gallery_embeddings <gallery embedded vectors> \
--filename <evaluation output filename>
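The pairwise comparison performed by the evaluation can be sketched as follows. This is a simplified illustration, not the code of evaluate.py: the function name, the two-dimensional embeddings, and the threshold value are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def compare_query_to_gallery(query, gallery, threshold):
    """Compare every query image with every gallery image.

    Returns one tuple (query_id, gallery_id, predicted_same, actually_same) per pair.
    """
    results = []
    for q_id, q_vec in query:
        for g_id, g_vec in gallery:
            predicted_same = euclidean(q_vec, g_vec) < threshold
            results.append((q_id, g_id, predicted_same, q_id == g_id))
    return results

# Hypothetical embeddings: (player ID, embedding vector).
query = [("p1", [0.0, 0.0]), ("p2", [1.0, 1.0])]
gallery = [("p1", [0.1, 0.0]), ("p2", [0.9, 1.0]), ("p3", [5.0, 5.0])]

pairs = compare_query_to_gallery(query, gallery, threshold=0.5)
correct = sum(predicted == truth for _, _, predicted, truth in pairs)
print(f"{correct}/{len(pairs)} pairs classified correctly")
```

Splitting the data this way means that 2 query images and 3 gallery images produce 6 comparisons, instead of comparing every image of the validation database with every other one.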
The overall quality score of the trained model is the Average Precision (AP) score (see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html), which accounts for both false positive and false negative errors.
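To show what the AP score measures, here is a simplified pure-Python stand-in (not the scikit-learn implementation). It assumes that, for one query image, the gallery images have been sorted from closest to farthest and that no two of them are at exactly the same distance. The mean of the AP over all queries gives the mAP.

```python
def average_precision(ranked_matches):
    """AP for one query.

    `ranked_matches` lists, from the closest to the farthest gallery image,
    whether each image shows the same player as the query (True or False).
    """
    hits = 0
    precisions = []
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(all_ranked_matches):
    """mAP: the mean of the AP over all queries."""
    scores = [average_precision(ranked) for ranked in all_ranked_matches]
    return sum(scores) / len(scores)

perfect = [True, True, False, False]  # both correct images are ranked first
mixed = [True, False, True, False]    # a wrong player is ranked second

print(average_precision(perfect))  # 1.0
print(average_precision(mixed))    # (1/1 + 2/3) / 2
print(mean_average_precision([perfect, mixed]))
```

A wrong player ranked above a correct image lowers the precision at that rank, and a correct image ranked far down the list contributes only a small precision value, which is how both kinds of error affect the score.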
Specific training related to re-identification for sport
One of the most popular re-identification databases, Market 1501, consists of images of supermarket customers. Re-identification for sport, however, raises specific problems that make it much harder than re-identifying people in a database like Market 1501. Here are some additional difficulties specific to football:
- more diverse body movements and orientations (runs, strikes, loss of balance, etc.)
- contacts between players
- more similar players (same type of outfit, same jersey color for same team players, etc.)
However, it is possible to improve the algorithm's ability to recognize similar players. One solution for this is to perform group training, in other words, to only compare players from the same match. We must then add an attribute to each image of our csv file: the groupId (which is the match ID in our case). This training limits external bias towards the players (field, shirts, brightness, contrast, etc.), and thus encourages the system to focus on details that really differentiate the players of the same team.
CSV dataset with a groupId
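One possible way to take the groupId into account in the cost function is sketched below: when looking for the hardest positive and negative of an anchor, only images from the same match are considered. This is an assumption made for illustration and not necessarily how the modified training code works; the function name and the data are hypothetical.

```python
def group_batch_hard_loss(distances, player_ids, group_ids, margin=0.5):
    """Batch-hard triplet loss restricted to images from the anchor's match."""
    losses = []
    n = len(player_ids)
    for a in range(n):
        # Only images from the same match (same groupId) are candidates.
        same_match = [j for j in range(n)
                      if j != a and group_ids[j] == group_ids[a]]
        positives = [distances[a][j] for j in same_match
                     if player_ids[j] == player_ids[a]]
        negatives = [distances[a][j] for j in same_match
                     if player_ids[j] != player_ids[a]]
        if not positives or not negatives:
            continue  # no valid triplet for this anchor within its match
        losses.append(max(max(positives) - min(negatives) + margin, 0.0))
    return sum(losses) / len(losses) if losses else 0.0

player_ids = ["p1", "p1", "p2", "p3"]
group_ids = ["m1", "m1", "m1", "m2"]
distances = [
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 2.0, 0.1],
    [1.0, 2.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
group_loss = group_batch_hard_loss(distances, player_ids, group_ids)
print(group_loss)  # 0.25
```

In this example, player p3 comes from another match and is never used as a negative, even though its embedding is very close to the others: the comparison stays between players who share the same field, jerseys, and lighting.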
We have therefore created a new version of the Visual Computing Institute code, taking the groupId into account during training (see https://github.com/piercus/triplet-reid). The evaluation code is also modified to compare only players from the same match. This makes perfect sense, since re-identification will actually be used within the same match.
We carried out a first training run with this code and compared it with a model trained on Market 1501, the database of supermarket customers. Even with a training set that is not very diverse (90 players), we obtained a mAP of 68% on a validation database, against 52% for the Market 1501 model. With a more complete database, we expect even better results.
In this article, we covered several aspects of re-identification applied to football:
- Unlike identification, re-identification performs player tracking without a player identity catalog.
- The re-identification algorithm relies on a Siamese image embedding network.
- The Triplet Loss is used to better handle difficult cases.
- Training and evaluating on batches from the same match helps limit external bias.
Keep in mind that each re-identification problem has its own specificities and difficulties. Re-identifying soccer players with a model trained on Market 1501, that is, on customers in a supermarket, does not work well because the players look too similar. Training on a custom database allowed us to focus on the characteristics specific to our problem, and thus to gain considerably in precision.
‘A word from the CTO - Pierre Colle’
We can clearly see from Samuel's article that when we say "Intelligence is Data", it encompasses several ideas:
- The precision of the data directly impacts the quality of intelligence
- By specializing the database, we make intelligence more relevant
- The structuring of the data is a direct lever of intelligence (here the structuring in batch, and the grouping by matches)
- It is both a business expertise (understanding the data) and a technological expertise (understanding the learning mechanisms)