Multi-Object Tracking (MOT) is the task of detecting and following multiple moving objects at the same time in a dynamic scene. It has crucial applications in autonomous driving, robot navigation, security surveillance, medical imaging and sports analysis. MOT comprises two key sub-tasks: object detection and data association. Object detection is performed by a neural network that locates the objects of interest in each frame, whereas data association is performed by a temporally aware neural network that finds correspondences between instances of the same object in two different frames. Traditional multi-object tracking approaches train the object detection network and the data association network separately, optimizing each one for its own part of the job. This strategy cannot handle object detection and data association end-to-end, even though the two tasks depend heavily on each other, and it limits how far performance can improve.
A few recent approaches introduced joint multi-object tracking to tackle this problem. Some of them track each object individually and independently, which simplifies data association but creates a new problem: by treating objects in isolation, they ignore object-object relationships, which are crucial for identifying relative patterns among objects. Other approaches do model object-object relationships, but they still require the object detector to be trained separately.
To this end, Yongxin Wang, Kris Kitani and Xinshuo Weng of the Robotics Institute, Carnegie Mellon University, have developed an end-to-end trainable joint multi-object tracking architecture based on Graph Neural Networks (GNNs), named GSDT, short for GNNs for Simultaneous Detection and Tracking. GSDT models object-object relationships for both data association and object detection. Because it follows the joint multi-object tracking strategy, it can be trained and optimized as a whole, and it employs GNNs to obtain more discriminative features. The model achieves state-of-the-art results on several public multi-object tracking benchmarks, including MOT15, MOT16, MOT17 and MOT20.
How GSDT differs from competing models
In GSDT, two images from successive frames and the tracklets from the previous frame are given to the model as inputs. From these inputs, the model detects the objects in the current frame and associates them with the tracklets of the previous frame. Based on this association, the model decides at every frame whether to continue a given tracklet, discontinue it, or initiate a new tracklet at the current frame.
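The continue/discontinue/initiate decision described above can be illustrated with a minimal sketch. It assumes a precomputed affinity matrix between previous-frame tracklets (rows) and current-frame detections (columns) and uses simple greedy matching with a hypothetical threshold. GSDT learns its affinities with GNNs, so this is not its actual association code, and the function name `associate` and its parameters are illustrative only.

```python
from typing import Dict, List, Tuple


def associate(
    track_ids: List[int],
    affinity: List[List[float]],
    num_detections: int,
    next_id: int,
    threshold: float = 0.5,  # hypothetical matching threshold
) -> Tuple[Dict[int, int], List[int], Dict[int, int]]:
    """Greedy one-to-one matching of tracklets (rows) to detections (columns).

    Returns:
        continued: tracklet id -> index of the detection that extends it
        terminated: ids of tracklets with no detection above the threshold
        started: new tracklet id -> index of the unmatched detection it starts from
    """
    # Collect every tracklet-detection pair whose affinity clears the threshold.
    candidates = [
        (affinity[i][j], i, j)
        for i in range(len(track_ids))
        for j in range(num_detections)
        if affinity[i][j] >= threshold
    ]
    # Highest-affinity pairs are matched first.
    candidates.sort(reverse=True)

    continued: Dict[int, int] = {}
    used_dets = set()
    for _score, i, j in candidates:
        tid = track_ids[i]
        if tid in continued or j in used_dets:
            continue  # each tracklet and each detection is matched at most once
        continued[tid] = j
        used_dets.add(j)

    # Unmatched tracklets are discontinued immediately in this simplified sketch.
    terminated = [tid for tid in track_ids if tid not in continued]

    # Unmatched detections start new tracklets with fresh ids.
    started: Dict[int, int] = {}
    for j in range(num_detections):
        if j not in used_dets:
            started[next_id] = j
            next_id += 1
    return continued, terminated, started
```

For example, `associate([7, 9], [[0.9, 0.1, 0.2], [0.3, 0.2, 0.4]], 3, next_id=10)` matches only tracklet 7 to detection 0, discontinues tracklet 9, and starts tracklets 10 and 11 from detections 1 and 2. Practical trackers usually keep unmatched tracklets alive for a few frames before discarding them; the sketch drops them at once to keep the logic short.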
GSDT uses an object detector and a re-identification module to detect multiple objects and associate them simultaneously. In addition, graph neural networks are used to extract and learn features that improve both object detection and data association performance. In short, the GSDT architecture is composed of four modules: a GNN-based feature extraction module, a node feature aggregation module, an object detection module and a data association module.
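The experiment scripts in the repository are named `track_gnn_mot_AGNNConv_RoIAlign_*`, which suggests that the GNN layers are attention-based `AGNNConv` layers from PyTorch Geometric. As an assumption-laden illustration of how such a layer refines node features using object-object relationships, the sketch below reimplements the AGNN propagation rule in plain Python: each node attends over itself and its neighbors with weights given by a softmax of beta times cosine similarity, then outputs the weighted sum of their features. The toy graph and the names `agnn_layer` and `cosine` are hypothetical; in PyTorch Geometric, beta is learnable and self-loops are added by default.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def agnn_layer(
    features: List[List[float]],
    neighbors: Dict[int, List[int]],
    beta: float = 1.0,
) -> List[List[float]]:
    """One round of attention-weighted message passing in the style of AGNN.

    Node i attends over itself and its neighbors j with weights
    softmax_j(beta * cos(h_i, h_j)) and outputs the weighted sum of their features.
    """
    updated = []
    for i, h_i in enumerate(features):
        # Self-loop plus graph neighbors (deduplicated, order preserved).
        nodes = list(dict.fromkeys([i] + neighbors.get(i, [])))
        scores = [beta * cosine(h_i, features[j]) for j in nodes]
        # Numerically stable softmax over the neighborhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        updated.append([
            sum(w * features[j][k] for w, j in zip(weights, nodes))
            for k in range(len(h_i))
        ])
    return updated
```

With two connected nodes whose features are orthogonal, for instance `[1, 0]` and `[0, 1]` with beta = 1, each node keeps a weight of e/(e + 1), about 0.73, on itself and mixes in about 0.27 of its neighbor's features; a larger beta makes the attention sharper.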
Python implementation of GSDT
- GSDT requires a PyTorch environment with a CUDA-enabled GPU runtime. Download the source code from the official repository.
!git clone https://github.com/yongxinw/GSDT.git
- Change the directory to the downloaded GSDT folder and explore its contents.
%cd /content/GSDT/
!ls -p
- GSDT works well with the Anaconda3 distribution. Download and install it if the local machine does not already have a conda environment.
!wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
!bash Anaconda3-2020.02-Linux-x86_64.sh
- Start a bash shell so that the conda base environment is enabled and activated.
!bash
- Inside conda’s base environment, run the following command to create a Python 3.6 environment named dev.
conda create -n dev python=3.6
- Activate the dev environment using the following command, and run all subsequent steps inside this environment only.
conda activate dev
- Install the dependencies in the dev environment by passing the requirements file to pip with the -r flag.
pip install -r requirements.txt
- Install PyTorch 1.7.0 built for CUDA 10.2. On Linux, the default torch 1.7.0 wheel on PyPI targets CUDA 10.2, so the following command is sufficient.
pip install torch==1.7.0
- Install the PyTorch Geometric package for CUDA 10.2.
bash install_pyg.sh CUDA_version=cu102
- Build Deformable Convolutional Networks v2 (DCNv2) from source using the following commands.
cd ./src/lib/models/networks/DCNv2
bash make.sh
- Download the MOT15 and MOT20 datasets from the MOTChallenge website. Once the datasets are ready, the following commands generate the object labels.
cd /content/GSDT/src
python gen_labels_15.py
python gen_labels_20.py
- Download the pre-trained models for the MOT15 and MOT20 datasets, together with their weights, and move them to /content/GSDT/experiments. Then perform a sample evaluation on each of the two datasets using the following commands.
cd /content/GSDT/experiments
bash track_gnn_mot_AGNNConv_RoIAlign_mot15.sh model_mot15
bash track_gnn_mot_AGNNConv_RoIAlign_mot20.sh model_mot20
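The label-generation step converts the raw MOTChallenge annotations into per-frame training labels. A minimal sketch of such a conversion is shown below, assuming the common ground-truth row layout `frame,id,bb_left,bb_top,bb_width,bb_height,...` and a FairMOT-style output line `class id x_center y_center width height` normalized by the image size. Both formats and the function name `mot_gt_to_labels` are assumptions; the repository's `gen_labels_15.py` and `gen_labels_20.py` may differ in detail, and real scripts usually also filter rows by the confidence and class columns.

```python
from typing import Dict, List


def mot_gt_to_labels(
    gt_lines: List[str], img_width: int, img_height: int
) -> Dict[int, List[str]]:
    """Convert MOTChallenge-style ground-truth rows into per-frame normalized labels.

    Assumed input row: frame,id,bb_left,bb_top,bb_width,bb_height,...
    Assumed output line: 'class id x_center y_center width height', scaled to [0, 1].
    """
    labels: Dict[int, List[str]] = {}
    for line in gt_lines:
        fields = line.strip().split(",")
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        frame, track_id = int(float(fields[0])), int(float(fields[1]))
        left, top, w, h = (float(v) for v in fields[2:6])
        # Convert the top-left corner to a box center, then normalize by image size.
        x_c = (left + w / 2) / img_width
        y_c = (top + h / 2) / img_height
        labels.setdefault(frame, []).append(
            f"0 {track_id} {x_c:.6f} {y_c:.6f} {w / img_width:.6f} {h / img_height:.6f}"
        )
    return labels
```

For a 100 x 100 image, the row `2,5,10,20,30,40,1,1,1` becomes the frame-2 label `0 5 0.250000 0.400000 0.300000 0.400000`.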
Performance of GSDT
GSDT has been evaluated on the open challenges MOT15, MOT16, MOT17 and MOT20. To compare it with competing models, its authors submitted its results to the official MOTChallenge leaderboard. Models are evaluated on several standard metrics, including MOTA (Multiple Object Tracking Accuracy), IDF1 (identity F1 score), MT (mostly tracked trajectories), ML (mostly lost trajectories) and IDS (identity switches).
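MOTA and IDF1 have standard closed-form definitions: MOTA = 1 - (FN + FP + IDS) / GT, with false negatives, false positives, identity switches and ground-truth objects summed over all frames, and IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN), computed from identity-level true positives, false positives and false negatives. The sketch below only evaluates these formulas from already-computed counts; it does not perform the frame-by-frame matching that official evaluation tools carry out, and the numbers in the example are made up for illustration.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy, with every count summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score from identity-level true/false positives and false negatives."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0
```

For example, 100 misses, 50 false positives and 10 identity switches over 1,000 ground-truth boxes give a MOTA of 0.84. Because the error terms are not bounded by GT, MOTA can become negative for a very poor tracker.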
GSDT outperforms many well-known models, including DMT, LIF_TsimInt, MDP_SubCNN, CDA_DDAL, MPNTrack, EAMTT, AP_HWDPL, NOMTwSDP, RAR15, Tube_TK, CTrackerV1, CTTrack17, SORT20 and POI. At the time of its publication, GSDT achieved state-of-the-art results in the MOT challenge.