This repository is an official implementation of the paper Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown.
This repository is still under development; feel free to raise an issue at any time.
Multi-object tracking (MOT) emerges as a pivotal and highly promising branch in the field of computer vision. Classical closed-vocabulary MOT (CV-MOT) methods aim to track objects of predefined categories. Recently, some open-vocabulary MOT (OV-MOT) methods have successfully addressed the problem of tracking unknown categories. However, we found that the CV-MOT and OV-MOT methods each struggle to excel in the tasks of the other. In this paper, we present a unified framework, Associate Everything Detected (AED), that simultaneously tackles CV-MOT and OV-MOT by integrating with any off-the-shelf detector and supports unknown categories. Different from existing tracking-by-detection MOT methods, AED gets rid of prior knowledge (e.g. motion cues) and relies solely on highly robust feature learning to handle complex trajectories in OV-MOT tasks while keeping excellent performance in CV-MOT tasks. Specifically, we model the association task as a similarity decoding problem and propose a sim-decoder with an association-centric learning mechanism. The sim-decoder calculates similarities in three aspects: spatial, temporal, and cross-clip. Subsequently, association-centric learning leverages these threefold similarities to ensure that the extracted features are appropriate for continuous tracking and robust enough to generalize to unknown categories. Compared with existing powerful OV-MOT and CV-MOT methods, AED achieves superior performance on TAO, SportsMOT, and DanceTrack without any prior knowledge.
- (2024/12/10) The demo using GroundingDINO + AED has been released. You can track on your own video now!
- (2024/9/14) Our paper is available on arXiv.
- Track on your own video.
- Deploy AED using TensorRT.
Results on TAO:

Method | Training Data | Detector | Base-TETA | Base-AssocA | Novel-TETA | Novel-AssocA | URL |
---|---|---|---|---|---|---|---|
AED | TAO-train | RegionCLIP | 37.2 | 40.4 | 27.8 | 29.1 | ⬇️ |
AED | TAO-train | Co-DETR | 54.8 | 54.1 | 48.9 | 51.8 | model |
Results on SportsMOT:

Method | Training Data | HOTA | IDF1 | AssA | MOTA | URL |
---|---|---|---|---|---|---|
AED | TAO-train | 72.8 | 76.8 | 61.4 | 95.0 | |
AED | SportsMOT-train | 77.0 | 80.0 | 68.1 | 95.1 | model |
Results on DanceTrack:

Method | Training Data | HOTA | IDF1 | AssA | MOTA | URL |
---|---|---|---|---|---|---|
AED | TAO-train | 55.2 | 57.0 | 37.8 | 91.0 | |
AED | DanceTrack-train | 66.6 | 69.7 | 54.3 | 92.2 | model |
The codebase is built on top of MOTRv2.
- Install PyTorch using conda (optional): PyTorch>=1.5.1 and torchvision>=0.6.1 are required
conda create -n aed python=3.8
conda activate aed
# pytorch installation please refer to https://pytorch.org/get-started/previous-versions/
# e.g. for cuda 11.3
conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge
- Other Requirements
pip install -r requirements.txt
- Build MultiScaleDeformableAttention
cd <AED_HOME>
cd ./models/ops
sh ./make.sh
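Optionally, you can check that the CUDA op compiled correctly; this assumes the build installs the extension under Deformable DETR's standard module name:
# should import without errors if the build succeeded
python -c "import MultiScaleDeformableAttention"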
It is recommended to symlink the dataset root to <AED_HOME>/data.
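For example (replace the source path with wherever your datasets actually live):
ln -s /path/to/your/datasets <AED_HOME>/data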
- Please download TAO from here.
- Note that you need to fill in this form to request missing AVA and HACS videos in the TAO dataset.
- Convert TAO to COCO format and generate the TAO val & test v1 files following OVTrack, or simply download them from here.
Please download SportsMOT from SportsMOT.
Please download DanceTrack from DanceTrack.
We've run inference with two detectors, RegionCLIP and Co-DETR, and saved their detection results as JSON files.
For YOLOX, we take the detection results from MixSort (for SportsMOT) and MOTRv2 (for DanceTrack).
All of the detection results can be downloaded from here.
Details of the JSON files are listed below; an example of where to place them follows the table.
JSON File | Dataset | Detector |
---|---|---|
TAO_Co-DETR_test.json | TAO (base + novel), test | Co-DETR (LVIS) |
TAO_Co-DETR_train.json | TAO (base + novel), train | Co-DETR (LVIS) |
TAO_Co-DETR_val.json | TAO (base + novel), val | Co-DETR (LVIS) |
TAO_RegionCLIP_test.json | TAO (base + novel), test | RegionCLIP (regionclip_finetuned-lvis_rn50 + rpn_lvis_866_lsj) |
TAO_RegionCLIP_train.json | TAO (base + novel), train | RegionCLIP (regionclip_finetuned-lvis_rn50 + rpn_lvis_866_lsj) |
TAO_RegionCLIP_val.json | TAO (base + novel), val | RegionCLIP (regionclip_finetuned-lvis_rn50 + rpn_lvis_866_lsj) |
YOLOX_DanceTrack_train_val_test.json | DanceTrack, train + val + test | YOLOX from MOTRv2 |
YOLOX_SportsMOT_train_val_test.json | SportsMOT, train + val + test | YOLOX from MixSort |
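The downloaded JSON files should end up under <AED_HOME>/data/detections (see the folder structure below); for example (the source path is just a placeholder):
cd <AED_HOME>
mkdir -p data/detections
mv ~/Downloads/TAO_Co-DETR_val.json data/detections/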
When the downloads are complete, the folder structure should look like this:
├── configs
│   ├── dancetrack.args
│   ├── sportsmot.args
│   └── tao.args
├── data
│   ├── DanceTrack
│   │   ├── dancetrack_url.xlsx
│   │   ├── test
│   │   │   ├── dancetrack0003
│   │   │   └── ...
│   │   ├── train
│   │   │   ├── dancetrack0001
│   │   │   └── ...
│   │   └── val
│   │       ├── dancetrack0004
│   │       └── ...
│   ├── detections
│   │   ├── TAO_Co-DETR_test.json
│   │   └── ...
│   ├── SportsMOT
│   │   ├── dataset
│   │   │   ├── annotations
│   │   │   ├── test
│   │   │   ├── train
│   │   │   └── val
│   │   └── splits_txt
│   │       ├── basketball.txt
│   │       ├── football.txt
│   │       ├── test.txt
│   │       ├── train.txt
│   │       ├── val.txt
│   │       └── volleyball.txt
│   └── TAO
│       ├── annotations
│       │   ├── checksums
│       │   ├── README.md
│       │   ├── tao_test_burst_v1.json
│       │   ├── train_ours_v1.json
│       │   ├── validation_ours_v1.json
│       │   └── ...
│       └── frames
│           ├── test
│           ├── train
│           └── val
└── ...
First, download the COCO pretrained weight (Deformable DETR + iterative bounding box refinement) from here.
Then put the downloaded weight into <AED_HOME>/pretrained.
Please make sure you set the correct absolute paths for --pretrained, --mot_path, --train_det_path, and --val_det_path.
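For example, a quick way to see which paths a config currently sets (the grep is only a convenience; adjust the config file as needed):
cd <AED_HOME>
# list the path-related arguments in the TAO config
grep -E "pretrained|mot_path|det_path" configs/tao.args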
# TAO
cd <AED_HOME>
# e.g. bash ./tools/train_tao.sh configs/tao.args 0
bash tools/train_tao.sh [config path] [GPU index]
# SportsMOT
bash tools/train_sportsmot.sh [config path] [GPU index]
# DanceTrack
bash tools/train_dancetrack.sh [config path] [GPU index]
Multi-GPU is not supported yet.
After training, the results are saved in <AED_HOME>/exps/[dataset name]
Put the downloaded weights into <AED_HOME>/pretrained like:
pretrained
├── dancetrack_ckpt_train.pth
├── r50_deformable_detr_plus_iterative_bbox_refinement-checkpoint.pth
├── sportsmot_ckpt_train.pth
└── tao_ckpt_train_base.pth
Start inference:
cd <AED_HOME>
# TAO
# e.g. bash tools/inference_tao.sh pretrained/tao_ckpt_train_base.pth configs/tao.args test 0
# Remember to choose the right --val_det_path in the config to specify a detector.
bash tools/inference_tao.sh [checkpoint path] [config path] [split (val / test)] [GPU index]
# SportsMOT
bash tools/inference_sportsmot.sh [checkpoint path] [config path] [split (val / test)] [GPU index]
# DanceTrack
bash tools/inference_dancetrack.sh [checkpoint path] [config path] [split (val / test)] [GPU index]
After inference, the results are saved in <AED_HOME>/exps/[dataset name]_infer_results.
For SportsMOT and DanceTrack, you can upload the test set results to the CodaLab evaluation server to get the final score.
cd <AED_HOME>
# e.g. python tools/eval_tao.py --ann_file ./data/TAO/annotations/validation_ours_v1.json --res_path exps/tao_infer_results/infer1/inference_result/infer_result.json
python tools/eval_tao.py --ann_file path_to_annotations --res_path path_to_results
For the val set, you need to use TrackEval for evaluation.
# move to the path of AED
cd <AED_HOME>
# e.g.
# bash tools/eval_sportsMOT.sh \
# ./data/SportsMOT/dataset/val \
# ./data/SportsMOT/splits_txt/val.txt \
# exps/sportsmot_infer_results/infer1/result_txt \
# exps/sportsmot_infer_results/infer1
bash tools/eval_dancetrack.sh [GT path] [split txt path] [result_txt path] [output path]
bash tools/eval_sportsMOT.sh [GT path] [split txt path] [result_txt path] [output path]
The split txt files for DanceTrack can be found here.
Install GroundingDINO following the GroundingDINO repository:
cd <AED_HOME>
cd GroundingDINO
# set the CUDA_HOME, e.g. /usr/local/cuda
export CUDA_HOME=/usr/local/cuda
pip install -e .
cd <AED_HOME>
mkdir pretrained
cd pretrained
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
Run the demo:
cd <AED_HOME>
# e.g. bash tools/run_demo.sh configs/demo.args 0
bash tools/run_demo.sh configs/demo.args [GPU index]
You can set the --text_prompt argument in configs/demo.args following GroundingDINO's prompt format to track other categories.
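As a rough illustration (GroundingDINO expects category names separated by " . "; the exact layout and quoting should match the existing entry in configs/demo.args):
--text_prompt "person . dog ."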
We would like to express our sincere gratitude to the following works (in no particular order): MOTRv2, OVTrack, QDTrack, RegionCLIP, Co-DETR, YOLOX and GroundingDINO.
If you find this work useful, please consider citing our paper:
@article{fang2024associate,
title={Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown},
author={Fang, Zimeng and Liang, Chao and Zhou, Xue and Zhu, Shuyuan and Li, Xi},
journal={arXiv preprint arXiv:2409.09293},
year={2024}
}