Semantic-Aware Fine-Grained Correspondence

This repository is the official PyTorch implementation of SFC, introduced in the paper:

Semantic-Aware Fine-Grained Correspondence. ECCV 2022 (Oral)
Yingdong Hu, Renhao Wang, Kaifeng Zhang, and Yang Gao

Installation

Dependency Setup

Python 3.8
PyTorch 1.7.1
Other dependencies

Create a new conda environment.

conda create -n sfc python=3.8 -y
conda activate sfc

Install PyTorch 1.7.1 and torchvision 0.8.2 following the official instructions. For example:

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch

Clone this repo and install required packages:

git clone https://github.com/Alxead/SFC.git
pip install opencv-python matplotlib scikit-image imageio pandas tqdm wandb
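After installation, a quick version check can catch a mismatched environment early. The sketch below is not part of the repository: check_versions is a hypothetical helper that compares installed package versions against the pins listed above using only the standard library. It assumes the packages expose standard distribution metadata, which may not hold for every install method.

```python
from importlib import metadata

# Versions pinned in the installation steps above.
EXPECTED = {"torch": "1.7.1", "torchvision": "0.8.2"}


def check_versions(expected):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, want in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        # Builds may carry a local suffix such as "1.7.1+cu110"; compare the public part only.
        if found is None or found.split("+")[0] != want:
            problems[name] = (want, found)
    return problems


if __name__ == "__main__":
    for name, (want, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {want}, found {found}")
```

An empty result means both pinned packages were found at the expected versions.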

Dataset Preparation

We use YouTube-VOS to pre-train the fine-grained correspondence network.

Download the raw image frames (train_all_frames.zip). Move ytvos.csv from code/data/ to the root directory of the YouTube-VOS dataset.

The overall file structure should look like:

youtube-vos
├── train_all_frames
│   └── JPEGImages
└── ytvos.csv
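Before launching a long training run, it may help to confirm that the dataset directory matches the layout above. The following is a minimal sketch, not a script shipped with this repository; missing_entries is a hypothetical helper name, and it only checks the two entries shown in the tree.

```python
from pathlib import Path


def missing_entries(root):
    """Return the expected paths (relative to root) that are absent under the dataset root."""
    root = Path(root)
    expected = [
        root / "train_all_frames" / "JPEGImages",  # extracted raw frames
        root / "ytvos.csv",                        # frame list moved from code/data/
    ]
    return [str(p.relative_to(root)) for p in expected if not p.exists()]
```

For example, missing_entries("/path/to/youtube-vos") returns an empty list when both entries are in place.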

Pre-training Fine-grained Correspondence Network

To pre-train with a single 24GB NVIDIA 3090 GPU, run:

python train.py \
--data-path /path/to/youtube-vos \
--output-dir ../checkpoints \
--enable-wandb True

Training time is about 25 hours.

Pre-trained Model

Our fine-grained correspondence network and other baseline models can be downloaded as follows:

Pre-training Method	Architecture	Link
Fine-grained Correspondence	ResNet-18	download
CRW	ResNet-18	download
MoCo-V1	ResNet-50	download
SimSiam	ResNet-50	download
PixPro	ResNet-50	download
ImageNet classification	ResNet-50	torchvision

After downloading a pre-trained model, place it under the SFC/checkpoints/ folder. Please do not rename the checkpoint files.

Evaluation: Label Propagation

The label propagation algorithm is based on the implementation of Contrastive Random Walk (CRW). The output of test_vos.py (predicted label maps) must be post-processed for evaluation.
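For readers new to label propagation, the general idea can be illustrated with a toy example: each query pixel receives a label that is a weighted average of the labels of its most similar context pixels. The sketch below is a pure-Python illustration for a single query, not the code in test_vos.py; propagate_label is a hypothetical name, and the real implementation operates on dense feature maps and, judging by the --radius flag, presumably also restricts candidates spatially.

```python
import math


def propagate_label(query, context_feats, context_labels, topk=2, temperature=0.1):
    """Predict a soft label for one query feature from labeled context features.

    query:          feature vector of the query pixel (list of floats)
    context_feats:  feature vectors of the labeled context pixels
    context_labels: one-hot (or soft) label vectors, one per context pixel
    """
    # Affinity = dot product between the query and each context feature.
    affinities = [sum(q * c for q, c in zip(query, feat)) for feat in context_feats]

    # Keep only the top-k most similar context pixels.
    top = sorted(range(len(affinities)), key=lambda i: affinities[i], reverse=True)[:topk]

    # Temperature-scaled softmax over the retained affinities (max-shifted for stability).
    peak = max(affinities[i] for i in top)
    weights = [math.exp((affinities[i] - peak) / temperature) for i in top]
    total = sum(weights)

    # Weighted average of the neighbours' labels.
    num_classes = len(context_labels[0])
    return [
        sum(w * context_labels[i][k] for w, i in zip(weights, top)) / total
        for k in range(num_classes)
    ]
```

In this toy version, a smaller temperature concentrates the weight on the single best match, while a larger topk lets more context pixels vote.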

DAVIS

To evaluate a model on the DAVIS task, clone the davis2017-evaluation repository:

git clone https://github.com/davisvideochallenge/davis2017-evaluation $HOME/davis2017-evaluation

Download the DAVIS2017 dataset from the official website. Modify the paths in code/eval/davis_vallist.txt so that they point to your local copy of the dataset.
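Editing the paths by hand is fine, but a small script can do it in one pass. The sketch below rests on an assumption that should be verified by opening the file first: that each line of davis_vallist.txt holds whitespace-separated paths sharing a common prefix. rewrite_prefix is a hypothetical helper, and the prefixes in the usage note are placeholders.

```python
def rewrite_prefix(text, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix at the start of every whitespace-separated path.

    Assumes one record per line; fields that do not start with old_prefix are kept as-is.
    The trailing newline, if any, is not preserved.
    """
    out_lines = []
    for line in text.splitlines():
        fields = [
            new_prefix + field[len(old_prefix):] if field.startswith(old_prefix) else field
            for field in line.split()
        ]
        out_lines.append(" ".join(fields))
    return "\n".join(out_lines)
```

A typical use would read the file, call rewrite_prefix(text, "/old/prefix", "/path/to/davis"), and write the result back after checking it.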

Inference and Evaluation

To evaluate SFC (after downloading the pre-trained model and placing it under SFC/checkpoints), run:

Step 1: Video object segmentation

python test_vos.py --filelist ./eval/davis_vallist.txt \
--fc-model fine-grained --semantic-model mocov1 \
--topk 15 --videoLen 20 --radius 15 --temperature 0.1 --cropSize -1 --lambd 1.75 \
--save-path /save/path

Step 2: Post-process

python eval/convert_davis.py --in_folder /save/path/ --out_folder /converted/path --dataset /path/to/davis/

Step 3: Compute metrics

python $HOME/davis2017-evaluation/evaluation_method.py \
--task semi-supervised --set val \
--davis_path /path/to/davis/ --results_path /converted/path

This should give:

 J&F-Mean   J-Mean  J-Recall  J-Decay   F-Mean  F-Recall  F-Decay
 0.713385 0.684833  0.812559 0.171174 0.741938  0.851699 0.234408
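As a sanity check when reading the table, J&F-Mean is the arithmetic mean of J-Mean (region similarity) and F-Mean (boundary accuracy). The short snippet below reproduces that arithmetic from the two values printed above; small differences in the last digit come from rounding in the printed table.

```python
# Values copied from the table above.
j_mean = 0.684833
f_mean = 0.741938

# J&F-Mean is the average of the region (J) and boundary (F) scores.
jf_mean = (j_mean + f_mean) / 2
print(f"J&F-Mean = {jf_mean:.4f}")
```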

The reproduced performance in this repo is slightly higher than reported in the paper.
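The three steps above can also be chained from a single script. The sketch below only assembles the commands, using exactly the flags shown in Steps 1 to 3; build_commands is a hypothetical helper that is not part of this repository, and it assumes the script is run from the directory that contains test_vos.py.

```python
import os
import shlex


def build_commands(save_path, converted_path, davis_path, eval_repo):
    """Return the three evaluation steps as argument lists, in execution order."""
    segment = [
        "python", "test_vos.py", "--filelist", "./eval/davis_vallist.txt",
        "--fc-model", "fine-grained", "--semantic-model", "mocov1",
        "--topk", "15", "--videoLen", "20", "--radius", "15",
        "--temperature", "0.1", "--cropSize", "-1", "--lambd", "1.75",
        "--save-path", save_path,
    ]
    convert = [
        "python", "eval/convert_davis.py", "--in_folder", save_path,
        "--out_folder", converted_path, "--dataset", davis_path,
    ]
    metrics = [
        "python", os.path.join(eval_repo, "evaluation_method.py"),
        "--task", "semi-supervised", "--set", "val",
        "--davis_path", davis_path, "--results_path", converted_path,
    ]
    return [segment, convert, metrics]


if __name__ == "__main__":
    steps = build_commands(
        "/save/path", "/converted/path", "/path/to/davis",
        os.path.expanduser("~/davis2017-evaluation"),
    )
    for step in steps:
        # Print each command; pass the list to subprocess.run(step, check=True) to execute it.
        print(shlex.join(step))
```

Keeping the commands as argument lists avoids shell-quoting problems when paths contain spaces.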

The commands below evaluate some of the baseline models.

Fine-grained Correspondence Network (FC)

For step 1, run:

python test_vos.py --filelist ./eval/davis_vallist.txt \
--fc-model fine-grained \
--topk 10 --videoLen 20 --radius 12 --temperature 0.05 --cropSize -1 \
--save-path /save/path