Commit

This is the 1st commit
shenyunhang committed Dec 5, 2023
0 parents commit 030851b
Showing 488 changed files with 71,814 additions and 0 deletions.
Binary file added .asset/framework.png
Binary file added .asset/head.png
Binary file added .asset/radar.png
53 changes: 53 additions & 0 deletions .gitignore
@@ -0,0 +1,53 @@
# output dir
output
instant_test_output
inference_test_output


*.png
*.json
*.diff
*.jpg
!/projects/DensePose/doc/images/*.jpg

# compilation and distribution
__pycache__
_ext
*.pyc
*.pyd
*.so
*.dll
*.egg-info/
build/
dist/
wheels/

# pytorch/python/numpy formats
*.pth
*.pkl
*.npy
*.ts
model_ts*.txt

# ipython/jupyter notebooks
*.ipynb
**/.ipynb_checkpoints/

# Editor temporaries
*.swn
*.swo
*.swp
*~

# editor settings
.idea
.vscode
_darcs

# project dirs
/ape/model_zoo/configs
/datasets/*
!/datasets/*.*
/projects/*/datasets
/models
/snippet
270 changes: 270 additions & 0 deletions README.md
@@ -0,0 +1,270 @@
# APE

APE: Aligning and Prompting Everything All at Once for Universal Visual Perception

---


[[`Paper`](https://arxiv.org/abs/2312.02153)] [[`Demo`](https://huggingface.co/spaces/shenyunhang/APE)] [[`BibTex`](#black_nib-citation)]


## :bulb: Highlight

- **High Performance.** SotA (or competitive) performance on **160** datasets with only one model.
- **Perception in the Wild.** Detects and segments **everything** with thousands of vocabulary categories or language descriptions all at once.
- **Flexible.** Supports both foreground objects and background stuff for instance segmentation and semantic segmentation.


## :label: TODO

- [x] Release inference code and demo.
- [x] Release checkpoints.
- [x] Release training code.
- [ ] Add clean docs.


## :hammer_and_wrench: Install

1. Clone the APE repository from GitHub.

```bash
git clone https://github.com/shenyunhang/APE
cd APE
```

2. Install the required dependencies and APE.

```bash
pip3 install -r requirements.txt
python3 -m pip install -e .
```
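
A quick way to confirm the editable install worked is to import the package and print its version (the version string is defined in `ape/__init__.py` in this commit):

```python
# Sanity check: the editable install should make the package importable.
import ape

print(ape.__version__)  # "0.0" in this commit
```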


## :arrow_forward: Demo Locally

**Web UI demo**
```bash
pip3 install gradio
cd APE/demo
python3 app.py
```
If GPUs are available, the demo will detect them and run on one GPU.
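
Under the hood this amounts to standard PyTorch device selection; a minimal sketch only (the actual logic lives in `demo/app.py`):

```python
import torch

# Sketch only: prefer a GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Demo will run on: {device}")
```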


## :books: Data Preparation
Follow the instructions [here](https://github.com/shenyunhang/APE/blob/main/datasets/README.md) to prepare the following datasets:

| | COCO | LVIS | Objects365 | Openimages | VisualGenome | SA-1B | RefCOCO | GQA | PhraseCut | Flickr30k | ODinW | SegInW | Roboflow100 | ADE20k | ADE-full | BDD10k | Cityscapes | PC459 | PC59 | VOC | D3 |
|:-----:|:-------:|:-------:|:----------:|:----------:|:------------:|:-------:|:-------:|:-------:|:---------:|:---------:|:-------:|:-------:|:-----------:|:-------:|:--------:|:-------:|:----------:|:-------:|:-------:|:-------:|:-------:|
| Train | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Test | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |


## :test_tube: Inference

### Infer on 160+ datasets
We provide several scripts to evaluate all models.

Adjust the checkpoint locations and the number of GPUs in the scripts before running them.

```bash
scripts/eval_all_D.sh
scripts/eval_all_C.sh
scripts/eval_all_B.sh
scripts/eval_all_A.sh
```

### Infer on images or videos

APE-D
```bash
python3.9 demo/demo_lazy.py \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py \
--input image1.jpg image2.jpg image3.jpg \
--output /path/to/output/dir \
--confidence-threshold 0.1 \
--text-prompt 'person,car,chess piece of horse head' \
--with-box \
--with-mask \
--with-sseg \
--opts \
train.init_checkpoint=/path/to/APE-D/checkpoint \
model.model_vision.select_box_nums_for_evaluation=500 \
model.model_vision.text_feature_bank_reset=True
```
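
Because `--input` accepts multiple files, a shell glob can be used to run over a whole directory of images; a sketch with placeholder paths:

```bash
python3.9 demo/demo_lazy.py \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py \
--input /path/to/images/*.jpg \
--output /path/to/output/dir \
--confidence-threshold 0.1 \
--text-prompt 'person,car' \
--with-box \
--with-mask \
--with-sseg \
--opts \
train.init_checkpoint=/path/to/APE-D/checkpoint
```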


## :train: Training

### Prepare backbone and language models
```bash
git lfs install
git clone https://huggingface.co/QuanSun/EVA-CLIP models/QuanSun/EVA-CLIP/
git clone https://huggingface.co/BAAI/EVA models/BAAI/EVA/
git clone https://huggingface.co/Yuxin-CV/EVA-02 models/Yuxin-CV/EVA-02/
```

Resize the patch size from 14 to 16:
```bash
python3.9 tools/eva_interpolate_patch_14to16.py --input models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14_plus_s9B.pt --output models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14to16_plus_s9B.pt --image_size 224
python3.9 tools/eva_interpolate_patch_14to16.py --input models/QuanSun/EVA-CLIP/EVA01_CLIP_g_14_plus_psz14_s11B.pt --output models/QuanSun/EVA-CLIP/EVA01_CLIP_g_14_plus_psz14to16_s11B.pt --image_size 224
python3.9 tools/eva_interpolate_patch_14to16.py --input models/QuanSun/EVA-CLIP/EVA02_CLIP_L_336_psz14_s6B.pt --output models/QuanSun/EVA-CLIP/EVA02_CLIP_L_336_psz14to16_s6B.pt --image_size 336
```
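
The interpolation script rescales the checkpoint so weights trained with 14x14 patches can be loaded into a 16x16-patch model (typically the patch-embedding kernel and, where the token grid changes, the position embeddings). Below is a minimal sketch of the core idea, not the actual `tools/eva_interpolate_patch_14to16.py`; the checkpoint key name is an assumption for illustration only:

```python
import torch
import torch.nn.functional as F


def resize_patch_embed(weight: torch.Tensor, new_size: int = 16) -> torch.Tensor:
    """Bicubically resize a ViT patch-embedding kernel, e.g. (dim, 3, 14, 14) -> (dim, 3, 16, 16)."""
    return F.interpolate(weight.float(), size=(new_size, new_size), mode="bicubic", align_corners=False)


# Hypothetical usage; the real key names depend on the EVA-CLIP checkpoint layout.
ckpt = torch.load("models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14_plus_s9B.pt", map_location="cpu")
key = "visual.patch_embed.proj.weight"  # assumed key, for illustration only
ckpt[key] = resize_patch_embed(ckpt[key])
```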

### Train APE-D

Single node:
```bash
python3.9 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_`date +'%Y%m%d_%H%M%S'`
```

Multiple nodes:
```bash
python3.9 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_`date +'%Y%m%d_%H'`0000
```

### Train APE-C

Single node:
```bash
python3.9 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H%M%S'`
```

Multiple nodes:
```bash
python3.9 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H'`0000
```

### Train APE-B

Single node:
```bash
python3.9 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H%M%S'`
```

Multiple nodes:
```bash
python3.9 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H'`0000
```

### Train APE-A

Single node:
```bash
python3.9 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k_`date +'%Y%m%d_%H%M%S'`
```

Multiple nodes:
```bash
python3.9 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k_`date +'%Y%m%d_%H'`0000
```



## :luggage: Checkpoints

```bash
git lfs install
git clone https://huggingface.co/shenyunhang/APE
```
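
Each checkpoint is a PyTorch `.pth` file; a quick sketch for inspecting one before use (the path is a placeholder, and the `"model"` nesting is the usual detectron2 convention rather than something guaranteed by this repo):

```python
import torch

# Placeholder path; point this at the model_final.pth you want to inspect.
ckpt = torch.load("/path/to/APE-D/model_final.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # detectron2 checkpoints usually nest weights under "model"
print(f"{len(state)} entries, e.g. {next(iter(state))}")
```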

<!-- insert a table -->
<table>
<thead>
<tr style="text-align: right;">
<th></th>
<th>Name</th>
<th>Checkpoint</th>
<th>Config</th>
</tr>
</thead>
<tbody>
<tr>
<th>1</th>
<td>APE-A</td>
<td><a href="https://huggingface.co/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj_cp_720k_20230504_002019/model_final.pth">HF link</a></td>
<td><a href="https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k.py">link</a></td>
</tr>
<tr>
<th>2</th>
<td>APE-B</td>
<td><a href="https://huggingface.co/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj_cp_1080k_20230702_225418/model_final.pth">HF link</a></td>
<td><a href="https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py">link</a></td>
</tr>
<tr>
<th>3</th>
<td>APE-C</td>
<td><a href="https://huggingface.co/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj_cp_1080k_20230702_210950/model_final.pth">HF link</a></td>
<td><a href="https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py">link</a></td>
</tr>
<tr>
<th>4</th>
<td>APE-D</td>
<td><a href="https://huggingface.co/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_20230829_162438/model_final.pth">HF link</a></td>
<td><a href="https://github.com/shenyunhang/APE/blob/main/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py">link</a></td>
</tr>
</tbody>
</table>


## :medal_military: Results

<img src=".asset/radar.png" alt="radar" width="100%">


## :black_nib: Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.

```bibtex
@article{shen2023aligning,
title={Aligning and Prompting Everything All at Once for Universal Visual Perception},
author={Yunhang Shen and Chaoyou Fu and Peixian Chen and Mengdan Zhang and Ke Li and Xing Sun and Yunsheng Wu and Shaohui Lin and Rongrong Ji},
journal={arXiv preprint arXiv:2312.02153},
year={2023}
}
```
5 changes: 5 additions & 0 deletions ape/__init__.py
@@ -0,0 +1,5 @@
from .data import *

# This line will be programmatically read/written by setup.py.
# Leave it at the bottom of this file and don't touch it.
__version__ = "0.0"
6 changes: 6 additions & 0 deletions ape/checkpoint/__init__.py
@@ -0,0 +1,6 @@
# -*- coding: utf-8 -*-


from .detection_checkpoint import DetectionCheckpointer

__all__ = ["DetectionCheckpointer"]
45 changes: 45 additions & 0 deletions ape/checkpoint/detection_checkpoint.py
@@ -0,0 +1,45 @@
# Copyright (c) Facebook, Inc. and its affiliates.
import logging
import os
import pickle
from collections import defaultdict
from typing import IO, Any, Dict, Iterable, List, NamedTuple, Optional, Tuple, cast

import numpy as np
import torch

from detectron2.checkpoint import DetectionCheckpointer as DetectionCheckpointer_d2


class DetectionCheckpointer(DetectionCheckpointer_d2):

    # def __init__(self, skip_key="", **kwargs):
    #     super().__init__(**kwargs)
    #     self.skip_key = skip_key

    def _convert_ndarray_to_tensor(self, state_dict: Dict[str, Any]) -> None:
        """
        In-place convert all numpy arrays in the state_dict to torch tensor.
        Args:
            state_dict (dict): a state-dict to be loaded to the model.
                Will be modified.
        """
        logger = logging.getLogger(__name__)
        # model could be an OrderedDict with _metadata attribute
        # (as returned by Pytorch's state_dict()). We should preserve these
        # properties.
        for k in list(state_dict.keys()):

            # if self.skip_key in k:
            # if "model_language" in k:
            #     state_dict.pop(k)
            #     continue

            v = state_dict[k]
            if not isinstance(v, np.ndarray) and not isinstance(v, torch.Tensor):
                logger.warning("Unsupported type found in checkpoint! {}: {}".format(k, type(v)))
                state_dict.pop(k)
                continue
                raise ValueError("Unsupported type found in checkpoint! {}: {}".format(k, type(v)))
            if not isinstance(v, torch.Tensor):
                state_dict[k] = torch.from_numpy(v)
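
Compared with the detectron2 base class, this subclass skips (with a warning) any checkpoint entry that is neither a numpy array nor a torch tensor instead of raising. A hedged usage sketch, assuming a detectron2-compatible model has already been built (the model construction and checkpoint path below are illustrative, not APE's actual entry point):

```python
# Illustrative only: `model` is an already-constructed torch.nn.Module,
# e.g. built from one of the config files in this repository.
from ape.checkpoint import DetectionCheckpointer

checkpointer = DetectionCheckpointer(model, save_dir="output")
checkpointer.resume_or_load("/path/to/APE-D/checkpoint", resume=False)
```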