add fid code (cure-lab#65)
* add fid code

* add link

* update readme

* Update README.MD

* Update README.MD

* update readme
flymin authored Jul 26, 2024
1 parent 84cb4e3 commit 6038a24
Showing 8 changed files with 838 additions and 20 deletions.
README.MD (75 changes: 55 additions & 20 deletions)
# MagicDrive

✨ If you want **video generation**, please find the code at the [`video branch`](https://github.com/cure-lab/MagicDrive/tree/video).

✨ Check out our new work [MagicDrive3D](https://github.com/flymin/MagicDrive3D) on **3D scene generation**!


[![arXiv](https://img.shields.io/badge/ArXiv-2310.02601-b31b1b.svg?style=plastic)](https://arxiv.org/abs/2310.02601) [![web](https://img.shields.io/badge/Web-MagicDrive-blue.svg?style=plastic)](https://gaoruiyuan.com/magicdrive/) [![license](https://img.shields.io/github/license/cure-lab/MagicDrive?style=plastic)](https://github.com/cure-lab/MagicDrive/blob/main/LICENSE) [![star](https://img.shields.io/github/stars/cure-lab/MagicDrive)](https://github.com/cure-lab/MagicDrive)

Videos generated by MagicDrive (click the image to see the video).

[![2_7_gen](./assets/2_7_gen_frame0.png)](https://gaoruiyuan.com/magicdrive/static/videos/2_7_gen.mp4)

[![3_7_gen](./assets/3_7_gen_frame0.png)](https://gaoruiyuan.com/magicdrive/static/videos/3_7_gen.mp4)

This repository contains the implementation of the paper

> MagicDrive: Street View Generation with Diverse 3D Geometry Control <br>
## Abstract

<details>
<summary><b>TL;DR</b> MagicDrive generates high-quality street-view images & videos with diverse 3D geometry control and multiview consistency, which can be used as a data engine for various perception tasks.</summary>

Recent advancements in diffusion models have significantly enhanced data synthesis with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception tasks, remains elusive. Specifically, utilizing Bird's-Eye View (BEV) as the primary condition often leads to challenges in geometry control (e.g., height), affecting the representation of object shapes, occlusion patterns, and road surface elevations, all of which are essential to perception data synthesis, especially for 3D object detection tasks. In this paper, we introduce MagicDrive, a novel street view generation framework, offering diverse 3D geometry controls including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. Besides, our design incorporates a cross-view attention module, ensuring consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view image & video synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.

</details>

## News

- [2024/06/07] MagicDrive can generate **60-frame** videos! We release the config: [rawbox_mv2.0t_0.4.3_60.yaml](https://github.com/cure-lab/MagicDrive/blob/video/configs/exp/rawbox_mv2.0t_0.4.3_60.yaml). Check out our demos on the [project page](https://gaoruiyuan.com/magicdrive/#long-video).
- [2024/06/07] We release **pre-trained weights** for **16-frame** video generation. [Check it out](https://github.com/cure-lab/MagicDrive/tree/video?tab=readme-ov-file#pretrained-magicdrive-t)!
- [2024/06/01] We are organizing the [W-CODA](https://coda-dataset.github.io/w-coda2024/index.html) workshop at ECCV 2024. Challenge [track 2](https://coda-dataset.github.io/w-coda2024/track2/) will use MagicDrive as the baseline. We will release more resources in the near future. Stay tuned!

## Method

In MagicDrive, we employ two strategies (cross-attention and additive encoder branch) to inject text prompts, camera poses, object boxes, and road maps as conditions for generation. We also propose a cross-view attention module for multiview consistency.

![image-20231011165634648](./assets/overview.png)
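
To make the cross-view module concrete, here is a minimal sketch of neighbor-view attention; the module name, shapes, and the choice of left/right neighbors are illustrative assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn as nn


class CrossViewAttention(nn.Module):
    """Sketch: tokens in each camera view attend to the two adjacent views."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_views, n_tokens, dim), e.g. six surrounding cameras
        v = x.shape[1]
        out = torch.empty_like(x)
        for i in range(v):
            # keys/values come from the left and right neighboring cameras
            neighbors = torch.cat([x[:, (i - 1) % v], x[:, (i + 1) % v]], dim=1)
            attended, _ = self.attn(x[:, i], neighbors, neighbors)
            out[:, i] = self.norm(x[:, i] + attended)  # residual + norm
        return out


# toy check: batch of 2, six views, 8 tokens, 64-dim features
feats = torch.randn(2, 6, 8, 64)
print(CrossViewAttention(dim=64)(feats).shape)  # torch.Size([2, 6, 8, 64])
```

Restricting attention to adjacent cameras keeps the cost linear in the number of views while still covering the overlapping fields of view, where consistency matters most.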

- [x] demo for base resolution (224x400)
- [x] GUI for interactive bbox editing
- [x] train and test code release
- [x] FID test code
- [ ] config and pretrained weight for high resolution
- [ ] train and test code for CVT and BEVFusion

## Getting Started

Launch training with:
```bash
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 8 tools/train.py
```
During training, you can check TensorBoard for logs and intermediate results.

Besides, we provide a debug config to test your environment and data loading process (with 2xV100):
```bash
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 2 tools/train.py \
+exp=224x400 runner=debug runner.validation_before_run=true
After training, you can test your model for driving view generation through:
```bash
python tools/test.py resume_from_checkpoint=${YOUR MODEL}
# take our pretrained model as an example
python tools/test.py resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400
```
Please find the results in `./magicdrive-log/test/`.

**To test FID**

First, you should generate the full validation set with
```bash
python perception/data_prepare/val_set_gen.py \
resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
task_id=224x400 fid.img_gen_dir=./tmp/224x400 +fid=data_gen +exp=224x400
# for map=zero as the null condition for CFG, add `runner.pipeline_param.use_zero_map_as_unconditional=true`
```
For this script, **multi-process / multi-node** execution is also available through `accelerate`; launch it with a command similar to the one used for training.
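
For instance, a hypothetical 8-GPU launch, reusing the overrides from the single-process command above:
```bash
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 8 \
    perception/data_prepare/val_set_gen.py \
    resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
    task_id=224x400 fid.img_gen_dir=./tmp/224x400 +fid=data_gen +exp=224x400
```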

Then, test the FID score with
```bash
# We assume your torch cache dir is "../pretrained/torch_cache/". To use the
# default location, comment out the second-to-last line of "tools/fid_score.py".
python tools/fid_score.py cfg \
resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
fid.rootb=tmp/224x400
```

Alternatively, we provide pre-generated samples for the validation set [here](https://mycuhk-my.sharepoint.com/:f:/g/personal/1155157018_link_cuhk_edu_hk/EjWsTYfC01BAl0F2NLP_bX4BqHjY-oV1VaTx4RgMzbiXWQ?e=fPfEy3).
Put them in `./tmp` and launch the test with
```bash
python tools/fid_score.py cfg \
resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
fid.rootb=tmp/224x400/samples # FID=14.46065995481922
# or `fid.rootb=tmp/224x400map0/samples`, FID=16.195992872931697
```

## Quantitative Results
<details>
<summary>Compare MagicDrive with other methods for generation quality:</summary>

![main_results](./assets/main_results.png)

</details>

<details>
<summary>Training support with images generated from MagicDrive:</summary>

![trainability](./assets/trainability.png)

</details>

More results can be found in the main paper.

## Qualitative Results

![editings](./assets/editings.png)


## Cite Us

```bibtex
configs/fid/data_gen.yaml (17 changes: 17 additions & 0 deletions)
```yaml
# @package _global_
defaults:
  - _self_  # make sure override

fid:
  resize: ${dataset.back_resize}  # (h, w)
  padding: ${dataset.back_pad}  # left, top, right and bottom
  raw_output: false
  # path relative to `${ROOT}`
  img_gen_dir: ???

runner:
  validation_index: all
  validation_times: 1

log_root_prefix: ../magicdrive-log/fid_gen
show_box: false
```
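
How `resize` and `padding` are consumed is not shown here; one plausible reading, sketched with a hypothetical helper, is that generated images are resized to `back_resize` and padded by `back_pad` to recover the raw camera layout before FID:

```python
# assumption: map a generated image back to the raw camera layout;
# `restore_raw_layout` is a hypothetical name, not a repository function
from PIL import Image, ImageOps

def restore_raw_layout(img: Image.Image, back_resize, back_pad) -> Image.Image:
    h, w = back_resize
    img = img.resize((w, h), Image.BICUBIC)  # PIL expects (width, height)
    # back_pad is (left, top, right, bottom)
    return ImageOps.expand(img, border=tuple(back_pad), fill=0)
```
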
configs/fid/default.yaml (8 changes: 8 additions & 0 deletions)
```yaml
device: null
num_workers: null
save_stats: false
batch_size: 512
dims: 2048
ratio: -1
roota: ./data/nuscenes/samples
rootb: ???
```
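
These keys mirror the parameters of `pytorch-fid`'s `calculate_fid_given_paths` (two image roots, batch size, Inception feature dimensionality, device, workers). Assuming `tools/fid_score.py` wraps that API, a standalone equivalent would look like:

```python
# sketch assuming tools/fid_score.py wraps pytorch-fid (pip install pytorch-fid)
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

device = "cuda" if torch.cuda.is_available() else "cpu"  # device: null -> auto
fid = calculate_fid_given_paths(
    ["./data/nuscenes/samples", "./tmp/224x400/samples"],  # roota, rootb
    batch_size=512,  # batch_size
    device=device,
    dims=2048,       # dims: Inception-v3 pool3 features
)
print(f"FID: {fid:.4f}")
```

`ratio: -1` appears to disable per-scene subsampling (see `sample_token_from_scene` below, where `-1` returns `None`).
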
configs/test_fid.yaml (6 changes: 6 additions & 0 deletions)
```yaml
defaults:
  - config
  - fid: default
  - _self_

log_root_prefix: ../magicdrive-log/fid
```
perception/common/ddp_utils.py (16 changes: 16 additions & 0 deletions)
```python
import torch
from accelerate import Accelerator


def concat_from_everyone(accelerator: Accelerator, tmp):
    """Gather a per-process list onto the main process.

    Returns the concatenated list on the main process, the input list
    unchanged when not distributed, and None on non-main processes.
    """
    if not accelerator.use_distributed:
        return tmp
    # collect each process's list of picklable objects
    output = [None for _ in range(accelerator.num_processes)]
    torch.distributed.all_gather_object(output, tmp)
    if accelerator.is_main_process:
        res = []
        for tmpi in output:
            res += tmpi
        return res
    else:
        return None
```
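
A usage sketch with illustrative data; under `accelerate launch`, each rank contributes its local list and only the main process receives the merged result:

```python
# hypothetical usage; run via: accelerate launch --num_processes 2 demo.py
from accelerate import Accelerator
from perception.common.ddp_utils import concat_from_everyone

accelerator = Accelerator()
# each rank produces a partial result, e.g. the sample tokens it processed
local_results = [f"rank{accelerator.process_index}-item{i}" for i in range(3)]
merged = concat_from_everyone(accelerator, local_results)
if accelerator.is_main_process:
    print(len(merged))  # num_processes * 3
```
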
perception/common/nuscenes_utils.py (57 changes: 57 additions & 0 deletions)
```python
import random
from collections import OrderedDict

from nuscenes.nuscenes import NuScenes


def sample_token_from_scene(ratio_or_num, nusc=None, drop_desc=None):
    """Sample keyframe tokens from each scene.

    If ratio_or_num >= 1, it is treated as a per-scene frame count;
    if 0 < ratio_or_num < 1, as a per-scene ratio;
    if ratio_or_num == 0, only the first frame of each scene is picked;
    if ratio_or_num == -1 (and drop_desc is None), returns (None, None).

    Args:
        ratio_or_num (float): sample ratio or count for each scene.
        nusc (NuScenes, optional): dataset handle; loaded from
            ./data/nuscenes when omitted.
        drop_desc (str, optional): pick nothing from scenes whose
            description contains this (lowercase) substring.

    Returns:
        sample_flag_dict (dict): Dict[token, bool]
        scene_sample_flag_dict (dict): Dict[scene_name, Dict[token, bool]]
    """
    if ratio_or_num == -1 and drop_desc is None:
        return None, None
    if nusc is None:
        nusc = NuScenes(version='v1.0-trainval',
                        dataroot='./data/nuscenes', verbose=True)
    sample_flag_dict = {}
    scene_sample_flag_dict = {}
    for scene in nusc.scene:
        scene_name = scene['name']
        frames_len = scene['nbr_samples']
        sample_token = scene['first_sample_token']
        # iteratively gather sample tokens from one scene
        token_in_this_scene = OrderedDict()
        for fi in range(frames_len):
            token_in_this_scene[sample_token] = False
            sample = nusc.get('sample', sample_token)
            sample_token = sample['next']
        desc = scene['description']
        if drop_desc is not None and drop_desc in desc.lower():
            picked = []  # we pick nothing
        else:
            # pick tokens according to the ratio/count
            if ratio_or_num == 0:
                # if 0, only pick the first one
                picked = list(token_in_this_scene.keys())[0:1]
            else:
                if ratio_or_num >= 1:
                    pick_num = int(ratio_or_num)
                else:
                    pick_num = int(frames_len * ratio_or_num)
                # random.sample needs a sequence, not a dict keys view
                picked = random.sample(
                    list(token_in_this_scene.keys()), pick_num)
        for pick in picked:
            token_in_this_scene[pick] = True
        # now save data for output
        token_in_this_scene = dict(token_in_this_scene)
        scene_sample_flag_dict[scene_name] = token_in_this_scene
        sample_flag_dict.update(token_in_this_scene)
    return sample_flag_dict, scene_sample_flag_dict
```
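
A usage sketch; the ratio and the description filter are illustrative, and it assumes nuScenes v1.0-trainval is available under `./data/nuscenes`:

```python
# hypothetical usage: flag ~half of the keyframes per scene, skipping scenes
# whose description mentions "night"
from perception.common.nuscenes_utils import sample_token_from_scene

sample_flag_dict, per_scene = sample_token_from_scene(0.5, drop_desc="night")
picked = [tok for tok, flag in sample_flag_dict.items() if flag]
print(f"picked {len(picked)} of {len(sample_flag_dict)} keyframes")
```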
