add fid code (cure-lab#65)
* add fid code

* add link

* update readme

* Update README.MD

* Update README.MD

* update readme
flymin authored Jul 26, 2024
1 parent 84cb4e3 commit 6038a24
Showing 8 changed files with 838 additions and 20 deletions.
README.MD (75 changes: 55 additions & 20 deletions)
# MagicDrive

✨ If you want **video generation**, please find the code at the [`video branch`](https://github.com/cure-lab/MagicDrive/tree/video).

✨ Check out our new work [MagicDrive3D](https://github.com/flymin/MagicDrive3D) on **3D scene generation**!


[![arXiv](https://img.shields.io/badge/ArXiv-2310.02601-b31b1b.svg?style=plastic)](https://arxiv.org/abs/2310.02601) [![web](https://img.shields.io/badge/Web-MagicDrive-blue.svg?style=plastic)](https://gaoruiyuan.com/magicdrive/) [![license](https://img.shields.io/github/license/cure-lab/MagicDrive?style=plastic)](https://github.com/cure-lab/MagicDrive/blob/main/LICENSE) [![star](https://img.shields.io/github/stars/cure-lab/MagicDrive)](https://github.com/cure-lab/MagicDrive)

Videos generated by MagicDrive (click the image to see the video).

[![2_7_gen](./assets/2_7_gen_frame0.png)](https://gaoruiyuan.com/magicdrive/static/videos/2_7_gen.mp4)

[![3_7_gen](./assets/3_7_gen_frame0.png)](https://gaoruiyuan.com/magicdrive/static/videos/3_7_gen.mp4)

This repository contains the implementation of the paper

> MagicDrive: Street View Generation with Diverse 3D Geometry Control <br>
## Abstract

<details>
<summary><b>TL;DR</b> MagicDrive generates high-quality street-view images & videos with diverse 3D geometry control and multiview consistency, which can be used as a data engine for various perception tasks.</summary>

Recent advancements in diffusion models have significantly enhanced data synthesis with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception tasks, remains elusive. Specifically, utilizing Bird's-Eye View (BEV) as the primary condition often leads to challenges in geometry control (e.g., height), affecting the representation of object shapes, occlusion patterns, and road surface elevations, all of which are essential to perception data synthesis, especially for 3D object detection tasks. In this paper, we introduce MagicDrive, a novel street view generation framework, offering diverse 3D geometry controls including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. Besides, our design incorporates a cross-view attention module, ensuring consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view image & video synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.

</details>

## News

- [2024/06/07] MagicDrive can generate **60-frame** videos! We release the config: [rawbox_mv2.0t_0.4.3_60.yaml](https://github.com/cure-lab/MagicDrive/blob/video/configs/exp/rawbox_mv2.0t_0.4.3_60.yaml). Check out our demos on the [project page](https://gaoruiyuan.com/magicdrive/#long-video).
- [2024/06/07] We release **pre-trained weights** for **16-frame** video generation. [Check it out](https://github.com/cure-lab/MagicDrive/tree/video?tab=readme-ov-file#pretrained-magicdrive-t)!
- [2024/06/01] We are organizing the [W-CODA](https://coda-dataset.github.io/w-coda2024/index.html) workshop at ECCV 2024. Challenge [track 2](https://coda-dataset.github.io/w-coda2024/track2/) will use MagicDrive as the baseline. We will release more resources in the near future. Stay tuned!

## Method

In MagicDrive, we employ two strategies (cross-attention and additive encoder branch) to inject text prompts, camera poses, object boxes, and road maps as conditions for generation. We also propose a cross-view attention module for multiview consistency.

![image-20231011165634648](./assets/overview.png)
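
To make the cross-view module concrete, here is a minimal sketch of neighbor-view attention; the module name, shapes, and the choice of left/right neighbors are illustrative assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn as nn


class CrossViewAttention(nn.Module):
    """Sketch: tokens in each camera view attend to the two adjacent views."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_views, n_tokens, dim), e.g. six surrounding cameras
        v = x.shape[1]
        out = torch.empty_like(x)
        for i in range(v):
            # keys/values come from the left and right neighboring cameras
            neighbors = torch.cat([x[:, (i - 1) % v], x[:, (i + 1) % v]], dim=1)
            attended, _ = self.attn(x[:, i], neighbors, neighbors)
            out[:, i] = self.norm(x[:, i] + attended)  # residual + norm
        return out


# toy check: batch of 2, six views, 8 tokens, 64-dim features
feats = torch.randn(2, 6, 8, 64)
print(CrossViewAttention(dim=64)(feats).shape)  # torch.Size([2, 6, 8, 64])
```

Restricting attention to adjacent cameras keeps the cost linear in the number of views while still covering the overlapping fields of view, where consistency matters most.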

- [x] demo for base resolution (224x400)
- [x] GUI for interactive bbox editing
- [x] train and test code release
- [x] FID test code
- [ ] config and pretrained weight for high resolution
- [ ] train and test code for CVT and BEVFusion

## Getting Started

Launch training with:
```bash
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 8 tools/train.py
```
During training, you can check TensorBoard for logs and intermediate results.

Besides, we provide a debug config to test your environment and data loading process (with 2xV100):
```bash
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 2 tools/train.py \
+exp=224x400 runner=debug runner.validation_before_run=true
After training, you can test your model for driving view generation through:
```bash
python tools/test.py resume_from_checkpoint=${YOUR MODEL}
# take our pretrained model as an example
python tools/test.py resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400
```
Please find the results in `./magicdrive-log/test/`.

**To test FID**

First, you should generate the full validation set with
```bash
python perception/data_prepare/val_set_gen.py \
resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
task_id=224x400 fid.img_gen_dir=./tmp/224x400 +fid=data_gen +exp=224x400
# for map=zero as the null condition for CFG, add `runner.pipeline_param.use_zero_map_as_unconditional=true`
```
For this script, **multi-process / multi-node** execution is also available through `accelerate`; launch it with a command similar to the one used for training.
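
For instance, a hypothetical 8-GPU launch, reusing the overrides from the single-process command above:
```bash
accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 8 \
    perception/data_prepare/val_set_gen.py \
    resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
    task_id=224x400 fid.img_gen_dir=./tmp/224x400 +fid=data_gen +exp=224x400
```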

Then, test the FID score with
```bash
# We assume your torch cache dir is "../pretrained/torch_cache/". To use the
# default location, comment out the second-to-last line of "tools/fid_score.py".
python tools/fid_score.py cfg \
resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
fid.rootb=tmp/224x400
```

Alternatively, we provide pre-generated samples for the validation set [here](https://mycuhk-my.sharepoint.com/:f:/g/personal/1155157018_link_cuhk_edu_hk/EjWsTYfC01BAl0F2NLP_bX4BqHjY-oV1VaTx4RgMzbiXWQ?e=fPfEy3).
Put them in `./tmp` and launch the test with
```bash
python tools/fid_score.py cfg \
resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400 \
fid.rootb=tmp/224x400/samples # FID=14.46065995481922
# or `fid.rootb=tmp/224x400map0/samples`, FID=16.195992872931697
```

## Quantitative Results
<details>
<summary>Compare MagicDrive with other methods for generation quality:</summary>

![main_results](./assets/main_results.png)

</details>

<details>
<summary>Training support with images generated from MagicDrive:</summary>

![trainability](./assets/trainability.png)

</details>

More results can be found in the main paper.

## Qualitative Results

![editings](./assets/editings.png)


## Cite Us

```bibtex
configs/fid/data_gen.yaml (17 changes: 17 additions & 0 deletions)
```yaml
# @package _global_
defaults:
  - _self_  # make sure override

fid:
  resize: ${dataset.back_resize}  # (h, w)
  padding: ${dataset.back_pad}  # left, top, right and bottom
  raw_output: false
  # path relative to `${ROOT}`
  img_gen_dir: ???

runner:
  validation_index: all
  validation_times: 1

log_root_prefix: ../magicdrive-log/fid_gen
show_box: false
```
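
How `resize` and `padding` are consumed is not shown here; one plausible reading, sketched with a hypothetical helper, is that generated images are resized to `back_resize` and padded by `back_pad` to recover the raw camera layout before FID:

```python
# assumption: map a generated image back to the raw camera layout;
# `restore_raw_layout` is a hypothetical name, not a repository function
from PIL import Image, ImageOps

def restore_raw_layout(img: Image.Image, back_resize, back_pad) -> Image.Image:
    h, w = back_resize
    img = img.resize((w, h), Image.BICUBIC)  # PIL expects (width, height)
    # back_pad is (left, top, right, bottom)
    return ImageOps.expand(img, border=tuple(back_pad), fill=0)
```
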
configs/fid/default.yaml (8 changes: 8 additions & 0 deletions)
```yaml
device: null
num_workers: null
save_stats: false
batch_size: 512
dims: 2048
ratio: -1
roota: ./data/nuscenes/samples
rootb: ???
```
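
These keys mirror the parameters of `pytorch-fid`'s `calculate_fid_given_paths` (two image roots, batch size, Inception feature dimensionality, device, workers). Assuming `tools/fid_score.py` wraps that API, a standalone equivalent would look like:

```python
# sketch assuming tools/fid_score.py wraps pytorch-fid (pip install pytorch-fid)
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

device = "cuda" if torch.cuda.is_available() else "cpu"  # device: null -> auto
fid = calculate_fid_given_paths(
    ["./data/nuscenes/samples", "./tmp/224x400/samples"],  # roota, rootb
    batch_size=512,  # batch_size
    device=device,
    dims=2048,       # dims: Inception-v3 pool3 features
)
print(f"FID: {fid:.4f}")
```

`ratio: -1` appears to disable per-scene subsampling (see `sample_token_from_scene` below, where `-1` returns `None`).
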
configs/test_fid.yaml (6 changes: 6 additions & 0 deletions)
```yaml
defaults:
  - config
  - fid: default
  - _self_

log_root_prefix: ../magicdrive-log/fid
```
perception/common/ddp_utils.py (16 changes: 16 additions & 0 deletions)
```python
import torch
from accelerate import Accelerator


def concat_from_everyone(accelerator: Accelerator, tmp):
    """Gather a per-process list onto the main process.

    Returns the concatenated list on the main process, the input list
    unchanged when not distributed, and None on non-main processes.
    """
    if not accelerator.use_distributed:
        return tmp
    # collect each process's list of picklable objects
    output = [None for _ in range(accelerator.num_processes)]
    torch.distributed.all_gather_object(output, tmp)
    if accelerator.is_main_process:
        res = []
        for tmpi in output:
            res += tmpi
        return res
    else:
        return None
```
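
A usage sketch with illustrative data; under `accelerate launch`, each rank contributes its local list and only the main process receives the merged result:

```python
# hypothetical usage; run via: accelerate launch --num_processes 2 demo.py
from accelerate import Accelerator
from perception.common.ddp_utils import concat_from_everyone

accelerator = Accelerator()
# each rank produces a partial result, e.g. the sample tokens it processed
local_results = [f"rank{accelerator.process_index}-item{i}" for i in range(3)]
merged = concat_from_everyone(accelerator, local_results)
if accelerator.is_main_process:
    print(len(merged))  # num_processes * 3
```
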
perception/common/nuscenes_utils.py (57 changes: 57 additions & 0 deletions)
```python
import random
from collections import OrderedDict

from nuscenes.nuscenes import NuScenes


def sample_token_from_scene(ratio_or_num, nusc=None, drop_desc=None):
    """Sample keyframe tokens from each scene.

    If ratio_or_num >= 1, it is treated as a per-scene frame count;
    if 0 < ratio_or_num < 1, as a per-scene ratio;
    if ratio_or_num == 0, only the first frame of each scene is picked;
    if ratio_or_num == -1 (and drop_desc is None), returns (None, None).

    Args:
        ratio_or_num (float): sample ratio or count for each scene.
        nusc (NuScenes, optional): dataset handle; loaded from
            ./data/nuscenes when omitted.
        drop_desc (str, optional): pick nothing from scenes whose
            description contains this (lowercase) substring.

    Returns:
        sample_flag_dict (dict): Dict[token, bool]
        scene_sample_flag_dict (dict): Dict[scene_name, Dict[token, bool]]
    """
    if ratio_or_num == -1 and drop_desc is None:
        return None, None
    if nusc is None:
        nusc = NuScenes(version='v1.0-trainval',
                        dataroot='./data/nuscenes', verbose=True)
    sample_flag_dict = {}
    scene_sample_flag_dict = {}
    for scene in nusc.scene:
        scene_name = scene['name']
        frames_len = scene['nbr_samples']
        sample_token = scene['first_sample_token']
        # iteratively gather sample tokens from one scene
        token_in_this_scene = OrderedDict()
        for fi in range(frames_len):
            token_in_this_scene[sample_token] = False
            sample = nusc.get('sample', sample_token)
            sample_token = sample['next']
        desc = scene['description']
        if drop_desc is not None and drop_desc in desc.lower():
            picked = []  # we pick nothing
        else:
            # pick tokens according to the ratio/count
            if ratio_or_num == 0:
                # if 0, only pick the first one
                picked = list(token_in_this_scene.keys())[0:1]
            else:
                if ratio_or_num >= 1:
                    pick_num = int(ratio_or_num)
                else:
                    pick_num = int(frames_len * ratio_or_num)
                # random.sample needs a sequence, not a dict keys view
                picked = random.sample(
                    list(token_in_this_scene.keys()), pick_num)
        for pick in picked:
            token_in_this_scene[pick] = True
        # now save data for output
        token_in_this_scene = dict(token_in_this_scene)
        scene_sample_flag_dict[scene_name] = token_in_this_scene
        sample_flag_dict.update(token_in_this_scene)
    return sample_flag_dict, scene_sample_flag_dict
```
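
A usage sketch; the ratio and the description filter are illustrative, and it assumes nuScenes v1.0-trainval is available under `./data/nuscenes`:

```python
# hypothetical usage: flag ~half of the keyframes per scene, skipping scenes
# whose description mentions "night"
from perception.common.nuscenes_utils import sample_token_from_scene

sample_flag_dict, per_scene = sample_token_from_scene(0.5, drop_desc="night")
picked = [tok for tok, flag in sample_flag_dict.items() if flag]
print(f"picked {len(picked)} of {len(sample_flag_dict)} keyframes")
```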
