From d108df6e9036dd8182395e8182d4b1eb1d1a0d2c Mon Sep 17 00:00:00 2001
From: hyunkoome
Date: Wed, 18 Dec 2024 21:07:13 +0900
Subject: [PATCH] updated all

---
 README.MD                                |   2 +
 Setting_Cuda12.1_Py3.9.md                | 184 +++++++++++++++++++++++
 doc/setting.md                           |  49 ------
 env                                      |  10 ++
 requirements/py39cu12_1/requirements.txt |   2 +-
 5 files changed, 197 insertions(+), 50 deletions(-)
 create mode 100644 Setting_Cuda12.1_Py3.9.md
 delete mode 100644 doc/setting.md
 create mode 100644 env

diff --git a/README.MD b/README.MD
index fbc36b6c..f5796086 100644
--- a/README.MD
+++ b/README.MD
@@ -19,6 +19,8 @@ This repository contains the implementation of the paper
 > 1CUHK 2HKUST 3Huawei Noah's Ark Lab<br>
 > \*Equal Contribution ^Corresponding Authors
+## Setup Guide by Hyunkoo Kim: [for CUDA 12.1 and Python 3.9](./Setting_Cuda12.1_Py3.9.md)
+
 ## Abstract
 <br>
diff --git a/Setting_Cuda12.1_Py3.9.md b/Setting_Cuda12.1_Py3.9.md
new file mode 100644
index 00000000..10cbc768
--- /dev/null
+++ b/Setting_Cuda12.1_Py3.9.md
@@ -0,0 +1,184 @@
+## 1. Create Conda Env
+
+```shell
+conda create -n mdrive39 python==3.9 -y
+conda activate mdrive39
+```
+
+## 2. Install Python Packages
+Download [mmcv-full==1.7.2](https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/index.html) (`mmcv_full-1.7.2-cp39-cp39-manylinux1_x86_64.whl`) and install it:
+```shell
+conda activate mdrive39
+python -m pip install mmcv_full-1.7.2-cp39-cp39-manylinux1_x86_64.whl
+```
+Then install the remaining requirements and the third-party packages:
+```shell
+pip install -r requirements/py39cu12_1/requirements.txt
+
+cd third_party/diffusers
+pip install .
+
+cd third_party/bevfusion
+python setup.py develop
+```
+
+### When installing bevfusion
+#### [Error] nvcc fatal : Unsupported gpu architecture 'compute_80'
+- The latest bevfusion can now be built with CUDA 12.1 as well. The relevant part of `third_party/bevfusion/setup.py` is:
+
+```python
+if (torch.cuda.is_available() and torch.version.cuda is not None) or os.getenv("FORCE_CUDA", "0") == "1":
+    define_macros += [("WITH_CUDA", None)]
+    extension = CUDAExtension
+    extra_compile_args["nvcc"] = extra_args + [
+        "-D__CUDA_NO_HALF_OPERATORS__",
+        "-D__CUDA_NO_HALF_CONVERSIONS__",
+        "-D__CUDA_NO_HALF2_OPERATORS__",
+        "-gencode=arch=compute_70,code=sm_70",
+        "-gencode=arch=compute_75,code=sm_75",
+        "-gencode=arch=compute_80,code=sm_80",  # A100
+        "-gencode=arch=compute_86,code=sm_86",
+        "-gencode=arch=compute_86,code=sm_89",  # RTX4090
+    ]
+    sources += sources_cuda
+```
+
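+After the install, a quick sanity check (a minimal sketch, assuming the packages above installed cleanly) confirms that PyTorch uses the CUDA 12.1 build and that `mmcv-full` imports; it also prints the GPU's compute capability, which the `-gencode` flags above must cover:
+
+```python
+# Sanity check for the mdrive39 environment: report versions and GPU compute capability.
+import torch
+import mmcv
+
+print("torch:", torch.__version__, "| cuda:", torch.version.cuda,
+      "| gpu available:", torch.cuda.is_available())
+print("mmcv:", mmcv.__version__)
+
+if torch.cuda.is_available():
+    # e.g. (8, 6) for RTX 3090, (8, 9) for RTX 4090
+    major, minor = torch.cuda.get_device_capability(0)
+    print(f"compute capability: {major}.{minor}")
+```
+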
+## 3. Prepare Datasets
+
+We prepare the nuScenes dataset similar to [bevfusion's instructions](https://github.com/mit-han-lab/bevfusion#data-preparation). Specifically,
+
+1. Download the nuScenes dataset from the [website](https://www.nuscenes.org/nuscenes) and put the files in `./data/`. You should have these files:
+    ```bash
+    data/nuscenes
+    ├── maps
+    ├── mini
+    ├── samples
+    ├── sweeps
+    ├── v1.0-mini
+    └── v1.0-trainval
+    ```
+
+> [!TIP]
+> You can download the `.pkl` files from [OneDrive](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155157018_link_cuhk_edu_hk/EYF9ZkMHwVZKjrU5CUUPbfYBhC1iZMMnhE2uI2q5iCuv9w?e=QgEmcH). They should be enough for training and testing.
+
+2. Generate the mmdet3d annotation files with:
+
+    ```bash
+    python tools/create_data.py nuscenes --root-path ./data/nuscenes \
+      --out-dir ./data/nuscenes_mmdet3d_2 --extra-tag nuscenes
+    ```
+    You should have these files:
+    ```bash
+    data/nuscenes_mmdet3d_2
+    ├── nuscenes_dbinfos_train.pkl (-> ${bevfusion-version}/nuscenes_dbinfos_train.pkl)
+    ├── nuscenes_gt_database (-> ${bevfusion-version}/nuscenes_gt_database)
+    ├── nuscenes_infos_train.pkl
+    └── nuscenes_infos_val.pkl
+    ```
+    Note: as shown above, some files can be soft links to the original versions from bevfusion. If some of these files are located in `data/nuscenes`, you can move them to `data/nuscenes_mmdet3d_2` manually.
+
+3. (Optional) To accelerate data loading, we prepare cache files in h5 format for the BEV maps. They can be generated through `tools/prepare_map_aux.py` with different configs in `configs/dataset`, for example:
+    ```bash
+    python tools/prepare_map_aux.py +process=train
+    python tools/prepare_map_aux.py +process=val
+    ```
+    You will get files like `./val_tmp.h5` and `./train_tmp.h5`, which you have to rename correctly after generating them. Our defaults are:
+    ```bash
+    data/nuscenes_map_aux
+    ├── train_26x200x200_map_aux_full.h5 (42G)
+    └── val_26x200x200_map_aux_full.h5 (9G)
+    ```
+
+4. I prefer the following: download and build the datasets in a common directory and symlink them into each project directory:
+
+```shell
+ln -s ~/DATA/NAS/nfsRoot/Train_Results/img2img-turbo/local_cashe/ local_cashe
+
+ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/maps maps
+ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/samples samples
+ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/sweeps sweeps
+ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/v1.0-trainval v1.0-trainval
+ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/v1.0-mini v1.0-mini
+ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/panoptic panoptic
+ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/lidarseg lidarseg
+
+ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/MagicDrive/data/nuscenes_map_aux nuscenes_map_aux
+ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/MagicDrive/data/nuscenes_mmdet3d_2 nuscenes_mmdet3d_2
+ln -s ~/DATA/HDD8TB/Journal/MagicDrive/data/nuscenes/nuscenes_gt_database nuscenes_gt_database
+
+ln -s ~/DATA/NAS/nfsRoot/Train_Results/MagicDrive magicdrive-log
+```
+
+## 4. Pretrained Weights
+
+Our training is based on [stable-diffusion-v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5). We assume you put it at `${ROOT}/pretrained/` as follows:
+
+```bash
+{ROOT}/pretrained/stable-diffusion-v1-5/
+├── text_encoder
+├── tokenizer
+├── unet
+├── vae
+└── ...
+```
+
+## 5. Train the model
+
+Launch training (with 2x A100 80GB):
+```bash
+cd MagicDrive
+
+accelerate launch --config_file ./configs/accelerator/accelerate_config_2gpu.yaml tools/train.py \
+    +exp=224x400 runner=2gpus
+```
+or
+```shell
+cd MagicDrive
+bash scripts/train.sh
+```
+During training, you can check TensorBoard for logs and intermediate results.
+
+We also provide a debug config to test your environment and the data-loading process:
+```bash
+accelerate launch --config_file ./configs/accelerator/accelerate_config_2gpu.yaml tools/train.py \
+    +exp=224x400 runner=debug runner.validation_before_run=true
+```
+or
+```shell
+cd MagicDrive
+bash scripts/train_debug.sh
+```
+
+## 6. Convert Model Files
+Save a plain PyTorch model from the Accelerate checkpoint files:
+
+```shell
+accelerate launch --config_file ./configs/accelerator/accelerate_config_1gpu.yaml \
+    tools/save_pytorch_model_from_accelerate_checkpoint.py \
+    resume_from_checkpoint=./magicdrive-log/SDv1.5mv-rawbox_2024-12-13_21-38_224x400/checkpoint-160000 \
+    +exp=224x400 runner=2gpus
+```
+or
+```shell
+cd MagicDrive
+bash scripts/save_pytorch_model_from_accelerate_checkpoint.sh
+```
+
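+Conceptually, this conversion restores the Accelerate training state and writes out a plain `state_dict`. The script above handles the MagicDrive-specific model construction; the following is only a rough sketch of the idea (not the actual implementation), where `model` must be built with the same definition used for training:
+
+```python
+# Simplified sketch: restore an accelerate checkpoint and export plain PyTorch weights.
+import torch
+from accelerate import Accelerator
+
+def export_plain_checkpoint(model: torch.nn.Module, ckpt_dir: str, out_path: str) -> None:
+    accelerator = Accelerator()
+    model = accelerator.prepare(model)            # wrap the model as during training
+    accelerator.load_state(ckpt_dir)              # e.g. ./magicdrive-log/.../checkpoint-160000
+    unwrapped = accelerator.unwrap_model(model)   # strip distributed/mixed-precision wrappers
+    torch.save(unwrapped.state_dict(), out_path)  # plain weights, loadable with torch.load
+```
+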
+## 7. Test the model
+After training, you can test your model for driving view generation through:
+```bash
+python tools/test.py resume_from_checkpoint=${YOUR MODEL}
+# take the 224x400 model checkpoint as an example
+python tools/test.py resume_from_checkpoint=./pretrained/SDv1.5mv-rawbox_2023-09-07_18-39_224x400
+```
+or
+```shell
+python tools/inference_test_hkkim.py resume_from_checkpoint=./magicdrive-log/model_convert/SDv1.5mv-rawbox_2024-12-17_23-16_224x400
+```
+or
+```shell
+cd MagicDrive
+bash scripts/inference_test_hkkim.sh
+```
\ No newline at end of file
diff --git a/doc/setting.md b/doc/setting.md
deleted file mode 100644
index badd1902..00000000
--- a/doc/setting.md
+++ /dev/null
@@ -1,49 +0,0 @@
-```shell
-conda create -n mdrive39 python==3.9 -y
-conda activate mdrive39
-```
-pip install -r requirements/py39cu12_1/requirements.txt
-git lfs install
-git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
-
-cd third_party/diffusers
-pip install .
-
-cd third_party/bevfusion_last
-
-Now, you should be able to run our demo.
-
-### Q3: [Error] nvcc fatal : Unsupported gpu architecture 'compute_80'
-
-- This may appear when you install bevfusion (mmdet3d) on cuda10.2. The latest version of bevfusion supports Ampere GPUs by hard-coding compile parameters, leading to error when compiled with cuda10.2. One can get rid of this error by comment these lines in `third_party/bevfusion/setup.py (L19)`.
-- Now, the lastest bevfusion, even can be installed cuda12.1.
-```python
-if (torch.cuda.is_available() and torch.version.cuda is not None) or os.getenv("FORCE_CUDA", "0") == "1":
-    define_macros += [("WITH_CUDA", None)]
-    extension = CUDAExtension
-    extra_compile_args["nvcc"] = extra_args + [
-        "-D__CUDA_NO_HALF_OPERATORS__",
-        "-D__CUDA_NO_HALF_CONVERSIONS__",
-        "-D__CUDA_NO_HALF2_OPERATORS__",
-        "-gencode=arch=compute_70,code=sm_70",
-        "-gencode=arch=compute_75,code=sm_75",
-        "-gencode=arch=compute_80,code=sm_80", # A100
-        "-gencode=arch=compute_86,code=sm_86",
-        "-gencode=arch=compute_86,code=sm_89", # RTX4090
-    ]
-    sources += sources_cuda
-```
-
-python setup.py develop
-
-```shell
-ln -s ~/DATA/NAS/nfsRoot/Train_Results/img2img-turbo/local_cashe/ local_cashe
-
-ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/maps maps
-ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/samples samples
-ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/sweeps sweeps
-ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/v1.0-trainval v1.0-trainval
-ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/v1.0-mini v1.0-mini
-ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/panoptic panoptic
-ln -s ~/DATA/NAS/nfsRoot/Datasets/nuScenes_Datasets/nuScenes/Full_dataset_v1.0/Trainval/lidarseg lidarseg
-```
\ No newline at end of file
diff --git a/env b/env
new file mode 100644
index 00000000..ee910daf
--- /dev/null
+++ b/env
@@ -0,0 +1,10 @@
+HF_TOKEN=""
+HF_HOME="~/MagicDrive/local_cashe/hg"
+HF_USERNAME=""
+TRANSFORMERS_CACHE="~/MagicDrive/local_cashe/transformers"
+CUDA_LAUNCH_BLOCKING=1
+OPENAI_API_KEY=""
+NCCL_P2P_DISABLE="1"
+TORCH_DISTRIBUTED_DEBUG=DETAIL
+NCCL_DEBUG=INFO
+PYTHONFAULTHANDLER=1
\ No newline at end of file
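
Note: the `env` file above is a plain list of KEY=VALUE pairs. If nothing in the codebase loads it automatically (an assumption), a small helper like this sketch can push its values into the process environment before training starts:

```python
# Hedged sketch: load ./env into os.environ (skips blanks and comments, strips quotes).
import os

def load_env_file(path: str = "./env") -> None:
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

if __name__ == "__main__":
    load_env_file()
    print(os.environ.get("NCCL_DEBUG"))  # "INFO", unless already set in the shell
```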
diff --git a/requirements/py39cu12_1/requirements.txt b/requirements/py39cu12_1/requirements.txt
index d1c3b9e4..6711c30d 100644
--- a/requirements/py39cu12_1/requirements.txt
+++ b/requirements/py39cu12_1/requirements.txt
@@ -1,5 +1,5 @@
 # PyTorch and related libraries
-# python 3.7 => 3.10
+# python 3.7 => 3.9
 --extra-index-url https://download.pytorch.org/whl/cu121
 #--extra-index-url https://download.pytorch.org/whl/cu113