Vctrl_add_cases (#1030)
Co-authored-by: xuzhang <[email protected]>
Hammingbo and westfish authored Feb 12, 2025
1 parent c126fed commit aebbbac
Showing 11 changed files with 192 additions and 227 deletions.
144 changes: 49 additions & 95 deletions ppdiffusers/examples/ppvctrl/README.md
@@ -21,94 +21,29 @@ These design features make PP-VCtrl suitable for a wide range of video generatio
- [ ] PP-VCtrl v2 model weights

## 📷 Quick Demos
### Wonderful Demos Generated by PP-VCtrl-I2V
First, extract the video control sequences (edges, masks, and poses) from the source video. Then, use ControlNet to regenerate the first frame of the video. Input the video control sequences and the newly generated first frame into PP-VCtrl-I2V to generate the new video.

### 1. PP-VCtrl-I2V-Canny
| Input Video | Control Video | Reference Image | Output Video |
|---------------------------|-----------------------------|-----------------------|--------------------------|
| <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case1_pixel.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case1_guide.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case1_sub1.jpg"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case1_sub1.gif"> |
| <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case2_pixel.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case2_guide.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case2_sub1.jpg"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case2_sub1.gif"> |

### 2. PP-VCtrl-I2V-Mask
| Input Video | Control Video | Reference Image | Output Video |
|---------------------------|-----------------------------|---------------------------|---------------------------|
| <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case1_pixel.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case1_guide.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case1_sub1.jpg"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case1_sub1.gif"> |
| <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case2_pixel.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case2_guide.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case2_sub2.jpg"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case2_sub2.gif"> |

### 3. PP-VCtrl-I2V-Pose
| Input Video | Control Video | Reference Image | Output Video |
|----------------------|-----------------------|----------------------|-----------------------|
| <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case1_pixel.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case1_guide.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case1_sub1.jpg"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case1_sub1.gif"> |
| <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case2_pixel.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case2_guide.gif"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case2_sub1.jpg"> | <img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case2_sub1.gif"> |

### 1. PP-VCtrl with Canny Edge:

<table class="center">
<thead>
<tr>
<th>Prompt</th> <!-- New column header, on the far left -->
<th>Reference Image</th>
<th>Control Videos</th>
<th>Ours (PP-VCtrl-5B-T2V)</th>
<th>Ours (PP-VCtrl-5B-I2V)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Group of fishes swimming in aquarium.</td> <!-- New text description, on the far left -->
<td><img src="assets/figures/canny_case1_reference.jpg" alt="Reference" width="160"></td>
<td><img src="assets/figures/canny_case1_control_image.gif" alt="Control Videos" width="160"></td>
<td><img src="assets/figures/canny_case1_ours_t2v.gif" alt="Ours T2V" width="160"></td>
<td><img src="assets/figures/canny_case1_ours_i2v.gif" alt="Ours I2V" width="160"></td>
</tr>
<tr>
<td>A boat with a flag on it is sailing on the sea.</td> <!-- Text description for the second row -->
<td><img src="assets/figures/canny_case2_reference.jpg" alt="Reference" width="160"></td>
<td><img src="assets/figures/canny_case2_control_image.gif" alt="Control Videos" width="160"></td>
<td><img src="assets/figures/canny_case2_ours_t2v.gif" alt="Ours T2V" width="160"></td>
<td><img src="assets/figures/canny_case2_ours_i2v.gif" alt="Ours I2V" width="160"></td>
</tr>
<!-- More rows can be added here -->
</tbody>
</table>

### 2. PP-VCtrl with Mask Map:
<table class="center">
<thead>
<tr>
<th>Prompt</th> <!-- New column header, on the far left -->
<th>Reference Image</th>
<th>Control Videos</th>
<th>Ours (PP-VCtrl-5B-T2V)</th>
<th>Ours (PP-VCtrl-5B-I2V)</th>
</tr>
</thead>
<tbody>
<tr>
<td>A rider in a dark helmet and white breeches is atop a chestnut horse...</td> <!-- New text description, on the far left -->
<td><img src="assets/figures/mask_case1_reference.jpg" alt="Reference" width="160"></td>
<td><img src="assets/figures/mask_case1_control_image.gif" alt="Control Videos" width="160"></td>
<td><img src="assets/figures/mask_case1_ours_t2v.gif" alt="Ours T2V" width="160"></td>
<td><img src="assets/figures/mask_case1_ours_i2v.gif" alt="Ours I2V" width="160"></td>
</tr>
<tr>
<td>A dark gray Mini Cooper is parked on a city street...</td> <!-- Text description for the second row -->
<td><img src="assets/figures/mask_case2_reference.jpg" alt="Reference" width="160"></td>
<td><img src="assets/figures/mask_case2_control_image.gif" alt="Control Videos" width="160"></td>
<td><img src="assets/figures/mask_case2_ours_t2v.gif" alt="Ours T2V" width="160"></td>
<td><img src="assets/figures/mask_case2_ours_i2v.gif" alt="Ours I2V" width="160"></td>
</tr>
<!-- More rows can be added here -->
</tbody>
</table>

### 3. PP-VCtrl with Human Pose Map:
<table class="center">
<thead>
<tr>
<th>Prompt</th> <!-- New column header, on the far left -->
<th>Reference Image</th>
<th>Pose Videos</th>
<th>Ours (PP-VCtrl-5B-I2V)</th>
</tr>
</thead>
<tbody>
<tr>
<td>A young man with curly hair and a red t-shirt featuring a white logo is seen in various states of motion... </td>
<td><img src="assets/figures/pose_case1_reference1.jpg" alt="Reference 1" width="160"></td>
<td><img src="assets/figures/pose_case1_control_image.gif" alt="Pose Videos" width="160"></td>
<td><img src="assets/figures/pose_case1_ours_1.gif" alt="Ours 1" width="160"></td>
</tr>
<tr>
<td>A woman models an Adrianna Papell women's gown, featuring a sleeveless...</td>
<td><img src="assets/figures/pose_case2_reference2.jpg" alt="Reference 2" width="160"></td>
<td><img src="assets/figures/pose_case2_control_image.gif" alt="Pose Videos" width="160"></td>
<td><img src="assets/figures/pose_case2_ours_2.gif" alt="Ours 2" width="160"></td>
</tr>
<!-- More rows can be added here -->
</tbody>
</table>

## 🚀 Quick Start
***Note:***
@@ -220,8 +155,8 @@ bash anchor/extract_canny.sh

```bash
#download sam2
mkdir -p anchor/checkpoint/mask
wget -P anchor/checkpoint/mask https://bj.bcebos.com/v1/paddlenlp/models/community/Sam/Sam2/sam2.1_hiera_large.pdparams
mkdir -p anchor/checkpoints/mask
wget -P anchor/checkpoints/mask https://bj.bcebos.com/v1/paddlenlp/models/community/Sam/Sam2/sam2.1_hiera_large.pdparams
#mask
bash anchor/extract_mask.sh
```
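The commands above fetch the SAM2 weights before running `anchor/extract_mask.sh`, so rerunning them downloads the file again. A minimal sketch of a guard that skips the download when the weights are already on disk, assuming the `anchor/checkpoints/mask` directory and the URL shown above and that `wget` is installed; `fetch_sam2_ckpt` is a hypothetical helper, not part of PP-VCtrl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PP-VCtrl): download the SAM2 weights
# only if they are not already present in the checkpoint directory.
# Assumes the directory and URL used in the README commands.
fetch_sam2_ckpt() {
  local ckpt_dir="${1:-anchor/checkpoints/mask}"
  local url="https://bj.bcebos.com/v1/paddlenlp/models/community/Sam/Sam2/sam2.1_hiera_large.pdparams"
  # The local file name is the last path component of the URL.
  local ckpt_file="${ckpt_dir}/$(basename "${url}")"

  if [ -f "${ckpt_file}" ]; then
    echo "found ${ckpt_file}, skipping download"
    return 0
  fi
  mkdir -p "${ckpt_dir}"
  wget -P "${ckpt_dir}" "${url}"
}
```

Usage: `fetch_sam2_ckpt && bash anchor/extract_mask.sh`. Checking for the file rather than relying on `wget` options keeps the guard independent of which `wget` flags are available.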
Expand Down Expand Up @@ -268,22 +203,18 @@ The final inference results of the model can be found in the **/infer_outputs**
### 1. Generate with Canny Map
```bash
##i2v
mkdir -p infer_outputs/canny/i2v
bash scripts/infer_cogvideox_i2v_canny_vctrl.sh

##t2v
mkdir -p infer_outputs/canny/t2v
bash scripts/infer_cogvideox_t2v_canny_vctrl.sh
```

### 2. Generate with Mask Map
```bash
##i2v
mkdir -p infer_outputs/mask/i2v
bash scripts/infer_cogvideox_i2v_mask_vctrl.sh

##t2v
mkdir -p infer_outputs/mask/t2v
bash scripts/infer_cogvideox_t2v_mask_vctrl.sh
```
**Note**: The edge (Canny) and mask control models support both the t2v (text-to-video) and i2v (image-to-video) models.
@@ -292,7 +223,6 @@ bash scripts/infer_cogvideox_t2v_mask_vctrl.sh

```bash
##i2v
mkdir -p infer_outputs/pose/i2v
bash scripts/infer_cogvideox_i2v_pose_vctrl.sh
```
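The Canny and mask control models are run in both i2v and t2v modes, while pose is shown only with i2v. A minimal sketch that chains all of these runs, assuming the `scripts/infer_cogvideox_<mode>_<control>_vctrl.sh` naming and the `infer_outputs/<control>/<mode>` layout used in this README; `run_all_vctrl` and the `DRY_RUN` variable are hypothetical names, not part of PP-VCtrl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PP-VCtrl): run every control/mode pair
# listed in this README. Canny and mask are listed with i2v and t2v,
# pose with i2v only. Set DRY_RUN=1 to print the commands instead.
run_all_vctrl() {
  local pair ctrl mode out_dir script
  for pair in canny:i2v canny:t2v mask:i2v mask:t2v pose:i2v; do
    ctrl="${pair%%:*}"   # text before the colon, e.g. "canny"
    mode="${pair##*:}"   # text after the colon, e.g. "i2v"
    out_dir="infer_outputs/${ctrl}/${mode}"
    script="scripts/infer_cogvideox_${mode}_${ctrl}_vctrl.sh"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "mkdir -p ${out_dir} && bash ${script}"
    else
      # Creating the output directory first is harmless if the script
      # also creates it.
      mkdir -p "${out_dir}"
      bash "${script}" || return 1
    fi
  done
}
```

Running `DRY_RUN=1 run_all_vctrl` prints the five commands without executing them; the loop stops at the first script that fails so a broken run is not silently followed by the rest.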

@@ -347,4 +277,28 @@ These strategies are integrated into the unified video generation control framew
In quantitative evaluations of edge-controlled (Canny), human-pose-controlled (Pose), and mask-controlled (Mask) video generation, the PP-VCtrl model matches or surpasses existing open-source task-specific methods on both control-ability and video-quality metrics.
<img src="assets/models/eval1.png" style="width:100%">

We conducted a human evaluation in which multiple evaluators scored videos generated by different methods along several dimensions, including overall video quality and temporal consistency. PP-VCtrl outperformed existing open-source methods on all evaluated dimensions.
<img src="assets/models/eval2.png" style="width:100%">

<!--
## More Versions
<details>
<summary>Model Versions</summary>
</details>
-->
<!--
## Contact us
Users: [[email protected]]([email protected])
-->