Co-authored-by: xuzhang <[email protected]>
Showing 11 changed files with 192 additions and 227 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,94 +21,29 @@ These design features make PP-VCtrl suitable for a wide range of video generatio | |
- [ ] PP-VCtrl v2 model weights | ||
|
||
## 📷 Quick Demos | ||
### Wonderful Demos Generated by PP-VCtrl-I2V | ||
First, extract the video control sequences (edges, masks, and poses) from the source video. Then, use ControlNet to regenerate the first frame of the video. Input the video control sequences and the newly generated first frame into PP-VCtrl-I2V to generate the new video. | ||
|
||
### 1. PP-VCtrl-I2V-Canny | ||
| Input Video | Control Video | Reference Image | Output Video | | ||
|---------------------------|-----------------------------|-----------------------|--------------------------| | ||
<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case1_pixel.gif" >|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case1_guide.gif"> </img>|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case1_sub1.jpg">|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case1_sub1.gif" > </img>| | ||
<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case2_pixel.gif" >|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case2_guide.gif"> </img>|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case2_sub1.jpg">|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/canny/canny_case2_sub1.gif" > </img>| | ||
|
||
|
||
|
||
### 2. PP-VCtrl-I2V-Mask | ||
| Input Video | Control Video | Reference Image | Output Video | | ||
|---------------------------|-----------------------------|---------------------------|---------------------------| | ||
<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case1_pixel.gif" >|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case1_guide.gif"> </img>|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case1_sub1.jpg">|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case1_sub1.gif" > </img>| | ||
<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case2_pixel.gif" >|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case2_guide.gif"> </img>|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case2_sub2.jpg">|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/mask/mask_case2_sub2.gif" > </img>| | ||
|
||
### 3. PP-VCtrl-I2V-Pose | ||
| Input Video | Control Video | Reference Image | Output Video | | ||
|----------------------|-----------------------|----------------------|-----------------------| | ||
<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case1_pixel.gif" >|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case1_guide.gif"> </img>|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case1_sub1.jpg">|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case1_sub1.gif" > </img>| | ||
<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case2_pixel.gif" >|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case2_guide.gif"> </img>|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case2_sub1.jpg">|<img src="https://raw.githubusercontent.com/Hammingbo/Hammingbo.github.io/refs/heads/main/static/gif/pose/pose_case2_sub1.gif" > </img>| | ||
|
||
### 1. PP-VCtrl with Canny Edge: | ||
|
||
<table class="center"> | ||
<thead> | ||
<tr> | ||
<th>Prompt</th> <!-- Newly added column header, on the far left --> | ||
<th>Reference Image</th> | ||
<th>Control Videos</th> | ||
<th>Ours (PP-VCtrl-5B-T2V)</th> | ||
<th>Ours (PP-VCtrl-5B-I2V)</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr> | ||
<td>Group of fishes swimming in aquarium.</td> <!-- Newly added text description, on the far left --> | ||
<td><img src="assets/figures/canny_case1_reference.jpg" alt="Reference" width="160"></td> | ||
<td><img src="assets/figures/canny_case1_control_image.gif" alt="Control Videos" width="160"></td> | ||
<td><img src="assets/figures/canny_case1_ours_t2v.gif" alt="Ours T2V" width="160"></td> | ||
<td><img src="assets/figures/canny_case1_ours_i2v.gif" alt="Ours I2V" width="160"></td> | ||
</tr> | ||
<tr> | ||
<td>A boat with a flag on it is sailing on the sea.</td> <!-- Text description for the second row --> | ||
<td><img src="assets/figures/canny_case2_reference.jpg" alt="Reference" width="160"></td> | ||
<td><img src="assets/figures/canny_case2_control_image.gif" alt="Control Videos" width="160"></td> | ||
<td><img src="assets/figures/canny_case2_ours_t2v.gif" alt="Ours T2V" width="160"></td> | ||
<td><img src="assets/figures/canny_case2_ours_i2v.gif" alt="Ours I2V" width="160"></td> | ||
</tr> | ||
<!-- More rows can be added here --> | ||
</tbody> | ||
</table> | ||
|
||
### 2. PP-VCtrl with Mask Map: | ||
<table class="center"> | ||
<thead> | ||
<tr> | ||
<th>Prompt</th> <!-- Newly added column header, on the far left --> | ||
<th>Reference Image</th> | ||
<th>Control Videos</th> | ||
<th>Ours (PP-VCtrl-5B-T2V)</th> | ||
<th>Ours (PP-VCtrl-5B-I2V)</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr> | ||
<td>A rider in a dark helmet and white breeches is atop a chestnut horse...</td> <!-- Newly added text description, on the far left --> | ||
<td><img src="assets/figures/mask_case1_reference.jpg" alt="Reference" width="160"></td> | ||
<td><img src="assets/figures/mask_case1_control_image.gif" alt="Control Videos" width="160"></td> | ||
<td><img src="assets/figures/mask_case1_ours_t2v.gif" alt="Ours T2V" width="160"></td> | ||
<td><img src="assets/figures/mask_case1_ours_i2v.gif" alt="Ours I2V" width="160"></td> | ||
</tr> | ||
<tr> | ||
<td>A dark gray Mini Cooper is parked on a city street...</td> <!-- Text description for the second row --> | ||
<td><img src="assets/figures/mask_case2_reference.jpg" alt="Reference" width="160"></td> | ||
<td><img src="assets/figures/mask_case2_control_image.gif" alt="Control Videos" width="160"></td> | ||
<td><img src="assets/figures/mask_case2_ours_t2v.gif" alt="Ours T2V" width="160"></td> | ||
<td><img src="assets/figures/mask_case2_ours_i2v.gif" alt="Ours I2V" width="160"></td> | ||
</tr> | ||
<!-- More rows can be added here --> | ||
</tbody> | ||
</table> | ||
|
||
### 3. PP-VCtrl with Human Pose Map: | ||
<table class="center"> | ||
<thead> | ||
<tr> | ||
<th>Prompt</th> <!-- Newly added column header, on the far left --> | ||
<th>Reference Image</th> <!-- Newly added column header, on the far left --> | ||
<th>Pose Videos</th> | ||
<th>Ours (PP-VCtrl-5B-I2V)</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr> | ||
<td>A young man with curly hair and a red t-shirt featuring a white logo is seen in various states of motion... </td> | ||
<td><img src="assets/figures/pose_case1_reference1.jpg" alt="Reference 1" width="160"></td> | ||
<td><img src="assets/figures/pose_case1_control_image.gif" alt="Pose Videos" width="160"></td> | ||
<td><img src="assets/figures/pose_case1_ours_1.gif" alt="Ours 1" width="160"></td> | ||
</tr> | ||
<tr> | ||
<td>A woman models an Adrianna Papell women's gown, featuring a sleeveless...</td> | ||
<td><img src="assets/figures/pose_case2_reference2.jpg" alt="Reference 2" width="160"></td> | ||
<td><img src="assets/figures/pose_case2_control_image.gif" alt="Pose Videos" width="160"></td> | ||
<td><img src="assets/figures/pose_case2_ours_2.gif" alt="Ours 2" width="160"></td> | ||
</tr> | ||
<!-- More rows can be added here --> | ||
</tbody> | ||
</table> | ||
|
||
## 🚀 Quick Start | ||
***Note:*** | ||
|
@@ -220,8 +155,8 @@ bash anchor/extract_canny.sh | |
|
||
```bash | ||
#download sam2 | ||
mkdir -p anchor/checkpoint/mask | ||
wget -P anchor/checkpoint/mask https://bj.bcebos.com/v1/paddlenlp/models/community/Sam/Sam2/sam2.1_hiera_large.pdparams | ||
mkdir -p anchor/checkpoints/mask | ||
wget -P anchor/checkpoints/mask https://bj.bcebos.com/v1/paddlenlp/models/community/Sam/Sam2/sam2.1_hiera_large.pdparams | ||
#mask | ||
bash anchor/extract_mask.sh | ||
``` | ||
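The download step above fetches the SAM2 weights every time it is run. A minimal sketch of an idempotent variant follows, assuming the `anchor/checkpoints/mask` directory and the URL shown in the block above. The helper name `fetch_once` is hypothetical and is not part of the repository.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repository): download a file into a
# directory only if a file with the same name is not already there.
fetch_once() {
  local dir="$1" url="$2"
  local target="${dir}/$(basename "${url}")"
  mkdir -p "${dir}"
  if [ -f "${target}" ]; then
    echo "skip: ${target} already exists"
  else
    wget -P "${dir}" "${url}"
  fi
}

# Usage, with the directory and URL taken from the step above:
# fetch_once anchor/checkpoints/mask \
#   https://bj.bcebos.com/v1/paddlenlp/models/community/Sam/Sam2/sam2.1_hiera_large.pdparams
```

The existence check only looks at the file name, so a partially downloaded file would also be skipped; deleting it before re-running is the simplest recovery.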
|
@@ -268,22 +203,18 @@ The final inference results of the model can be found in the **/infer_outputs** | |
### 1. Generate with Canny Map | ||
```bash | ||
##i2v | ||
mkdir -p infer_outputs/canny/i2v | ||
bash scripts/infer_cogvideox_i2v_canny_vctrl.sh | ||
|
||
##t2v | ||
mkdir -p infer_outputs/canny/t2v | ||
bash scripts/infer_cogvideox_t2v_canny_vctrl.sh | ||
``` | ||
|
||
### 2. Generate with Mask Map | ||
```bash | ||
##i2v | ||
mkdir -p infer_outputs/mask/i2v | ||
bash scripts/infer_cogvideox_i2v_mask_vctrl.sh | ||
|
||
##t2v | ||
mkdir -p infer_outputs/mask/t2v | ||
bash scripts/infer_cogvideox_t2v_mask_vctrl.sh | ||
``` | ||
**Note**: The edge (Canny) and mask control models each support both t2v (text-to-video) and i2v (image-to-video) generation. | ||
|
@@ -292,7 +223,6 @@ bash scripts/infer_cogvideox_t2v_mask_vctrl.sh | |
|
||
```bash | ||
##i2v | ||
mkdir -p infer_outputs/pose/i2v | ||
bash scripts/infer_cogvideox_i2v_pose_vctrl.sh | ||
``` | ||
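The inference commands in this section follow one naming pattern, `scripts/infer_cogvideox_<mode>_<control>_vctrl.sh`. The sketch below wraps that pattern in a small dispatcher. The function names `vctrl_script` and `run_vctrl` are hypothetical, and rejecting `t2v` with `pose` is an assumption based only on the fact that this section lists no such script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a (mode, control) pair to the inference script
# named in this section. Only the combinations listed above are accepted.
vctrl_script() {
  local mode="$1" control="$2"
  case "${mode}:${control}" in
    i2v:canny|t2v:canny|i2v:mask|t2v:mask|i2v:pose)
      echo "scripts/infer_cogvideox_${mode}_${control}_vctrl.sh"
      ;;
    *)
      echo "unsupported combination: ${mode} ${control}" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: run one combination. Creating the output directory
# up front is harmless if the script already creates it.
run_vctrl() {
  local mode="$1" control="$2" script
  script="$(vctrl_script "${mode}" "${control}")"
  mkdir -p "infer_outputs/${control}/${mode}"
  bash "${script}"
}

# Example (run from the repository root): run_vctrl i2v canny
```

Keeping the lookup separate from the execution makes it possible to check which script would run without starting inference.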
|
||
|
@@ -347,4 +277,28 @@ These strategies are integrated into the unified video generation control framew | |
In quantitative evaluations of edge-controlled (Canny), human-pose-controlled (Pose), and mask-controlled (Mask) video generation, the PP-VCtrl model matches or surpasses existing open-source task-specific methods on both control-ability and video-quality metrics. | ||
<img src="assets/models/eval1.png" style="width:100%"> | ||
|
||
We conducted manual evaluation experiments, inviting multiple evaluators to score videos generated by different methods. The | ||
We conducted a human evaluation in which multiple evaluators scored videos generated by different methods. The evaluation dimensions included overall video quality, temporal consistency, and more. The results showed that PP-VCtrl outperformed existing open-source methods in all evaluation dimensions. | ||
<img src="assets/models/eval2.png" style="width:100%"> | ||
|
||
<!-- | ||
## More version | ||
<details close> | ||
<summary>Model Versions</summary> | ||
</details> | ||
--> | ||
<!-- | ||
## Contact us | ||
Users: [[email protected]]([email protected]) | ||
--> | ||
<!-- | ||
## BibTex | ||
``` | ||
@article{guo2023animatediff, | ||
title={AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning}, | ||
author={Guo, Yuwei and Yang, Ceyuan and Rao, Anyi and Liang, Zhengyang and Wang, Yaohui and Qiao, Yu and Agrawala, Maneesh and Lin, Dahua and Dai, Bo}, | ||
journal={International Conference on Learning Representations}, | ||
year={2025} | ||
} | ||
```The code above prints a message --> |