face swap integration

numz · Sep 14, 2023 · c63b2cd
1 parent 0bd0e9a

Showing 9 changed files with 306 additions and 58 deletions.
diff --git a/.gitignore b/.gitignore
@@ -16,4 +16,5 @@ scripts/wav2lip/output/*.aac
 scripts/wav2lip/results/result_voice.mp4
 scripts/wav2lip/temp/*.avi
 scripts/wav2lip/temp/*.wav
-docs/*
+docs/*
+scripts/faceswap/model/inswapper_128.onnx
diff --git a/README.md b/README.md
@@ -11,7 +11,7 @@ This repository contains a Wav2Lip Studio extension for Automatic1111.
 
 It's an all-in-one solution: just choose a video and a speech file (wav or mp3), and the extension will generate a lip-sync video. It improves the quality of the lip-sync videos generated by the [Wav2Lip tool](https://github.com/Rudrabha/Wav2Lip) by applying specific post-processing techniques with Stable Diffusion tools.
 
-![Illustration](https://user-images.githubusercontent.com/800903/262430428-e091e6ad-b3e0-4e6e-a1e7-e9914add41e8.png)
+![Illustration](https://user-images.githubusercontent.com/800903/267808204-ae971458-9e8d-403e-9e10-9b7b7590d999.png)
 
 ## 📖 Quick Index
 * [🚀 Updates](#-updates)
@@ -30,6 +30,9 @@ It's an all-in-one solution: just choose a video and a speech file (wav or mp3),
 * [📜 License](#-license)
 
 ## 🚀 Updates
+**2023.09.13**
+- 👪 Introduced face swap: [roop](https://github.com/s0md3v/sd-webui-roop) integration (see Usage section); **this feature is experimental**.
+
 **2023.08.22**
 - 👄 Introduced [bark](https://github.com/suno-ai/bark/) (see Usage section); **this feature is experimental**.
 
@@ -76,19 +79,23 @@ It's an all-in-one solution: just choose a video and a speech file (wav or mp3),
 
 5. 🔥 Important: Get the weights. Download the model weights from the following locations and place them in the corresponding directories (pay attention to the file names, especially for s3fd).
 
-|        Model        |                                    Description                                     |                                                                        Link to the model                                                                         |                                       install folder                                       |
-|:-------------------:|:----------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|
-|       Wav2Lip       |                              Highly accurate lip-sync                              |        [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/Eb3LEzbfuKlJiR600lQWRxgBIY27JZg80f7V9jtMfbNDaQ?e=TBFBVW)         |                   extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\                   |
-|    Wav2Lip + GAN    |               Slightly inferior lip-sync, but better visual quality                |        [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA?e=n9ljGW)         |                   extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\                   |
-|        s3fd         |                          Face Detection pre trained model                          |                                           [Link](https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth)                                           |      extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth       |
-| landmark predicator |        Dlib 68 point face landmark prediction (click on the download icon)         |                              [Link](https://github.com/numz/wav2lip_uhq/blob/main/predicator/shape_predictor_68_face_landmarks.dat)                              | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
-| landmark predicator |              Dlib 68 point face landmark prediction (alternate link)               | [Link](https://huggingface.co/spaces/asdasdasdasd/Face-forgery-detection/resolve/ccfc24642e0210d4d885bc7b3dbc9a68ed948ad6/shape_predictor_68_face_landmarks.dat) | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
-| landmark predicator | Dlib 68 point face landmark prediction (alternate link click on the download icon) |                        [Link](https://github.com/italojs/facial-landmarks-recognition/blob/master/shape_predictor_68_face_landmarks.dat)                         | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
+|        Model        |                                    Description                                     |                                                                            Link to the model                                                                             |                                       install folder                                       |
+|:-------------------:|:----------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|
+|       Wav2Lip       |                              Highly accurate lip-sync                              |            [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/Eb3LEzbfuKlJiR600lQWRxgBIY27JZg80f7V9jtMfbNDaQ?e=TBFBVW)             |                   extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\                   |
+|    Wav2Lip + GAN    |               Slightly inferior lip-sync, but better visual quality                |            [Link](https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA?e=n9ljGW)             |                   extensions\sd-wav2lip-uhq\scripts\wav2lip\checkpoints\                   |
+|        s3fd         |                          Face Detection pre trained model                          |                                               [Link](https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth)                                               |      extensions\sd-wav2lip-uhq\scripts\wav2lip\face_detection\detection\sfd\s3fd.pth       |
+| landmark predicator |        Dlib 68 point face landmark prediction (click on the download icon)         |                                  [Link](https://github.com/numz/wav2lip_uhq/blob/main/predicator/shape_predictor_68_face_landmarks.dat)                                  | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
+| landmark predicator |              Dlib 68 point face landmark prediction (alternate link)               |     [Link](https://huggingface.co/spaces/asdasdasdasd/Face-forgery-detection/resolve/ccfc24642e0210d4d885bc7b3dbc9a68ed948ad6/shape_predictor_68_face_landmarks.dat)     | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
+| landmark predicator | Dlib 68 point face landmark prediction (alternate link click on the download icon) |                            [Link](https://github.com/italojs/facial-landmarks-recognition/blob/master/shape_predictor_68_face_landmarks.dat)                             | extensions\sd-wav2lip-uhq\scripts\wav2lip\predicator\shape_predictor_68_face_landmarks.dat |
+|   face swap model   |                        Model used by the face swap feature                         |                                        [Link](https://huggingface.co/ezioruan/inswapper_128.onnx/resolve/main/inswapper_128.onnx)                                        |            extensions\sd-wav2lip-uhq\scripts\faceswap\model\inswapper_128.onnx             |
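
Before generating, it can help to confirm that the files above are where the extension expects them. A minimal sketch, assuming the extension root is `extensions/sd-wav2lip-uhq` inside the Automatic1111 folder; `missing_weights` is a hypothetical helper, not part of the extension. Because the table gives only a folder (no file names) for the Wav2Lip checkpoints, the sketch only checks that this folder is not empty.

```python
from pathlib import Path

# Relative paths taken from the "install folder" column of the table above
# (forward slashes, so pathlib handles them on both Windows and Linux).
EXPECTED_FILES = [
    "scripts/wav2lip/face_detection/detection/sfd/s3fd.pth",
    "scripts/wav2lip/predicator/shape_predictor_68_face_landmarks.dat",
    "scripts/faceswap/model/inswapper_128.onnx",
]
CHECKPOINT_DIR = "scripts/wav2lip/checkpoints"


def missing_weights(extension_root):
    """Return the expected weight files (and checkpoint folder) that are absent."""
    root = Path(extension_root)
    missing = [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
    ckpt = root / CHECKPOINT_DIR
    # Checkpoint file names are not fixed by the table: require at least one file.
    if not ckpt.is_dir() or not any(p.is_file() for p in ckpt.iterdir()):
        missing.append(CHECKPOINT_DIR + "/")
    return missing


if __name__ == "__main__":
    for item in missing_weights("extensions/sd-wav2lip-uhq"):
        print("missing:", item)
```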
 
 
 ## 🐍 Usage
 1. Choose a video (avi or mp4 format) with a face in it. If even a single frame of the video contains no face, the process will fail. Note: avi files will not appear in the Video input, but the process will still work.
-2. Audio, 2 options:
+2. Face Swap (this takes time, so be patient):
+   1. **Face Swap**: choose the image of the face you want to swap into the video.
+   2. **Face Index**: if there are multiple faces in the image, choose which one to use. Faces are numbered from left to right, starting at 0.
+3. Audio, 2 options:
    1. Put audio file in the "Speech" input. 
    2. Generate Audio with the text to speech [bark](https://github.com/suno-ai/bark/) integration.
       1. Choose the language: Turkish, English, Chinese, Hindi, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Polish, German, French
@@ -113,20 +120,20 @@ It's an all-in-one solution: just choose a video and a speech file (wav or mp3),
          - ♪ for song lyrics
          - CAPITALIZATION for emphasis of a word
          - [MAN] and [WOMAN] to bias Bark toward male and female speakers, respectively
-3. choose a checkpoint (see table above).
-4. **Padding**: Wav2Lip uses this to move the mouth. This is useful if the mouth is not at the good place. Usually, default value is good, but certain video may need to be adjusted.
-5. **No Smooth**: When checked, this option retains the original mouth shape without smoothing.
-6. **Resize Factor**: This is a resize factor for the video. The default value is 1.0, but you can change it to suit your needs. This is useful if the video size is too large.
-7. **Only Mouth**: This option tracks only the mouth, removing other facial motions like those of the cheeks and chin.
-8. **Mouth Mask Dilate**: This will dilate the mouth mask to cover more area around the mouth. depends on the mouth size.
-9. **Face Mask Erode**: This will erode the face mask to remove some area around the face. depends on the face size.
-10. **Mask Blur**: This will blur the mask to make it more smooth, try to keep it under or equal to **Mouth Mask Dilate**.
-11. **Code Former Fidelity**: 
+4. Choose a checkpoint (see table above).
+5. **Padding**: Wav2Lip uses this to move the mouth. This is useful if the mouth is not in the right place. The default value is usually fine, but some videos may need adjusting.
+6. **No Smooth**: When checked, this option retains the original mouth shape without smoothing.
+7. **Resize Factor**: A resize factor applied to the video. The default value is 1.0, but you can change it to suit your needs, for example when the video resolution is too large.
+8. **Only Mouth**: This option tracks only the mouth, removing other facial motions like those of the cheeks and chin.
+9. **Mouth Mask Dilate**: Dilates the mouth mask to cover more of the area around the mouth. The right value depends on the mouth size.
+10. **Face Mask Erode**: Erodes the face mask to remove some of the area around the face. The right value depends on the face size.
+11. **Mask Blur**: Blurs the mask to make it smoother. Try to keep it less than or equal to **Mouth Mask Dilate**.
+12. **Code Former Fidelity**: 
     1. A value of 0 offers higher quality but may significantly alter the person's facial appearance and cause noticeable flickering between frames.
     2. A value of 1 provides lower quality but maintains the person's face more consistently and reduces frame flickering.
     3. Using a value below 0.5 is not advised. Adjust this setting to achieve optimal results. Starting with a value of 0.75 is recommended.
-12. **Active debug**: This will create step-by-step images in the debug folder.
-13. Click on the "Generate" button.
+13. **Active debug**: This will create step-by-step images in the debug folder.
+14. Click on the "Generate" button.
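
The **Face Index** option counts faces from left to right, starting at 0. A minimal sketch of that ordering, assuming each detected face is given as an `(x1, y1, x2, y2)` bounding box (the layout insightface uses for a detected face's `bbox`). `pick_face` is a hypothetical helper for illustration, not the extension's own code.

```python
def pick_face(bboxes, face_index=0):
    """Return the box at `face_index` after ordering faces left to right.

    bboxes: sequence of (x1, y1, x2, y2) boxes, in any detection order.
    """
    ordered = sorted(bboxes, key=lambda box: box[0])  # sort by left edge (x1)
    if not 0 <= face_index < len(ordered):
        raise IndexError(
            f"face index {face_index} out of range: {len(ordered)} face(s) detected"
        )
    return ordered[face_index]


faces = [(400, 50, 520, 200), (30, 60, 150, 210), (210, 40, 330, 190)]
print(pick_face(faces, 0))  # → (30, 60, 150, 210), the leftmost face
print(pick_face(faces, 2))  # → (400, 50, 520, 200), the rightmost face
```

Raising an error for an out-of-range index, rather than silently falling back to face 0, makes a wrong setting visible before the slow swap step starts.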
 
 ## 👄 Note on the bark Fidelity
 
@@ -141,21 +148,23 @@ https://user-images.githubusercontent.com/800903/262442794-61b1e32f-3f87-4b36-98
 
 https://user-images.githubusercontent.com/800903/262449305-901086a3-22cb-42d2-b5be-a5f38db4549a.mp4
 
+https://user-images.githubusercontent.com/800903/267808494-300f8cc3-9136-4810-86e2-92f2114a5f9a.mp4
+
 ## 📖 Behind the scenes
 
 This extension operates in several stages to improve the quality of Wav2Lip-generated videos:
 
-1. **Generate a Wav2lip video**: The script first generates a low-quality Wav2Lip video using the input video and audio.
-2. **Video Quality Enhancement**: Create a high-quality video using the low-quality video by using the enhancer define by user. 
-3. **Mask Creation**: The script creates a mask around the mouth and tries to keep other facial motions like those of the cheeks and chin.
-4. **Video Generation**: The script then takes the high-quality mouth image and overlays it onto the original image guided by the mouth mask.
-5. **Video Post Processing**: The script then uses the ffmpeg tool to generate the final video.
+1. **Generate face swap video**: If an image is provided in the "Face Swap" field, the script first generates the face-swapped video. This step takes time, so be patient.
+2. **Generate a Wav2lip video**: The script then generates a low-quality Wav2Lip video from the input video and audio.
+3. **Video Quality Enhancement**: A high-quality video is created from the low-quality one using the enhancer selected by the user. 
+4. **Mask Creation**: The script creates a mask around the mouth and tries to keep other facial motions like those of the cheeks and chin.
+5. **Video Generation**: The script then takes the high-quality mouth image and overlays it onto the original image guided by the mouth mask.
+6. **Video Post Processing**: The script then uses the ffmpeg tool to generate the final video.
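
The stage order above can be summarized as a list in which the face swap stage is included only when an image is provided. This is a minimal sketch: the stage labels and the `build_stages` helper are hypothetical and simply mirror the list, not the extension's actual functions.

```python
def build_stages(face_swap_image=None):
    """Return the processing stages in the order described above."""
    stages = []
    if face_swap_image is not None:
        # Optional and slow: runs only when an image is in the "Face Swap" field.
        stages.append("face_swap")
    stages += [
        "wav2lip",      # low-quality lip-sync video from input video + audio
        "enhance",      # enhancer selected by the user
        "mask",         # mask around the mouth
        "compose",      # overlay the enhanced mouth onto the original frames
        "ffmpeg_post",  # final video assembled with ffmpeg
    ]
    return stages


print(build_stages())
print(build_stages("face.png")[0])  # → face_swap
```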
 
 ## 💪 Quality tips
 - Use a high quality video as input
 - Utilize a video with a consistent frame rate. Occasionally, videos may exhibit unusual playback frame rates (not the standard 24, 25, 30, 60), which can lead to issues with the face mask.
 - Use a high quality audio file as input, without background noise or music. Clean audio with a tool like [https://podcast.adobe.com/enhance](https://podcast.adobe.com/enhance).
-- Try to minimize the grain on the face on the input as much as possible. For example, you can use the "Restore faces" feature in img2img before using an image as input for Wav2Lip.
 - Dilate the mouth mask. This will help the model retain some facial motion and hide the original mouth.
 - Keep Mask Blur at most twice the value of Mouth Mask Dilate. To increase the blur, increase Mouth Mask Dilate as well; otherwise the mouth will be blurred and the underlying original mouth may show through.
 - Upscaling can improve results, particularly around the mouth area, but it increases processing time. Use this tutorial from Olivio Sarikas to upscale your video: [https://www.youtube.com/watch?v=3z4MKUqFEUk](https://www.youtube.com/watch?v=3z4MKUqFEUk). Set the denoising strength between 0.0 and 0.05, select the 'revAnimated' model, and use batch mode. I'll create a tutorial for this soon.
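
Two of the tips above are numeric and can be checked before a long run: a standard frame rate (24, 25, 30, 60) and a Mask Blur of at most twice the Mouth Mask Dilate. A minimal sketch, assuming the frame rate has already been read from the video by some other means. `preflight_warnings` is hypothetical, and the 0.01 fps tolerance is an arbitrary assumption.

```python
STANDARD_FPS = (24, 25, 30, 60)  # the "standard" rates named in the tip above


def preflight_warnings(fps, mouth_mask_dilate, mask_blur):
    """Return human-readable warnings derived from the quality tips."""
    warnings = []
    # Tolerance of 0.01 fps is an assumption, to absorb float rounding.
    if not any(abs(fps - std) < 0.01 for std in STANDARD_FPS):
        warnings.append(f"unusual frame rate {fps:g} fps; the face mask may misbehave")
    if mask_blur > 2 * mouth_mask_dilate:
        warnings.append(
            f"mask blur {mask_blur} exceeds twice mouth mask dilate "
            f"({mouth_mask_dilate}); increase the dilate value instead"
        )
    return warnings


print(preflight_warnings(25, 15, 15))  # → []
for w in preflight_warnings(23.5, 10, 30):
    print("warning:", w)
```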
@@ -173,6 +182,8 @@ This extension operates in several stages to improve the quality of Wav2Lip-gene
 - [ ] Tutorials
 - [ ] Convert avi to mp4. Avi files are not shown in the video input, but the process works fine
 - [ ] Add Possibility to use a video for audio input
+- [ ] Standalone version
+- [ ] ComfyUI integration
 
 ## 😎 Contributing
 
@@ -182,6 +193,7 @@ We welcome contributions to this project. When submitting pull requests, please
 - [Wav2Lip](https://github.com/Rudrabha/Wav2Lip)
 - [CodeFormer](https://github.com/sczhou/CodeFormer)
 - [bark](https://github.com/suno-ai/bark/)
+- [roop](https://github.com/s0md3v/sd-webui-roop)
 
 ## 📝 Citation
 If you use this project in your own work, in articles, tutorials, or presentations, we encourage you to cite this project to acknowledge the efforts put into it.

diff --git a/requirements.txt b/requirements.txt
@@ -11,4 +11,10 @@ tqdm
 numba
 imutils
 imageio_ffmpeg
-git+https://github.com/suno-ai/bark.git
+git+https://github.com/suno-ai/bark.git
+insightface==0.7.3
+onnx==1.14.0
+onnxruntime==1.15.0
+onnxruntime-gpu==1.15.0
+opencv-python>=4.8.0
+ifnude
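
To confirm that the newly pinned face swap dependencies resolved inside the WebUI's Python environment, the standard-library `importlib.metadata` is enough. A minimal sketch, assuming only the `==` and `>=` pin styles used in this file. The version comparison is a simplified numeric one (`packaging.version` would be more robust), and `satisfies` and `check_requirements` are hypothetical helpers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Matches "name==1.2.3" or "name>=1.2.3"; unpinned names and git+ URLs do not match.
PIN = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(==|>=)\s*([0-9][0-9A-Za-z.]*)$")


def _numeric(v):
    """'4.8.0.76' -> (4, 8, 0, 76); keeps only the leading digits of each part."""
    nums = []
    for part in v.split("."):
        m = re.match(r"\d+", part)
        if m:
            nums.append(int(m.group()))
    return tuple(nums)


def satisfies(installed, op, pinned):
    """Simplified comparison of two dotted numeric versions."""
    a, b = _numeric(installed), _numeric(pinned)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so "1.15" compares equal to "1.15.0"
    b += (0,) * (width - len(b))
    return a == b if op == "==" else a >= b


def check_requirements(lines):
    """Yield (requirement, status) for each '==' or '>=' pin in `lines`."""
    for line in lines:
        req = line.strip()
        m = PIN.match(req)
        if not m:
            continue  # unpinned packages and git+ URLs are not checked here
        name, op, pinned = m.groups()
        try:
            installed = version(name)
        except PackageNotFoundError:
            yield req, "not installed"
            continue
        yield req, ("ok" if satisfies(installed, op, pinned) else f"found {installed}")


for req, status in check_requirements(["insightface==0.7.3", "onnx==1.14.0", "ifnude"]):
    print(f"{req:24} {status}")
```

Run it from the same virtual environment the WebUI uses. Otherwise it reports on a different set of installed packages.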
diff --git a/scripts/faceswap/model/README.md b/scripts/faceswap/model/README.md
@@ -0,0 +1 @@
+inswapper model folder