Initial commit

yuan1615 · Jan 17, 2023 · 2655b8a
commit 2655b8a

Showing 66 changed files with 268,795 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -0,0 +1,44 @@
+# LuckyVoice
+Inspired by the 2019 host competition, this repository tries to use Zou Yun's voice to build a highly expressive speech synthesis system.
+The pinyin of 邹韵 is Zōu Yùn, which sounds like 走运 (zǒu yùn), the Chinese phrase for having good luck.
+
+[HuggingFace🤗 Demo-Baker](https://huggingface.co/spaces/yuan1615/EmpathyTTS) | [HuggingFace🤗 Demo-Lucky | WIP](https://huggingface.co/spaces/EmpathyTTS)
+
+
+## 1. Data Collection and Processing
+### 1.1 Collect related videos of Zou Yun
+```
+1. Use the 'you-get' tool to download videos in batches; the video URLs are listed in dataprocessing/collectvideos/main.py.
+2. Use a format converter to convert the videos to wav files.
+```
+### 1.2 Split the audio using the [vad](https://github.com/snakers4/silero-vad) method.
+```
+python dataprocessing/vad/main.py --pth [directory of the converted wav files] --save_pth [directory to save the split audio]
+```
+
+### 1.3 Reduce noise using a [speech enhancement model](https://www.modelscope.cn/models/damo/speech_frcrn_ans_cirm_16k/summary).
+[pre-trained model](https://drive.google.com/file/d/1T0fm9GA_0PIg8QOchpnHcdG9Kvp_X0ZN/view?usp=sharing)
+```
+sudo docker build -t se .
+sudo docker run -it --rm -v /home/admin/yuanxin:/se se
+python dataprocessing/se/main.py
+```
+### 1.4 Classify audio using a [voiceprint recognition model](https://github.com/wenet-e2e/wespeaker).
+
+
+### 1.5 Process text with a [speech recognition model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) and a [speech synthesis front-end](https://www.modelscope.cn/models/damo/speech_sambert-hifigan_tts_zhitian_emo_zh-cn_16k/summary)
+[speech synthesis front-end](https://drive.google.com/file/d/1jAfnclbgAkUXXKjWgBic2dmdPJECQgzm/view?usp=sharing)
+
+## 2. Baseline Model
+
+### 2.1 [VITS](https://github.com/jaywalnut310/vits) model with prosodic representation
+[pretrained_baker.pth](https://drive.google.com/file/d/13IJf70A5UjvTfJBMowVGjXLTpERaYZnV/view?usp=sharing)
+```
+python model/vits/main.py --text ['你好'] --out [path to save the output file]
+```
+
+### 2.2 [DiffSpeech](https://github.com/MoonInTheRiver/DiffSinger) model with prosodic representation
+
+
+## 3. EmpathyTTS
+
diff --git a/dataprocessing/collectvideos/main.py b/dataprocessing/collectvideos/main.py
@@ -0,0 +1,26 @@
+# Batch-download videos with you-get
+# pip install you-get
+
+## Bilibili videos, from the Host Competition
+# you-get -o ~/Videos  'https://www.bilibili.com/video/BV16J411n7pr?p=1&vd_source=27036885f03e58efaf94bcc2b83eee66' --playlist
+
+## CCTV videos, from the High-End Interview program (English corpus)
+# https://tv.cctv.com/2023/01/06/VIDETvjqiim79x1j9qHkkYDy230106.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.21
+# https://tv.cctv.com/2022/12/23/VIDEU8qM9KeOUBKy5cteqatf221223.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.25
+# https://tv.cctv.com/2022/12/16/VIDEqyrLr7GrwnJ14KOT5WPz221216.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.73
+# https://tv.cctv.com/2022/12/09/VIDEE0uG4MQm0GNSSd48AhnT221209.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.76
+# https://tv.cctv.com/2022/11/25/VIDEsxUkOGDlKWm9UOXoJiec221125.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.81
+# https://tv.cctv.com/2022/11/18/VIDEDW0aouwBbOEj6K7r9e2Q221118.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.85
+# https://tv.cctv.com/2022/11/11/VIDEW7hEI6Qe5sicmZ92L5Y2221111.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.89
+# https://tv.cctv.com/2022/11/05/VIDEzofTInbAFVWFtdEo914e221105.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.93
+# https://tv.cctv.com/2022/10/28/VIDEI8eGDGNuILT2tYT0XFl3221028.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.97
+# https://tv.cctv.com/2022/10/21/VIDEpG9xRTu3Jd1OJuFhfRFW221021.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.101
+# https://tv.cctv.com/2022/10/14/VIDE6byRmQK62VUd8uJm1d0n221014.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.105
+
+## Global Watch (环球视线)
+# https://tv.cctv.com/2021/02/02/VIDEoOaXogqDjYzaSlpuiUJb210202.shtml?spm=C45404.PYmcE9NtJ5Mr.EbXlq1ofpYTu.44
+
+
+
+
+
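The commented-out command in this file shows how a single URL is fetched. A minimal sketch of doing the same for a list of URLs, assuming `you-get` is installed and on the PATH. The helper names `build_you_get_command` and `download_all` are hypothetical, and only the `-o` and `--playlist` options that appear in the comment are used:

```python
import subprocess
from typing import List


def build_you_get_command(url: str, out_dir: str, playlist: bool = False) -> List[str]:
    """Build the argument list for one you-get download (hypothetical helper)."""
    cmd = ["you-get", "-o", out_dir, url]
    if playlist:
        # Ask you-get to download the whole playlist instead of a single video.
        cmd.append("--playlist")
    return cmd


def download_all(urls: List[str], out_dir: str, playlist: bool = False) -> None:
    """Run you-get once per URL; check=True raises if a download fails."""
    for url in urls:
        subprocess.run(build_you_get_command(url, out_dir, playlist), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the `?` and `&` characters in the video URLs.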
diff --git a/dataprocessing/se/Dockerfile b/dataprocessing/se/Dockerfile
@@ -0,0 +1 @@
+FROM registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py37-torch1.11.0-tf1.15.5-0.3.6
diff --git a/dataprocessing/se/main.py b/dataprocessing/se/main.py
@@ -0,0 +1,27 @@
+import os
+import argparse
+from modelscope.pipelines import pipeline
+from modelscope.utils.constant import Tasks
+
+ans = pipeline(
+    Tasks.acoustic_noise_suppression,
+    model='./ckpt/damo/speech_frcrn_ans_cirm_16k')
+
+
+def main(pth, save_pth):
+    files = os.listdir(pth)
+    os.makedirs(save_pth, exist_ok=True)
+    for f in files:
+        if f.endswith('.wav'):  # match the extension, not any '.wav' substring
+            result = ans(
+                os.path.join(pth, f),
+                output_path=os.path.join(save_pth, 'clear_' + os.path.splitext(f)[0] + '.wav'))
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--pth', default='../LuckyData/bilibili_vad')
+    parser.add_argument('--save_pth', default='../LuckyData/bilibili_vad_clear')
+    a = parser.parse_args()
+    main(a.pth, a.save_pth)
+
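A minimal sketch of how the denoised file name can be derived from an input file name, shown in isolation so it can be checked without loading the model. The helper names `is_wav` and `denoised_name` are hypothetical, and the case-insensitive extension check is an assumption rather than something the script does:

```python
import os


def is_wav(filename: str) -> bool:
    """Return True only when the file extension is .wav (case-insensitive)."""
    return filename.lower().endswith(".wav")


def denoised_name(filename: str, prefix: str = "clear_") -> str:
    """Prefix the base name and keep any dots inside it, e.g. 'a.b.wav' -> 'clear_a.b.wav'."""
    stem, _ext = os.path.splitext(filename)
    return prefix + stem + ".wav"
```

Using `os.path.splitext` strips only the final extension, so two inputs such as `talk.part1.wav` and `talk.part2.wav` map to different output files.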
diff --git a/dataprocessing/vad/main.py b/dataprocessing/vad/main.py
@@ -0,0 +1,60 @@
+import os
+import torch
+from pprint import pprint
+import numpy as np
+import sys
+from scipy.io import wavfile
+import argparse
+from tqdm import tqdm
+
+sys.path.append('./dataprocessing/vad')
+from silerovad.utils_vad import *
+
+SAMPLING_RATE = 16000
+torch.set_num_threads(1)
+
+model, utils = torch.hub.load(repo_or_dir='./dataprocessing/vad/silerovad',
+                              source='local',
+                              model='silero_vad',
+                              force_reload=True,
+                              onnx=False)
+
+(get_speech_timestamps,
+ save_audio,
+ read_audio,
+ VADIterator,
+ collect_chunks) = utils
+
+
+def main(pth, save_pth):
+    os.makedirs(save_pth, exist_ok=True)
+    names = os.listdir(pth)
+    for name in tqdm(names):
+        if name.endswith('.wav'):
+            wav = read_audio(os.path.join(pth, name), sampling_rate=SAMPLING_RATE)
+            wav_np = wav.numpy()
+            # get speech timestamps from full audio file
+            speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE, threshold=0.8)
+            # Save the split segments
+            i = 1
+            for d in speech_timestamps:
+                start = d['start']
+                end = d['end']
+                if (end - start)/SAMPLING_RATE < 3.0:
+                    continue
+                wav_np_temp = wav_np[start:end] * 32767.0
+                wavfile.write(
+                    os.path.join(save_pth, os.path.splitext(name)[0] + '_cut_' + str(i) + '.wav'),
+                    SAMPLING_RATE,
+                    wav_np_temp.astype(np.int16),
+                )
+                i += 1
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--pth', default='/home/admin/yuanxin/LuckyData/bilibili')
+    parser.add_argument('--save_pth', default='/home/admin/yuanxin/LuckyData/bilibili_vad')
+    a = parser.parse_args()
+    main(a.pth, a.save_pth)
+
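The script skips any detected speech segment shorter than 3 seconds, where `start` and `end` are sample indices. A minimal sketch of that filter on its own, assuming the timestamps are dictionaries with `start` and `end` keys as in the loop; `keep_long_segments` is a hypothetical helper and not part of silero-vad:

```python
from typing import Dict, List


def keep_long_segments(
    timestamps: List[Dict[str, int]],
    sampling_rate: int = 16000,
    min_seconds: float = 3.0,
) -> List[Dict[str, int]]:
    """Keep segments whose duration in seconds is at least min_seconds."""
    kept = []
    for t in timestamps:
        # Convert the length in samples to seconds before comparing.
        duration = (t["end"] - t["start"]) / sampling_rate
        if duration >= min_seconds:
            kept.append(t)
    return kept
```

For example, at 16 kHz a segment of 48,000 samples lasts exactly 3 seconds and is kept, while one of 16,000 samples lasts 1 second and is dropped.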
diff --git a/dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/bug_report.md b/dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,52 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: Bug report - [X]
+labels: bug
+assignees: snakers4
+
+---
+
+## 🐛 Bug
+
+<!-- A clear and concise description of what the bug is. -->
+
+## To Reproduce
+
+Steps to reproduce the behavior:
+
+1.
+2.
+3.
+
+<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->
+
+## Expected behavior
+
+<!-- A clear and concise description of what you expected to happen. -->
+
+## Environment
+
+Please copy and paste the output from this 
+[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
+(or fill out the checklist below manually).
+
+You can get the script and run it with:
+```
+wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
+# For security purposes, please check the contents of collect_env.py before running it.
+python collect_env.py
+```
+
+ - PyTorch Version (e.g., 1.0):
+ - OS (e.g., Linux):
+ - How you installed PyTorch (`conda`, `pip`, source):
+ - Build command you used (if compiling from source):
+ - Python version:
+ - CUDA/cuDNN version:
+ - GPU models and configuration:
+ - Any other relevant information:
+
+## Additional context
+
+<!-- Add any other context about the problem here. -->
diff --git a/dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/feature_request.md b/dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,27 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: Feature request - [X]
+labels: enhancement
+assignees: snakers4
+
+---
+
+## 🚀 Feature
+<!-- A clear and concise description of the feature proposal -->
+
+## Motivation
+
+<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
+
+## Pitch
+
+<!-- A clear and concise description of what you want to happen. -->
+
+## Alternatives
+
+<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
+
+## Additional context
+
+<!-- Add any other context or screenshots about the feature request here. -->
diff --git a/dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/questions---help---support.md b/dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/questions---help---support.md
@@ -0,0 +1,12 @@
+---
+name: Questions / Help / Support
+about: Ask for help, support or ask a question
+title: "❓ Questions / Help / Support"
+labels: help wanted
+assignees: snakers4
+
+---
+
+## ❓ Questions and Help
+
+We have a [wiki](https://github.com/snakers4/silero-models/wiki) available for our users. Please make sure you have checked it out first.
diff --git a/dataprocessing/vad/silerovad/CODE_OF_CONDUCT.md b/dataprocessing/vad/silerovad/CODE_OF_CONDUCT.md
@@ -0,0 +1,76 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as
+contributors and maintainers pledge to making participation in our project and
+our community a harassment-free experience for everyone, regardless of age, body
+size, disability, ethnicity, sex characteristics, gender identity and expression,
+level of experience, education, socio-economic status, nationality, personal
+appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment
+include:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery and unwelcome sexual attention or
+ advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic
+ address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+ professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable
+behavior and are expected to take appropriate and fair corrective action in
+response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or
+reject comments, commits, code, wiki edits, issues, and other contributions
+that are not aligned to this Code of Conduct, or to ban temporarily or
+permanently any contributor for other behaviors that they deem inappropriate,
+threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies both within project spaces and in public spaces
+when an individual is representing the project or its community. Examples of
+representing a project or community include using an official project e-mail
+address, posting via an official social media account, or acting as an appointed
+representative at an online or offline event. Representation of a project may be
+further defined and clarified by project maintainers.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported by contacting the project team at [email protected]. All
+complaints will be reviewed and investigated and will result in a response that
+is deemed necessary and appropriate to the circumstances. The project team is
+obligated to maintain confidentiality with regard to the reporter of an incident.
+Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good
+faith may face temporary or permanent repercussions as determined by other
+members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see
+https://www.contributor-covenant.org/faq
diff --git a/dataprocessing/vad/silerovad/LICENSE b/dataprocessing/vad/silerovad/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020-present Silero Team
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.