Commit
xin.yuan
committed
Jan 17, 2023
0 parents
commit 2655b8a
Showing
66 changed files
with
268,795 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# LuckyVoice | ||
Inspired by the 2019 host competition, this repository uses Zou Yun's voice to build a highly expressive speech synthesis system. | ||
The pinyin of 邹韵 is Zōu Yùn, which sounds like the Chinese phrase for good luck. | ||
|
||
[HuggingFace🤗 Demo-Baker](https://huggingface.co/spaces/yuan1615/EmpathyTTS) | [HuggingFace🤗 Demo-Lucky | WIP](https://huggingface.co/spaces/EmpathyTTS) | ||
|
||
|
||
## 1. Data Collection and Processing | ||
### 1.1 Collect related videos of Zou Yun | ||
``` | ||
1. Use the 'you-get' tool to download videos in batches; the video URLs are listed in dataprocessing/collectvideos/main.py. | ||
2. Use a format converter to convert the videos to wav files. | ||
``` | ||
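The two steps above could be scripted roughly as follows. This is a minimal sketch, not code from this repository: it assumes `you-get` and `ffmpeg` are installed and on the PATH, `ffmpeg` stands in for the unnamed "format converter", and the helper names and the 16 kHz mono target are assumptions (16 kHz matches `SAMPLING_RATE` in the VAD script).

```python
import subprocess
from pathlib import Path
from typing import List


def you_get_cmd(url: str, out_dir: str, playlist: bool = False) -> List[str]:
    """Build a you-get command line (hypothetical helper)."""
    cmd = ["you-get", "-o", out_dir]
    if playlist:
        # Download every part of a multi-part video page.
        cmd.append("--playlist")
    cmd.append(url)
    return cmd


def ffmpeg_wav_cmd(video: str, wav_dir: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that extracts mono wav audio from a video."""
    out = Path(wav_dir) / (Path(video).stem + ".wav")
    return ["ffmpeg", "-y", "-i", video, "-vn",
            "-ac", "1", "-ar", str(sample_rate), str(out)]


def run(cmd: List[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

Building the command as a list and passing it to `subprocess.run` avoids shell quoting problems with URLs that contain `&` or `?`.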
### 1.2 Split the audio using the [vad](https://github.com/snakers4/silero-vad) method. | ||
``` | ||
python dataprocessing/vad/main.py --pth [directory of converted wav files] --save_pth [directory for the split audio] | ||
``` | ||
|
||
### 1.3 Noise reduction using [speech enhancement model](https://www.modelscope.cn/models/damo/speech_frcrn_ans_cirm_16k/summary). | ||
[pre-trained model](https://drive.google.com/file/d/1T0fm9GA_0PIg8QOchpnHcdG9Kvp_X0ZN/view?usp=sharing) | ||
``` | ||
sudo docker build -t se . | ||
sudo docker run -it --rm -v /home/admin/yuanxin:/se se | ||
python dataprocessing/se/main.py | ||
``` | ||
### 1.4 Classify audio using a [voiceprint recognition model](https://github.com/wenet-e2e/wespeaker). | ||
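The README does not say how the classification is done. One common approach, sketched here purely as an assumption, is to compare each clip's speaker embedding with a reference embedding of the target speaker and keep the clips above a similarity threshold. The embeddings are assumed to have been extracted beforehand with a speaker model; the function names and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_target_clips(embeddings: Dict[str, Sequence[float]],
                        reference: Sequence[float],
                        threshold: float = 0.5) -> List[str]:
    """Return the clip names whose embedding is close to the reference speaker."""
    return [name for name, emb in embeddings.items()
            if cosine_similarity(emb, reference) >= threshold]
```

The threshold would need tuning on a few hand-labelled clips, since a suitable value depends on the embedding model.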
|
||
|
||
### 1.5 Process text with a [speech recognition model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) and a [speech synthesis front-end](https://www.modelscope.cn/models/damo/speech_sambert-hifigan_tts_zhitian_emo_zh-cn_16k/summary) | ||
[speech synthesis front-end](https://drive.google.com/file/d/1jAfnclbgAkUXXKjWgBic2dmdPJECQgzm/view?usp=sharing) | ||
|
||
## 2. Baseline Model | ||
|
||
### 2.1 [VITS](https://github.com/jaywalnut310/vits) model with prosodic representation | ||
[pretrained_baker.pth](https://drive.google.com/file/d/13IJf70A5UjvTfJBMowVGjXLTpERaYZnV/view?usp=sharing) | ||
``` | ||
python model/vits/main.py --text ['你好'] --out [path to save the output file] | ||
``` | ||
|
||
### 2.2 [DiffSpeech](https://github.com/MoonInTheRiver/DiffSinger) model with prosodic representation | ||
|
||
|
||
## 3. EmpathyTTS | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# Batch-download videos with you-get | ||
# pip install you-get | ||
|
||
## Bilibili videos, from the host competition | ||
# you-get -o ~/Videos 'https://www.bilibili.com/video/BV16J411n7pr?p=1&vd_source=27036885f03e58efaf94bcc2b83eee66' --playlist | ||
|
||
## 央视频 (CCTV video), from the 高端访谈 program (English-language corpus) | ||
# https://tv.cctv.com/2023/01/06/VIDETvjqiim79x1j9qHkkYDy230106.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.21 | ||
# https://tv.cctv.com/2022/12/23/VIDEU8qM9KeOUBKy5cteqatf221223.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.25 | ||
# https://tv.cctv.com/2022/12/16/VIDEqyrLr7GrwnJ14KOT5WPz221216.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.73 | ||
# https://tv.cctv.com/2022/12/09/VIDEE0uG4MQm0GNSSd48AhnT221209.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.76 | ||
# https://tv.cctv.com/2022/11/25/VIDEsxUkOGDlKWm9UOXoJiec221125.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.81 | ||
# https://tv.cctv.com/2022/11/18/VIDEDW0aouwBbOEj6K7r9e2Q221118.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.85 | ||
# https://tv.cctv.com/2022/11/11/VIDEW7hEI6Qe5sicmZ92L5Y2221111.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.89 | ||
# https://tv.cctv.com/2022/11/05/VIDEzofTInbAFVWFtdEo914e221105.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.93 | ||
# https://tv.cctv.com/2022/10/28/VIDEI8eGDGNuILT2tYT0XFl3221028.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.97 | ||
# https://tv.cctv.com/2022/10/21/VIDEpG9xRTu3Jd1OJuFhfRFW221021.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.101 | ||
# https://tv.cctv.com/2022/10/14/VIDE6byRmQK62VUd8uJm1d0n221014.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.105 | ||
|
||
## 环球视线 | ||
# https://tv.cctv.com/2021/02/02/VIDEoOaXogqDjYzaSlpuiUJb210202.shtml?spm=C45404.PYmcE9NtJ5Mr.EbXlq1ofpYTu.44 | ||
|
||
|
||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
FROM registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py37-torch1.11.0-tf1.15.5-0.3.6 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
import os | ||
import argparse | ||
from modelscope.pipelines import pipeline | ||
from modelscope.utils.constant import Tasks | ||
|
||
ans = pipeline( | ||
Tasks.acoustic_noise_suppression, | ||
model='./ckpt/damo/speech_frcrn_ans_cirm_16k') | ||
|
||
|
||
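def main(pth, save_pth): | ||
    files = os.listdir(pth) | ||
    os.makedirs(save_pth, exist_ok=True) | ||
    for f in files: | ||
        # Match on the extension, so names that merely contain '.wav' are skipped. | ||
        if f.endswith('.wav'): | ||
            # The pipeline writes the denoised audio to output_path. | ||
            ans( | ||
                os.path.join(pth, f), | ||
                output_path=os.path.join(save_pth, 'clear_' + f.split('.')[0] + '.wav')) | ||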
|
||
|
||
if __name__ == "__main__": | ||
    parser = argparse.ArgumentParser() | ||
    parser.add_argument('--pth', default='../LuckyData/bilibili_vad') | ||
    parser.add_argument('--save_pth', default='../LuckyData/bilibili_vad_clear') | ||
    a = parser.parse_args() | ||
    main(a.pth, a.save_pth) | ||
|
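Because the noise-suppression model runs inside Docker, it can help to preview which files would be processed and where the outputs would go before starting it. The sketch below is an assumption layered on the script above, not part of it: `plan_denoise` is a hypothetical helper, and it differs from the script by matching the extension case-insensitively and by using `os.path.splitext`, so a dotted name such as `a.b.wav` keeps its full stem.

```python
import os
from typing import List, Tuple


def plan_denoise(names: List[str], pth: str, save_pth: str,
                 prefix: str = "clear_") -> List[Tuple[str, str]]:
    """Return (input_path, output_path) pairs for every wav file name."""
    plan = []
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".wav":
            continue  # skip anything that is not a wav file
        plan.append((os.path.join(pth, name),
                     os.path.join(save_pth, prefix + stem + ".wav")))
    return plan
```

Calling `plan_denoise(os.listdir(pth), pth, save_pth)` and printing the pairs gives a dry run that touches no audio.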
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
import os | ||
import torch | ||
from pprint import pprint | ||
import numpy as np | ||
import sys | ||
from scipy.io import wavfile | ||
import argparse | ||
from tqdm import tqdm | ||
|
||
sys.path.append('./dataprocessing/vad') | ||
from silerovad.utils_vad import * | ||
|
||
SAMPLING_RATE = 16000 | ||
torch.set_num_threads(1) | ||
|
||
model, utils = torch.hub.load(repo_or_dir='./dataprocessing/vad/silerovad', | ||
source='local', | ||
model='silero_vad', | ||
force_reload=True, | ||
onnx=False) | ||
|
||
(get_speech_timestamps, | ||
save_audio, | ||
read_audio, | ||
VADIterator, | ||
collect_chunks) = utils | ||
|
||
|
||
def main(pth, save_pth): | ||
    os.makedirs(save_pth, exist_ok=True) | ||
    names = os.listdir(pth) | ||
    for name in tqdm(names): | ||
        if name.endswith('.wav'): | ||
            wav = read_audio(os.path.join(pth, name), sampling_rate=SAMPLING_RATE) | ||
            wav_np = wav.numpy() | ||
            # get speech timestamps from full audio file | ||
            speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE, threshold=0.8) | ||
            # Save the split segments, skipping any shorter than 3 seconds | ||
            i = 1 | ||
            for d in speech_timestamps: | ||
                start = d['start'] | ||
                end = d['end'] | ||
                if (end - start)/SAMPLING_RATE < 3.0: | ||
                    continue | ||
                wav_np_temp = wav_np[start:end] * 32767.0 | ||
                wavfile.write( | ||
                    os.path.join(save_pth, name.split('.')[0] + '_cut_' + str(i) + '.wav'), | ||
                    SAMPLING_RATE, | ||
                    wav_np_temp.astype(np.int16), | ||
                ) | ||
                i += 1 | ||
|
||
|
||
if __name__ == '__main__': | ||
    parser = argparse.ArgumentParser() | ||
    parser.add_argument('--pth', default='/home/admin/yuanxin/LuckyData/bilibili') | ||
    parser.add_argument('--save_pth', default='/home/admin/yuanxin/LuckyData/bilibili_vad') | ||
    a = parser.parse_args() | ||
    main(a.pth, a.save_pth) | ||
|
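The two pieces of logic in the loop above, dropping segments shorter than 3 seconds and scaling float samples to 16-bit integers, can be isolated and checked without loading the VAD model. The sketch below uses only the standard library; the function names are hypothetical, and the clipping step is an addition that the script itself does not perform.

```python
from array import array
from typing import Dict, Iterable, List

SAMPLING_RATE = 16000


def keep_long_segments(timestamps: List[Dict[str, int]],
                       min_seconds: float = 3.0,
                       sampling_rate: int = SAMPLING_RATE) -> List[Dict[str, int]]:
    """Keep segments (start/end in samples) lasting at least min_seconds."""
    return [t for t in timestamps
            if (t["end"] - t["start"]) / sampling_rate >= min_seconds]


def float_to_int16(samples: Iterable[float]) -> array:
    """Scale float audio in [-1, 1] to int16, clipping out-of-range values first."""
    return array("h", (int(max(-1.0, min(1.0, s)) * 32767.0) for s in samples))
```

Clipping before the cast guards against integer wrap-around if a sample ever falls slightly outside [-1, 1].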
52 changes: 52 additions & 0 deletions
dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/bug_report.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
--- | ||
name: Bug report | ||
about: Create a report to help us improve | ||
title: Bug report - [X] | ||
labels: bug | ||
assignees: snakers4 | ||
|
||
--- | ||
|
||
## 🐛 Bug | ||
|
||
<!-- A clear and concise description of what the bug is. --> | ||
|
||
## To Reproduce | ||
|
||
Steps to reproduce the behavior: | ||
|
||
1. | ||
2. | ||
3. | ||
|
||
<!-- If you have a code sample, error messages, stack traces, please provide it here as well --> | ||
|
||
## Expected behavior | ||
|
||
<!-- A clear and concise description of what you expected to happen. --> | ||
|
||
## Environment | ||
|
||
Please copy and paste the output from this | ||
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py) | ||
(or fill out the checklist below manually). | ||
|
||
You can get the script and run it with: | ||
``` | ||
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py | ||
# For security purposes, please check the contents of collect_env.py before running it. | ||
python collect_env.py | ||
``` | ||
|
||
- PyTorch Version (e.g., 1.0): | ||
- OS (e.g., Linux): | ||
- How you installed PyTorch (`conda`, `pip`, source): | ||
- Build command you used (if compiling from source): | ||
- Python version: | ||
- CUDA/cuDNN version: | ||
- GPU models and configuration: | ||
- Any other relevant information: | ||
|
||
## Additional context | ||
|
||
<!-- Add any other context about the problem here. --> |
27 changes: 27 additions & 0 deletions
dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/feature_request.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
--- | ||
name: Feature request | ||
about: Suggest an idea for this project | ||
title: Feature request - [X] | ||
labels: enhancement | ||
assignees: snakers4 | ||
|
||
--- | ||
|
||
## 🚀 Feature | ||
<!-- A clear and concise description of the feature proposal --> | ||
|
||
## Motivation | ||
|
||
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too --> | ||
|
||
## Pitch | ||
|
||
<!-- A clear and concise description of what you want to happen. --> | ||
|
||
## Alternatives | ||
|
||
<!-- A clear and concise description of any alternative solutions or features you've considered, if any. --> | ||
|
||
## Additional context | ||
|
||
<!-- Add any other context or screenshots about the feature request here. --> |
12 changes: 12 additions & 0 deletions
dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/questions---help---support.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
--- | ||
name: Questions / Help / Support | ||
about: Ask for help, support or ask a question | ||
title: "❓ Questions / Help / Support" | ||
labels: help wanted | ||
assignees: snakers4 | ||
|
||
--- | ||
|
||
## ❓ Questions and Help | ||
|
||
We have a [wiki](https://github.com/snakers4/silero-models/wiki) available for our users. Please make sure you have checked it out first. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
# Contributor Covenant Code of Conduct | ||
|
||
## Our Pledge | ||
|
||
In the interest of fostering an open and welcoming environment, we as | ||
contributors and maintainers pledge to making participation in our project and | ||
our community a harassment-free experience for everyone, regardless of age, body | ||
size, disability, ethnicity, sex characteristics, gender identity and expression, | ||
level of experience, education, socio-economic status, nationality, personal | ||
appearance, race, religion, or sexual identity and orientation. | ||
|
||
## Our Standards | ||
|
||
Examples of behavior that contributes to creating a positive environment | ||
include: | ||
|
||
* Using welcoming and inclusive language | ||
* Being respectful of differing viewpoints and experiences | ||
* Gracefully accepting constructive criticism | ||
* Focusing on what is best for the community | ||
* Showing empathy towards other community members | ||
|
||
Examples of unacceptable behavior by participants include: | ||
|
||
* The use of sexualized language or imagery and unwelcome sexual attention or | ||
advances | ||
* Trolling, insulting/derogatory comments, and personal or political attacks | ||
* Public or private harassment | ||
* Publishing others' private information, such as a physical or electronic | ||
address, without explicit permission | ||
* Other conduct which could reasonably be considered inappropriate in a | ||
professional setting | ||
|
||
## Our Responsibilities | ||
|
||
Project maintainers are responsible for clarifying the standards of acceptable | ||
behavior and are expected to take appropriate and fair corrective action in | ||
response to any instances of unacceptable behavior. | ||
|
||
Project maintainers have the right and responsibility to remove, edit, or | ||
reject comments, commits, code, wiki edits, issues, and other contributions | ||
that are not aligned to this Code of Conduct, or to ban temporarily or | ||
permanently any contributor for other behaviors that they deem inappropriate, | ||
threatening, offensive, or harmful. | ||
|
||
## Scope | ||
|
||
This Code of Conduct applies both within project spaces and in public spaces | ||
when an individual is representing the project or its community. Examples of | ||
representing a project or community include using an official project e-mail | ||
address, posting via an official social media account, or acting as an appointed | ||
representative at an online or offline event. Representation of a project may be | ||
further defined and clarified by project maintainers. | ||
|
||
## Enforcement | ||
|
||
Instances of abusive, harassing, or otherwise unacceptable behavior may be | ||
reported by contacting the project team at [email protected]. All | ||
complaints will be reviewed and investigated and will result in a response that | ||
is deemed necessary and appropriate to the circumstances. The project team is | ||
obligated to maintain confidentiality with regard to the reporter of an incident. | ||
Further details of specific enforcement policies may be posted separately. | ||
|
||
Project maintainers who do not follow or enforce the Code of Conduct in good | ||
faith may face temporary or permanent repercussions as determined by other | ||
members of the project's leadership. | ||
|
||
## Attribution | ||
|
||
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, | ||
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html | ||
|
||
[homepage]: https://www.contributor-covenant.org | ||
|
||
For answers to common questions about this code of conduct, see | ||
https://www.contributor-covenant.org/faq |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2020-present Silero Team | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |