Initial commit
xin.yuan committed Jan 17, 2023
0 parents commit 2655b8a
Showing 66 changed files with 268,795 additions and 0 deletions.
44 changes: 44 additions & 0 deletions README.md
@@ -0,0 +1,44 @@
# LuckyVoice
Inspired by the 2019 host competition, this repository uses Zou Yun's voice to build a highly expressive speech synthesis system.
The pinyin of 邹韵 is Zōu yùn, which sounds like the Chinese phrase for good luck.

[HuggingFace🤗 Demo-Baker](https://huggingface.co/spaces/yuan1615/EmpathyTTS) | [HuggingFace🤗 Demo-Lucky | WIP](https://huggingface.co/spaces/EmpathyTTS)


## 1. Data Collection and Processing
### 1.1 Collect related videos of Zou Yun
```
1. Use the 'you-get' tool to download the videos in batches. The video URLs are listed in dataprocessing/collectvideos/main.py.
2. Use a format converter to convert the videos to wav files.
```
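
The README does not say which format converter is used in step 2. As a minimal sketch, assuming `ffmpeg` is installed and that 16 kHz mono audio is wanted (16 kHz is the sampling rate used in `dataprocessing/vad/main.py`), a batch conversion could look like the following. The function names, the directory layout, and the list of video suffixes are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts mono audio from a video."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the target rate
        str(wav_path),
    ]


def convert_all(video_dir, wav_dir, suffixes=(".mp4", ".flv", ".mkv")):
    """Convert every video in video_dir to a wav file in wav_dir."""
    wav_dir = Path(wav_dir)
    wav_dir.mkdir(parents=True, exist_ok=True)
    for video in sorted(Path(video_dir).iterdir()):
        if video.suffix.lower() in suffixes:
            out = wav_dir / (video.stem + ".wav")
            # check=True raises CalledProcessError if ffmpeg fails
            subprocess.run(build_ffmpeg_cmd(video, out), check=True)
```

Calling `convert_all("~/Videos", "LuckyData/bilibili")` would then produce one wav file per downloaded video. Both paths are placeholders, and `~` is not expanded automatically, so pass an absolute path in practice.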
### 1.2 Split the audio using the [vad](https://github.com/snakers4/silero-vad) method.
```
python dataprocessing/vad/main.py --pth [directory of converted wav files] --save_pth [directory to save the split audio]
```
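
The script runs silero-vad with `threshold=0.8` and then discards every detected segment shorter than 3 seconds. The filtering step can be sketched on its own as below, assuming the timestamps are dictionaries with `start` and `end` given in samples, which is how the script reads them. The function name `keep_long_segments` is hypothetical and does not appear in the repository.

```python
def keep_long_segments(timestamps, sampling_rate=16000, min_seconds=3.0):
    """Return the segments that last at least min_seconds."""
    kept = []
    for seg in timestamps:
        # start and end are sample indices, so divide by the rate to get seconds
        duration = (seg["end"] - seg["start"]) / sampling_rate
        if duration >= min_seconds:
            kept.append(seg)
    return kept


segments = [
    {"start": 0, "end": 16000},       # 1.0 s, dropped
    {"start": 20000, "end": 68000},   # 3.0 s, kept
    {"start": 80000, "end": 200000},  # 7.5 s, kept
]
print(keep_long_segments(segments))
```

Dropping short segments probably helps the later speech recognition and synthesis steps, which tend to work better on complete utterances than on fragments.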

### 1.3 Noise reduction using [speech enhancement model](https://www.modelscope.cn/models/damo/speech_frcrn_ans_cirm_16k/summary).
[pre-trained model](https://drive.google.com/file/d/1T0fm9GA_0PIg8QOchpnHcdG9Kvp_X0ZN/view?usp=sharing)
```
sudo docker build -t se .
sudo docker run -it --rm -v /home/admin/yuanxin:/se se
python dataprocessing/se/main.py
```
### 1.4 Classify audio using a [voiceprint recognition model](https://github.com/wenet-e2e/wespeaker).
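
The repository does not yet describe how this classification is done. One common approach, shown here only as an assumption, is to compare each segment's speaker embedding with a reference embedding of the target speaker and keep the segments whose cosine similarity exceeds a threshold. Extracting the embeddings with wespeaker is not shown. The function names and the threshold of 0.5 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_target_speaker(embeddings, reference, threshold=0.5):
    """Return the file names whose embedding is close to the reference.

    embeddings maps a file name to its speaker embedding (a list of floats).
    """
    return sorted(
        name
        for name, emb in embeddings.items()
        if cosine_similarity(emb, reference) >= threshold
    )


# Toy two-dimensional embeddings, for illustration only
toy = {"a.wav": [0.9, 0.1], "b.wav": [0.0, 1.0]}
print(select_target_speaker(toy, reference=[1.0, 0.0]))
```

In practice the threshold would need tuning on a few manually labelled segments.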


### 1.5 Process text with a [speech recognition model](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) and a [speech synthesis front-end](https://www.modelscope.cn/models/damo/speech_sambert-hifigan_tts_zhitian_emo_zh-cn_16k/summary)
[speech synthesis front-end](https://drive.google.com/file/d/1jAfnclbgAkUXXKjWgBic2dmdPJECQgzm/view?usp=sharing)

## 2. Baseline Model

### 2.1 [VITS](https://github.com/jaywalnut310/vits) model with prosodic representation
[pretrained_baker.pth](https://drive.google.com/file/d/13IJf70A5UjvTfJBMowVGjXLTpERaYZnV/view?usp=sharing)
```
python model/vits/main.py --text ['你好'] --out [The address to save the file]
```

### 2.2 [DiffSpeech](https://github.com/MoonInTheRiver/DiffSinger) model with prosodic representation


## 3. EmpathyTTS

26 changes: 26 additions & 0 deletions dataprocessing/collectvideos/main.py
@@ -0,0 +1,26 @@
# Use you-get to download videos in batches
# pip install you-get

## Bilibili videos, from the host competition (主持人大赛)
# you-get -o ~/Videos 'https://www.bilibili.com/video/BV16J411n7pr?p=1&vd_source=27036885f03e58efaf94bcc2b83eee66' --playlist

## CCTV videos, from 高端访谈 (English-language corpus)
# https://tv.cctv.com/2023/01/06/VIDETvjqiim79x1j9qHkkYDy230106.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.21
# https://tv.cctv.com/2022/12/23/VIDEU8qM9KeOUBKy5cteqatf221223.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.25
# https://tv.cctv.com/2022/12/16/VIDEqyrLr7GrwnJ14KOT5WPz221216.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.73
# https://tv.cctv.com/2022/12/09/VIDEE0uG4MQm0GNSSd48AhnT221209.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.76
# https://tv.cctv.com/2022/11/25/VIDEsxUkOGDlKWm9UOXoJiec221125.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.81
# https://tv.cctv.com/2022/11/18/VIDEDW0aouwBbOEj6K7r9e2Q221118.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.85
# https://tv.cctv.com/2022/11/11/VIDEW7hEI6Qe5sicmZ92L5Y2221111.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.89
# https://tv.cctv.com/2022/11/05/VIDEzofTInbAFVWFtdEo914e221105.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.93
# https://tv.cctv.com/2022/10/28/VIDEI8eGDGNuILT2tYT0XFl3221028.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.97
# https://tv.cctv.com/2022/10/21/VIDEpG9xRTu3Jd1OJuFhfRFW221021.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.101
# https://tv.cctv.com/2022/10/14/VIDE6byRmQK62VUd8uJm1d0n221014.shtml?spm=C45404.PLVzYAdZDVTK.E65woHuG1VKZ.105

## CCTV videos, from 环球视线
# https://tv.cctv.com/2021/02/02/VIDEoOaXogqDjYzaSlpuiUJb210202.shtml?spm=C45404.PYmcE9NtJ5Mr.EbXlq1ofpYTu.44





1 change: 1 addition & 0 deletions dataprocessing/se/Dockerfile
@@ -0,0 +1 @@
FROM registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-py37-torch1.11.0-tf1.15.5-0.3.6
27 changes: 27 additions & 0 deletions dataprocessing/se/main.py
@@ -0,0 +1,27 @@
import os
import argparse
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Load the noise-suppression pipeline from a local checkpoint.
ans = pipeline(
    Tasks.acoustic_noise_suppression,
    model='./ckpt/damo/speech_frcrn_ans_cirm_16k')


def main(pth, save_pth):
    """Denoise every .wav file in `pth` and write the result to `save_pth`."""
    files = os.listdir(pth)
    os.makedirs(save_pth, exist_ok=True)
    for f in files:
        # Match on the extension only, so names containing '.wav' elsewhere are skipped.
        if f.endswith('.wav'):
            # The enhanced audio is written to `output_path`.
            ans(
                os.path.join(pth, f),
                output_path=os.path.join(save_pth, 'clear_' + os.path.splitext(f)[0] + '.wav'))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--pth', default='../LuckyData/bilibili_vad')
    parser.add_argument('--save_pth', default='../LuckyData/bilibili_vad_clear')
    a = parser.parse_args()
    main(a.pth, a.save_pth)

60 changes: 60 additions & 0 deletions dataprocessing/vad/main.py
@@ -0,0 +1,60 @@
import os
import torch
from pprint import pprint
import numpy as np
import sys
from scipy.io import wavfile
import argparse
from tqdm import tqdm

sys.path.append('./dataprocessing/vad')
from silerovad.utils_vad import *

SAMPLING_RATE = 16000
torch.set_num_threads(1)

model, utils = torch.hub.load(repo_or_dir='./dataprocessing/vad/silerovad',
                              source='local',
                              model='silero_vad',
                              force_reload=True,
                              onnx=False)

(get_speech_timestamps,
 save_audio,
 read_audio,
 VADIterator,
 collect_chunks) = utils


def main(pth, save_pth):
    """Split every .wav file in `pth` into speech segments saved under `save_pth`."""
    os.makedirs(save_pth, exist_ok=True)
    names = os.listdir(pth)
    for name in tqdm(names):
        if name.endswith('.wav'):
            wav = read_audio(os.path.join(pth, name), sampling_rate=SAMPLING_RATE)
            wav_np = wav.numpy()
            # get speech timestamps from full audio file
            speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE, threshold=0.8)
            # save the split segments
            i = 1
            for d in speech_timestamps:
                start = d['start']
                end = d['end']
                # skip segments shorter than 3 seconds
                if (end - start) / SAMPLING_RATE < 3.0:
                    continue
                # scale float samples to the 16-bit PCM range
                wav_np_temp = wav_np[start:end] * 32767.0
                wavfile.write(
                    os.path.join(save_pth, os.path.splitext(name)[0] + '_cut_' + str(i) + '.wav'),
                    SAMPLING_RATE,
                    wav_np_temp.astype(np.int16),
                )
                i += 1


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--pth', default='/home/admin/yuanxin/LuckyData/bilibili')
    parser.add_argument('--save_pth', default='/home/admin/yuanxin/LuckyData/bilibili_vad')
    a = parser.parse_args()
    main(a.pth, a.save_pth)

52 changes: 52 additions & 0 deletions dataprocessing/vad/silerovad/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,52 @@
---
name: Bug report
about: Create a report to help us improve
title: Bug report - [X]
labels: bug
assignees: snakers4

---

## 🐛 Bug

<!-- A clear and concise description of what the bug is. -->

## To Reproduce

Steps to reproduce the behavior:

1.
2.
3.

<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->

## Expected behavior

<!-- A clear and concise description of what you expected to happen. -->

## Environment

Please copy and paste the output from this
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).

You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```

- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (`conda`, `pip`, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:

## Additional context

<!-- Add any other context about the problem here. -->
@@ -0,0 +1,27 @@
---
name: Feature request
about: Suggest an idea for this project
title: Feature request - [X]
labels: enhancement
assignees: snakers4

---

## 🚀 Feature
<!-- A clear and concise description of the feature proposal -->

## Motivation

<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->

## Pitch

<!-- A clear and concise description of what you want to happen. -->

## Alternatives

<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->

## Additional context

<!-- Add any other context or screenshots about the feature request here. -->
@@ -0,0 +1,12 @@
---
name: Questions / Help / Support
about: Ask for help, support or ask a question
title: "❓ Questions / Help / Support"
labels: help wanted
assignees: snakers4

---

## ❓ Questions and Help

We have a [wiki](https://github.com/snakers4/silero-models/wiki) available for our users. Please make sure you have checked it out first.
76 changes: 76 additions & 0 deletions dataprocessing/vad/silerovad/CODE_OF_CONDUCT.md
@@ -0,0 +1,76 @@
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at [email protected]. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
21 changes: 21 additions & 0 deletions dataprocessing/vad/silerovad/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2020-present Silero Team

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.