LuckyVoice

Inspired by the 2019 host competition, this repository uses Zou Yun's voice to build a highly expressive speech synthesis system. The pinyin of 邹韵 is Zōu Yùn, a near-homophone of 走运 (zǒu yùn), which means "to have good luck".

HuggingFace🤗 Demo-Baker | HuggingFace🤗 Demo-Lucky | WIP

1. Data Collection and Processing

1.1 Collect related videos of Zou Yun

1. Use the 'you-get' tool to download the videos in batches. The video URLs are listed in dataprocessing/collectvideos/main.py.
2. Use a format converter to convert the downloaded videos to WAV files.
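
The two steps above can be sketched as follows. This is a minimal illustration, not the code in dataprocessing/collectvideos/main.py. It assumes that `you-get` and `ffmpeg` are installed and on the PATH, that ffmpeg is an acceptable format converter, and that 16 kHz mono is a suitable target format; the function names, the extension list, and the directory arguments are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: only these container formats are treated as downloaded videos.
VIDEO_SUFFIXES = {".mp4", ".flv", ".mkv", ".webm"}


def download_cmd(url: str, out_dir: str) -> list[str]:
    """Build a you-get command that saves one video into out_dir."""
    return ["you-get", "-o", out_dir, url]


def to_wav_cmd(video: str, wav: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command: drop the video stream, mix to mono, resample."""
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1",
            "-ar", str(sample_rate), wav]


def collect(urls: list[str], video_dir: str, wav_dir: str) -> None:
    """Download every URL, then convert each downloaded video to WAV."""
    Path(video_dir).mkdir(parents=True, exist_ok=True)
    Path(wav_dir).mkdir(parents=True, exist_ok=True)
    for url in urls:
        subprocess.run(download_cmd(url, video_dir), check=True)
    for video in sorted(Path(video_dir).iterdir()):
        # Skip subtitle or metadata files saved next to the videos.
        if video.suffix.lower() not in VIDEO_SUFFIXES:
            continue
        wav = Path(wav_dir) / (video.stem + ".wav")
        subprocess.run(to_wav_cmd(str(video), str(wav)), check=True)
```

Calling `collect(urls, "videos", "wavs")` with a list of video URLs would download each one and write a matching WAV file for every video found in the download directory.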

1.2 Split the audio using voice activity detection (VAD)

python dataprocessing/vad/main.py --pth [path to the downloaded video] --savepth [directory to save the split audio]
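
To illustrate the idea behind VAD-based splitting, here is a minimal energy-threshold sketch in pure Python. It is not the method implemented in dataprocessing/vad/main.py, which may well use a trained VAD model. The function names, the frame length, the threshold, and the minimum silence length are all assumptions chosen for illustration, and samples are assumed to be floats in the range [-1, 1].

```python
import math


def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def split_on_silence(samples, frame_len=400, threshold=0.02,
                     min_silence_frames=5):
    """Return (start, end) sample indices of detected speech segments.

    A segment closes once min_silence_frames consecutive frames fall
    below the energy threshold; the trailing silence is not included.
    """
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None   # index of the first frame of the open segment
    silent_run = 0     # consecutive low-energy frames inside a segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end_frame = i - silent_run + 1  # first silent frame
                segments.append((seg_start * frame_len,
                                 end_frame * frame_len))
                seg_start = None
                silent_run = 0
    if seg_start is not None:
        # Audio ended while a segment was still open.
        end_frame = len(energies) - silent_run
        segments.append((seg_start * frame_len, end_frame * frame_len))
    return segments
```

Each returned pair can be used to slice the original sample array and write one clip per segment. Short pauses inside a sentence stay within a segment as long as they are shorter than `min_silence_frames` frames.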

1.3 Reduce noise using a speech enhancement model

pre-trained model

sudo docker build -t se .
sudo docker run -it --rm -v /home/admin/yuanxin:/se se
python dataprocessing/se/main.py

1.4 Classify audio using a voiceprint recognition model
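
One common way to do this kind of filtering is to compare each clip's speaker embedding with a reference embedding of the target speaker and keep only the clips that are similar enough. The sketch below assumes that embeddings have already been extracted by some voiceprint model; it does not reflect the repository's actual model or code. The function names and the 0.7 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_embedding(embeddings):
    """Average several enrollment embeddings into one reference vector."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def filter_target_speaker(clips, reference, threshold=0.7):
    """Return the names of clips whose embedding matches the reference.

    clips maps a clip name to its embedding (a list of floats).
    The threshold is an assumption and needs tuning on real data.
    """
    return [name for name, embedding in clips.items()
            if cosine_similarity(embedding, reference) >= threshold]
```

In practice the reference would be built from a few clips that are known to contain only the target speaker, and the threshold would be set by checking a sample of accepted and rejected clips by ear.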

1.5 Process text with speech recognition and a speech synthesis front-end

speech synthesis front-end

2. Baseline Model

2.1 VITS model with prosodic representation

pretrained_baker.pth

python model/vits/main.py --text ['你好'] --out [path to save the output file]

2.2 DiffSpeech model with prosodic representation

3. EmpathyTTS
