Inspired by the 2019 host competition, this repository uses Zou Yun's voice to build a highly expressive speech synthesis system. The pinyin of 邹韵 is Zōu Yùn, a near-homophone of 走运 (zǒu yùn), which means good luck.
HuggingFace🤗 Demo-Baker | HuggingFace🤗 Demo-Lucky | WIP
1. Use the 'you-get' tool to download the videos in batches. The video addresses are listed in dataprocessing/collectvideos/
2. Use a format converter to convert the videos to WAV files.
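The two steps above can be scripted. The sketch below is one possible way to do it, not the repository's own script: it assumes `ffmpeg` is the format converter, that 22050 Hz mono is the target format, and that the URL list and directory names are supplied by you. The helper names (`download_cmd`, `to_wav_cmd`, `run_all`) are hypothetical.

```python
import subprocess
from pathlib import Path


def download_cmd(url, out_dir):
    """Build the you-get command; -o sets the output directory."""
    return ["you-get", "-o", str(out_dir), url]


def to_wav_cmd(video_path, wav_dir, sample_rate=22050):
    """Build an ffmpeg command: drop video (-vn), mono (-ac 1), resample (-ar).

    The sample rate is an assumption; use whatever your model expects.
    """
    wav_path = Path(wav_dir) / (Path(video_path).stem + ".wav")
    return ["ffmpeg", "-y", "-i", str(video_path),
            "-vn", "-ac", "1", "-ar", str(sample_rate), str(wav_path)]


def run_all(urls, video_dir, wav_dir):
    """Download every URL, then convert each downloaded video to WAV."""
    Path(video_dir).mkdir(parents=True, exist_ok=True)
    Path(wav_dir).mkdir(parents=True, exist_ok=True)
    for url in urls:
        subprocess.run(download_cmd(url, video_dir), check=True)
    for video in sorted(Path(video_dir).iterdir()):
        # Assumed set of video extensions; extend as needed.
        if video.suffix.lower() in {".mp4", ".flv", ".mkv", ".webm"}:
            subprocess.run(to_wav_cmd(video, wav_dir), check=True)
```

Calling `run_all(urls, "videos", "wavs")` requires both `you-get` and `ffmpeg` on your PATH. The command builders are kept separate so they can be inspected without running anything.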
1.2 Split the audio using voice activity detection (VAD).
python dataprocessing/vad/ --pth [downloaded video] --savepth [output directory for the split audio]
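To illustrate the idea behind VAD-based splitting, here is a minimal energy-threshold sketch. It is a simplified stand-in, not the method used by the script above, whose internals are not described here. The frame length, threshold, and function names are all assumptions chosen for the example.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def split_on_silence(samples, frame_len=400, threshold=1e-3, min_speech_frames=2):
    """Return (start, end) sample indices of segments judged to be speech.

    A frame counts as speech when its energy reaches the threshold;
    runs shorter than min_speech_frames are discarded.
    """
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None
    for i, energy in enumerate(energies):
        voiced = energy >= threshold
        if voiced and seg_start is None:
            seg_start = i  # a speech run begins
        elif not voiced and seg_start is not None:
            if i - seg_start >= min_speech_frames:
                segments.append((seg_start * frame_len, i * frame_len))
            seg_start = None  # the run ended at a silent frame
    # Close a run that lasts until the end of the audio.
    if seg_start is not None and len(energies) - seg_start >= min_speech_frames:
        segments.append((seg_start * frame_len, len(energies) * frame_len))
    return segments
```

For example, `split_on_silence([0.0] * 800 + [0.5] * 1200 + [0.0] * 800)` returns `[(800, 2000)]`. Real recordings usually need a model-based or adaptive detector, since a fixed energy threshold is sensitive to background noise.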
1.3 Reduce noise using a speech enhancement model.
sudo docker build -t se .
sudo docker run -it --rm -v /home/admin/yuanxin:/se se
python dataprocessing/se/
1.4 Classify audio using a voiceprint recognition model.
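A common way to classify clips by speaker is to compare each clip's speaker embedding with a reference embedding of the target voice. The sketch below assumes the embeddings have already been extracted by some voiceprint model. The function names, the toy vectors, and the 0.7 threshold are hypothetical and not taken from this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_target_clips(clip_embeddings, reference, threshold=0.7):
    """Return the ids of clips whose embedding is close to the reference speaker.

    clip_embeddings: dict mapping clip id -> embedding vector (assumed precomputed).
    """
    return [clip_id for clip_id, emb in clip_embeddings.items()
            if cosine_similarity(emb, reference) >= threshold]
```

The threshold trades recall for purity: a higher value keeps fewer clips but makes it less likely that another speaker's audio slips into the training set.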
1.5 Process the text with a speech recognition model and a speech synthesis front-end.
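Once transcripts are available, they need to be paired with their audio files for training. The sketch below writes `path|text` lines, the layout used by the filelists in the reference VITS implementation. Whether this repository uses the same layout is an assumption, and the transcript dictionary and `build_filelist` helper are hypothetical.

```python
from pathlib import Path


def build_filelist(transcripts, wav_dir):
    """Pair each transcript with its WAV path as 'path|text' lines.

    transcripts: dict mapping utterance id -> recognized text (assumed input).
    Utterances with empty text are skipped.
    """
    lines = []
    for utt_id in sorted(transcripts):
        # Collapse runs of whitespace and trim the ends.
        text = " ".join(transcripts[utt_id].split())
        if not text:
            continue
        wav_path = Path(wav_dir) / f"{utt_id}.wav"
        lines.append(f"{wav_path.as_posix()}|{text}")
    return lines
```

A front-end that converts the text to pinyin or adds prosody marks would run on the text field before the lines are written.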
2.1 VITS model with prosodic representation
python model/vits/ --text ['你好'] --out [output file path]