German VITS model fine-tuning -> output has increased speed and much higher pitch #1643
-
I fine-tuned the recently released German VITS model by @thorstenMueller. I used 1 hour of audio: 22050 Hz sample rate, 16-bit PCM, mono, female German voice.
Replies: 4 comments 3 replies
-
Maybe your audio files have wrong header information and/or are encoded incorrectly.
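One way to check the header is to read it with Python's standard `wave` module. This is only a minimal sketch: the helper name `describe_wav` and the 22050 Hz default are assumptions for illustration, not part of the model's tooling, and the `wave` module handles only uncompressed PCM WAV files.

```python
import sys
import wave


def describe_wav(path):
    """Read the format fields stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_expected(path, expected_rate_hz=22050, expected_channels=1):
    """Return True if the header matches the assumed training format."""
    info = describe_wav(path)
    return (
        info["sample_rate_hz"] == expected_rate_hz
        and info["channels"] == expected_channels
    )


if __name__ == "__main__":
    # Usage: python check_wav.py file1.wav file2.wav ...
    for wav_path in sys.argv[1:]:
        status = "ok" if matches_expected(wav_path) else "MISMATCH"
        print(wav_path, describe_wav(wav_path), status)
```

Running it over the whole dataset lists any file whose header does not say 22050 Hz mono.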
-
Input File : 'cf8c075c309055e83f7b3632cb0fda46.wav' My bad, the sample rate was indeed 16 kHz, but I don't know whether this could affect the pitch at all; I would say not. Please correct me if I'm wrong. The audio sounds correct when played in VLC.
-
Your 16 kHz sample is interpreted as 22.05 kHz. That is a speed increase of approx. 1.38x (22050 / 16000), and the pitch rises by roughly 5.5 semitones (12 * log2(1.378)) - this is noticeable.
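The arithmetic can be checked in a few lines. This sketch assumes the usual equal-temperament relation, where a frequency ratio r corresponds to 12 * log2(r) semitones; the function name `playback_shift` is made up for illustration.

```python
import math


def playback_shift(actual_rate_hz, assumed_rate_hz):
    """Speed factor and pitch shift when audio recorded at actual_rate_hz
    is treated as if it had been recorded at assumed_rate_hz."""
    ratio = assumed_rate_hz / actual_rate_hz
    semitones = 12 * math.log2(ratio)
    return ratio, semitones


ratio, semitones = playback_shift(actual_rate_hz=16000, assumed_rate_hz=22050)
print(f"speed factor: {ratio:.3f}x, pitch shift: {semitones:+.2f} semitones")
```

A ratio above 1 means the audio plays faster and higher. Swapping the two rates gives the opposite case, slower and lower.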
-
I forgot to say thank you to @domcross and @thorstenMueller for preparing this pre-trained model. Its quality is really amazing. Good job, guys 🥇 The solution suggested by @domcross fixed the problem with my fine-tuning. The fine-tuned model is now also very good and fast (RTF: 0.3).