
[Bug] Text duplication during audio generation #72

Closed

olehsamoilenko opened this issue Aug 16, 2024 · 1 comment
Labels
bug Something isn't working

Comments


olehsamoilenko commented Aug 16, 2024

Describe the bug

Sometimes redundant, duplicated text is generated. I use the default model and config (no fine-tuning). It does not happen every time, only occasionally (that is why I use a loop in the code example below). In my example, the words "is inspired by the dishes" are generated several times; check the audio: https://drive.google.com/file/d/1geLlH2im1bCLMpQcQV7QgRWU0c57eG4y/view

Could it be related to the fact that the word "menu" occurs twice in my text? The text is fairly long, but under 250 characters, so it should be acceptable. It may also be related to the issue discussed in coqui-ai#3516 and the potential fix in coqui-ai#3516 (comment). Is this a bug, or am I using the library incorrectly?

CC: @eginhard @bensonbs

text = "on the menu that Sam our chef here has put together, Okay this is one of our best sellers isn't it Sam, Yes it is, So this is our scampi, So I grew up in a pub and a lot of the things on the menu is inspired by the dishes from"
print(len(text))
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

for i in range(10):
    tts.tts_to_file(text=text,
                    file_path=f"test_{i}.wav",
                    speaker_wav="./tests/data/ljspeech/wavs/LJ001-0001.wav",
                    language='en',
                    split_sentences=False)

To Reproduce

Run the code from the description. Some of the generated files may contain duplicated text.

Expected behavior

No redundant text is generated.

Logs

226
/Users/olehsamoilenko/coqui-ai-TTS/TTS/tts/layers/xtts/xtts_manager.py:6: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  self.speakers = torch.load(speaker_file_path)
/opt/anaconda3/envs/coqui/lib/python3.9/site-packages/trainer/io.py:83: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(f, map_location=map_location, **kwargs)

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": null
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.4.0",
        "TTS": "0.24.1",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Darwin",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "arm",
        "python": "3.9.19",
        "version": "Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6030"
    }
}

Additional context

No response

olehsamoilenko added the bug label Aug 16, 2024
eginhard (Member) commented

This is not a bug; it's just due to how the XTTS model works and cannot be avoided completely. You could try shortening your input by splitting it into sentences.
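
A minimal sketch of that workaround (not part of the original comment), assuming the standard TTS.api interface: split the long input into shorter chunks yourself and synthesize each chunk separately. The chunk_text helper, the 120-character limit, and the output file names are illustrative choices, not library defaults.

import re

from TTS.api import TTS

def chunk_text(text, max_chars=120):
    """Split on clause boundaries, then merge the pieces back up to max_chars."""
    pieces = [p.strip() for p in re.split(r"(?<=[,.!?])\s+", text) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) + 1 > max_chars:
            chunks.append(current)
            current = piece
        else:
            current = f"{current} {piece}".strip()
    if current:
        chunks.append(current)
    return chunks

text = "on the menu that Sam our chef here has put together, Okay this is one of our best sellers isn't it Sam, Yes it is, So this is our scampi, So I grew up in a pub and a lot of the things on the menu is inspired by the dishes from"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Each chunk is short enough that the XTTS decoder is less likely to loop and repeat phrases.
for i, chunk in enumerate(chunk_text(text)):
    tts.tts_to_file(text=chunk,
                    file_path=f"chunk_{i}.wav",
                    speaker_wav="./tests/data/ljspeech/wavs/LJ001-0001.wav",
                    language="en")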

eginhard closed this as not planned Aug 16, 2024