
[Bug] Text duplication during audio generation #72

Closed

olehsamoilenko opened this issue Aug 16, 2024 · 1 comment
Labels
bug Something isn't working

Comments


olehsamoilenko commented Aug 16, 2024

Describe the bug

Sometimes redundant, duplicated text is generated. I use the default model and config (no fine-tuning). It does not happen every time, only occasionally (that is why I use a loop in the code example below). In my example, the words "is inspired by the dishes" are generated several times; check the audio: https://drive.google.com/file/d/1geLlH2im1bCLMpQcQV7QgRWU0c57eG4y/view

Could it be related to the fact that the word "menu" occurs twice in my text? The text is fairly long, but under 250 characters, so it should be acceptable. It may also be related to the issue discussed in coqui-ai#3516 and the potential fix in coqui-ai#3516 (comment). Is this a bug, or am I using the library incorrectly?

CC: @eginhard @bensonbs

text = "on the menu that Sam our chef here has put together, Okay this is one of our best sellers isn't it Sam, Yes it is, So this is our scampi, So I grew up in a pub and a lot of the things on the menu is inspired by the dishes from"
print(len(text))
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

for i in range(10):
    tts.tts_to_file(text=text,
                    file_path=f"test_{i}.wav",
                    speaker_wav="./tests/data/ljspeech/wavs/LJ001-0001.wav",
                    language='en',
                    split_sentences=False)

To Reproduce

Run the code from the description. Some of the generated files may contain duplicated text.

Expected behavior

No redundant text is generated.

Logs

226
/Users/olehsamoilenko/coqui-ai-TTS/TTS/tts/layers/xtts/xtts_manager.py:6: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  self.speakers = torch.load(speaker_file_path)
/opt/anaconda3/envs/coqui/lib/python3.9/site-packages/trainer/io.py:83: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(f, map_location=map_location, **kwargs)

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": null
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.4.0",
        "TTS": "0.24.1",
        "numpy": "1.26.4"
    },
    "System": {
        "OS": "Darwin",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "arm",
        "python": "3.9.19",
        "version": "Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6030"
    }
}

Additional context

No response

olehsamoilenko added the bug label Aug 16, 2024
eginhard (Member) commented

This is not a bug; it's just due to how the XTTS model works and cannot be avoided completely. You could try shortening your input by splitting it into sentences.
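
A minimal sketch of that workaround (not part of the original comment), assuming the standard TTS.api interface: split the long input into shorter chunks yourself and synthesize each chunk separately. The chunk_text helper, the 120-character limit, and the output file names are illustrative choices, not library defaults.

import re

from TTS.api import TTS

def chunk_text(text, max_chars=120):
    """Split on clause boundaries, then merge the pieces back up to max_chars."""
    pieces = [p.strip() for p in re.split(r"(?<=[,.!?])\s+", text) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) + 1 > max_chars:
            chunks.append(current)
            current = piece
        else:
            current = f"{current} {piece}".strip()
    if current:
        chunks.append(current)
    return chunks

text = "on the menu that Sam our chef here has put together, Okay this is one of our best sellers isn't it Sam, Yes it is, So this is our scampi, So I grew up in a pub and a lot of the things on the menu is inspired by the dishes from"
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Each chunk is short enough that the XTTS decoder is less likely to loop and repeat phrases.
for i, chunk in enumerate(chunk_text(text)):
    tts.tts_to_file(text=chunk,
                    file_path=f"chunk_{i}.wav",
                    speaker_wav="./tests/data/ljspeech/wavs/LJ001-0001.wav",
                    language="en")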

eginhard closed this as not planned Aug 16, 2024