change sentence splitter

numz · Aug 23, 2023 · 896011f
1 parent eb52ca0
commit 896011f

Showing 3 changed files with 7 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -59,6 +59,7 @@ It's an all-in-one solution: just choose a video and a speech file (wav or mp3),
 ## 🔗 Requirements
 
 - latest version of Stable Diffusion WebUI Automatic1111 by following the instructions on the [Stable Diffusion Webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) repository.
+- FFmpeg: download it from the [official FFmpeg site](https://ffmpeg.org/download.html) and follow the instructions for your operating system. Note that `ffmpeg` must be accessible from the command line (i.e. on your `PATH`).
 
 ## 💻 Installation
 
@@ -95,6 +96,9 @@ It's an all-in-one solution: just choose a video and a speech file (wav or mp3),
       3. Choose your speaker, you can ear a sample in the "Audio Example"
       4. Choose Low VRAM True (default) if you have a Video Card with less than 16GB VRAM 
       5. Write your text in the text area "Prompt"
+         - **Note** that Bark can only generate 14 seconds of audio at a time, so for longer audio you have to insert "[split]" (lowercase, with the square brackets) into your text.
+         - For example, to turn a longer text into two prompts, write it like this:
+           - "This is the first part of my text **[split]** This is the second part of my text"
       6. Temperature: 0.0 is supposed to be closer to the voice, and 1.0 is more creative, but in reality, 0.0 yields strange results and 1.0 something very far from the voice. 0.7 is the default value set by 'bark', try different values to see what works best for you.
       7. Silence : Time in seconds between each punctuation(。！!.？?,). Default is 0.25 seconds.
       8. See Bark [documentation](https://github.com/suno-ai/bark/) for more details.
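The README only requires that `ffmpeg` be reachable from the command line. A minimal sketch of how one might verify that from Python, using only the standard library (`shutil.which` and `subprocess.run`); the helper names `ffmpeg_available` and `ffmpeg_version` are hypothetical and not part of the extension:

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on the PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version()
    if version is None:
        print("ffmpeg not found on PATH; see https://ffmpeg.org/download.html")
    else:
        print(version)
```

`shutil.which` performs the same PATH lookup a shell would, so if it returns `None` the extension will most likely not be able to call `ffmpeg` either.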

diff --git a/scripts/bark/tts.py b/scripts/bark/tts.py
@@ -39,7 +39,7 @@ def generate(self):
                 force_reload=False)
         pieces = []
         # split text_prompt into sentences by punctuation
-        sentences = re.split('。|！|\!|\.|？|\?|,', self.text_prompt)
+        sentences = re.split(r'\[split\]', self.text_prompt)
         silence = np.zeros(int(self.silence * SAMPLE_RATE)).astype(np.float32)
         for sentence in sentences:
             if sentence.strip() != "":
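The new splitter cuts the prompt only at literal `[split]` markers instead of at punctuation. A dependency-free sketch of that flow, assuming silence is appended after every non-empty piece (the hunk is truncated, so this is not confirmed for `tts.py`). `SAMPLE_RATE = 24_000` is an assumed value, plain lists stand in for the NumPy arrays used in `tts.py`, and `synthesize` is a hypothetical stand-in for the Bark generation call:

```python
import re
from typing import Callable, List

SAMPLE_RATE = 24_000  # assumption: Bark's output sample rate


def split_prompt(text_prompt: str) -> List[str]:
    """Split on literal "[split]" markers and drop empty or whitespace-only pieces."""
    # Raw string so the regex sees \[ and \] (escaped brackets) without Python escape warnings.
    return [p.strip() for p in re.split(r"\[split\]", text_prompt) if p.strip()]


def render(text_prompt: str,
           synthesize: Callable[[str], List[float]],
           silence_seconds: float = 0.25) -> List[float]:
    """Synthesize each piece in turn, appending `silence_seconds` of zeros after every piece."""
    silence = [0.0] * int(silence_seconds * SAMPLE_RATE)
    samples: List[float] = []
    for piece in split_prompt(text_prompt):
        samples.extend(synthesize(piece))  # hypothetical stand-in for Bark's generation call
        samples.extend(silence)
    return samples


def fake_synth(text: str) -> List[float]:
    """Toy synthesizer: one sample per character, purely for illustration."""
    return [0.5] * len(text)


if __name__ == "__main__":
    audio = render("Hello [split] world", fake_synth, silence_seconds=0.01)
    print(len(audio))  # 5 + 240 + 5 + 240 = 490
```

Because the pattern is a literal, `[SPLIT]` or `[ split ]` is not treated as a marker, and punctuation no longer produces extra pauses.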

diff --git a/scripts/ui.py b/scripts/ui.py
@@ -68,10 +68,10 @@ def select_speaker(speaker):
                             audio_example = gr.Audio(label="Audio example",
                                                      value="https://dl.suno-models.io/bark/prompts/prompt_audio/en_speaker_0.mp3")
                         with gr.Column():
-                            suno_prompt = gr.Textbox(label="Prompt", placeholder="Prompt", lines=5, type="text")
+                            suno_prompt = gr.Textbox(label="Prompt", placeholder="Prompt", lines=5, type="text", info="Bark can only generate 14 seconds of audio at a time; for longer text, use [split] to divide it into multiple prompts")
                             temperature = gr.Slider(label="Generation temperature", minimum=0.01, maximum=1, step=0.01, value=0.7,
                                                   info="1.0 more diverse, 0.0 more conservative")
-                            silence = gr.Slider(label="Silence", minimum=0, maximum=1, step=0.01, value=0.25, info="Silence after ponctuation(。！!.？?,) in seconde")
+                            silence = gr.Slider(label="Silence", minimum=0, maximum=1, step=0.01, value=0.25, info="Silence after each [split], in seconds")
                             generate_audio = gr.Button("Generate")
                             audio = gr.Audio(label="Speech", type="filepath")
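Since long text now has to be divided by hand, a user could pre-process it before pasting it into the "Prompt" box. A hypothetical helper (not part of the extension) that greedily packs whole sentences into chunks and joins them with `[split]`. It assumes a speaking rate of roughly 2.5 words per second (so about 35 words per 14-second chunk), counts words by whitespace, and reuses the sentence punctuation from the splitter this commit removed, minus the comma. The punctuation split is naive (e.g. `3.14` is cut at the dot), and a single sentence longer than the budget is kept intact:

```python
import re
from typing import List

# Assumption: about 2.5 spoken words per second, so ~35 words fit in Bark's 14-second window.
MAX_WORDS_PER_CHUNK = 35

# A run of text ending in sentence punctuation (from the removed splitter, minus the comma),
# or a trailing run with no closing punctuation.
_SENTENCE = re.compile(r"[^。！!.？?]*[。！!.？?]+|[^。！!.？?]+")


def add_split_markers(text: str, max_words: int = MAX_WORDS_PER_CHUNK) -> str:
    """Greedily pack whole sentences into chunks of at most `max_words` words,
    joined with [split] markers. A sentence longer than `max_words` becomes
    its own chunk. Words are counted by whitespace."""
    sentences = [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return " [split] ".join(chunks)


if __name__ == "__main__":
    print(add_split_markers("One two three. Four five. Six seven eight nine.", max_words=5))
    # One two three. Four five. [split] Six seven eight nine.
```

Whitespace-based counting treats an unspaced CJK sentence as one "word", so for such text `max_words` effectively becomes a sentence count.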