
change sentence splitter
numz committed Aug 23, 2023
1 parent eb52ca0 commit 896011f
Showing 3 changed files with 7 additions and 3 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -59,6 +59,7 @@ It's an all-in-one solution: just choose a video and a speech file (wav or mp3),
## 🔗 Requirements

- The latest version of Stable Diffusion WebUI (Automatic1111), installed by following the instructions in the [Stable Diffusion Webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) repository.
- FFmpeg: download it from the [official FFmpeg site](https://ffmpeg.org/download.html) and follow the instructions for your operating system. Note that ffmpeg must be accessible from the command line.

## 💻 Installation

@@ -95,6 +96,9 @@ It's an all-in-one solution: just choose a video and a speech file (wav or mp3),
3. Choose your speaker; you can hear a sample in "Audio Example".
4. Set Low VRAM to True (default) if your video card has less than 16 GB of VRAM.
5. Write your text in the "Prompt" text area.
+   - **Note** that Bark can only generate 14 seconds of audio at a time, so to generate longer audio you must insert "[split]" into your text.
+   - For example, to generate 30 seconds of audio, write your text like this:
+   - "This is the first part of my text **[split]** This is the second part of my text"
6. Temperature: 0.0 is supposed to stay closer to the voice and 1.0 to be more creative, but in practice 0.0 yields strange results and 1.0 something very far from the voice. 0.7 is the default value set by Bark; try different values to see what works best for you.
7. Silence: time in seconds inserted at each punctuation mark (。！!.？?,). Default is 0.25 seconds.
8. See the Bark [documentation](https://github.com/suno-ai/bark/) for more details.
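Because each chunk between "[split]" markers must stay under Bark's 14-second limit, long text has to be divided by hand. The sketch below shows one way to automate that step, assuming sentence boundaries are acceptable break points. `add_split_markers` and its `max_chars` parameter are hypothetical (not part of the extension), and the 200-character default is an arbitrary placeholder, not a measured equivalent of 14 seconds.

```python
import re


def add_split_markers(text: str, max_chars: int = 200) -> str:
    """Hypothetical helper: join groups of sentences with "[split]".

    Sentences are packed greedily into chunks of at most `max_chars`
    characters (a single longer sentence still becomes its own chunk).
    The 200-character default is an arbitrary placeholder, not a
    measured equivalent of Bark's 14-second limit.
    """
    # Break after sentence-ending punctuation, keeping the punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it grows too long
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return " [split] ".join(chunks)


print(add_split_markers("Hello there. How are you? Fine.", max_chars=15))
# Hello there. [split] How are you? [split] Fine.
```

The result can be pasted into the "Prompt" text area as-is; `max_chars` would need tuning per speaker and language.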
2 changes: 1 addition & 1 deletion scripts/bark/tts.py
@@ -39,7 +39,7 @@ def generate(self):
force_reload=False)
pieces = []
# split text_prompt into sentences by punctuation
-sentences = re.split('。|！|\!|\.|？|\?|,', self.text_prompt)
+sentences = re.split('\[split\]', self.text_prompt)
silence = np.zeros(int(self.silence * SAMPLE_RATE)).astype(np.float32)
for sentence in sentences:
if sentence.strip() != "":
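The change to `tts.py` swaps the punctuation regex for a literal `[split]` marker. The sketch below contrasts the two patterns and reconstructs the surrounding loop under stated assumptions: `SAMPLE_RATE = 24_000` (the rate Bark exports as `SAMPLE_RATE`), a hypothetical `fake_generate` standing in for the real Bark call, and silence appended after every non-empty piece, since the rest of `generate()` is cut off in this diff.

```python
import re

import numpy as np

# Assumption: Bark's output sample rate (exported by bark as SAMPLE_RATE).
SAMPLE_RATE = 24_000

# Splitter removed by this commit: full-width and ASCII punctuation.
OLD_PATTERN = r"。|！|!|\.|？|\?|,"
# Splitter added by this commit: the literal marker "[split]".
NEW_PATTERN = r"\[split\]"


def fake_generate(piece: str) -> np.ndarray:
    """Hypothetical stand-in for the Bark call: one sample per character."""
    return np.ones(len(piece), dtype=np.float32)


def synthesize(text_prompt: str, silence_seconds: float,
               generate=fake_generate, pattern: str = NEW_PATTERN) -> np.ndarray:
    """Split the prompt, render each non-empty piece, and pad with silence.

    Assumption: silence follows every piece (the real loop is truncated).
    """
    sentences = re.split(pattern, text_prompt)
    # e.g. 0.25 s * 24000 Hz = 6000 zero samples
    silence = np.zeros(int(silence_seconds * SAMPLE_RATE), dtype=np.float32)
    pieces = []
    for sentence in sentences:
        if sentence.strip() != "":
            pieces += [generate(sentence.strip()), silence]
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces)


text = "Hello, world. [split] Second part!"
print(re.split(OLD_PATTERN, text))  # ['Hello', ' world', ' [split] Second part', '']
print(re.split(NEW_PATTERN, text))  # ['Hello, world. ', ' Second part!']
print(synthesize(text, 0.25).shape)  # (12025,)
```

Under the old pattern the same text is cut at every comma and period, and the literal `[split]` text would be passed to the model; with the new pattern it yields exactly two pieces. Writing the pattern as a raw string (`r"\[split\]"`) also avoids the invalid-escape warning that `'\[split\]'` triggers on recent Python versions.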
4 changes: 2 additions & 2 deletions scripts/ui.py
@@ -68,10 +68,10 @@ def select_speaker(speaker):
audio_example = gr.Audio(label="Audio example",
value="https://dl.suno-models.io/bark/prompts/prompt_audio/en_speaker_0.mp3")
with gr.Column():
-suno_prompt = gr.Textbox(label="Prompt", placeholder="Prompt", lines=5, type="text")
+suno_prompt = gr.Textbox(label="Prompt", placeholder="Prompt", lines=5, type="text",info="Don't forget that bark can only generate 14 seconds of audio at a time, so for long text, you need to use [split] to split the text into multiple prompts")
temperature = gr.Slider(label="Generation temperature", minimum=0.01, maximum=1, step=0.01, value=0.7,
info="1.0 more diverse, 0.0 more conservative")
-silence = gr.Slider(label="Silence", minimum=0, maximum=1, step=0.01, value=0.25, info="Silence after ponctuation(。！!.？?,) in seconde")
+silence = gr.Slider(label="Silence", minimum=0, maximum=1, step=0.01, value=0.25, info="Silence after [split] in seconde")
generate_audio = gr.Button("Generate")
audio = gr.Audio(label="Speech", type="filepath")

