
mdxc_separator.py: RuntimeError #96

Closed
Oefuli opened this issue Aug 9, 2024 · 9 comments

@Oefuli

Oefuli commented Aug 9, 2024

I used your code from the documentation: https://github.com/nomadkaraoke/python-audio-separator/blob/main/README.md#as-a-dependency-in-a-python-project

from audio_separator.separator import Separator

# Initialize the Separator class (with optional configuration properties, below)
separator = Separator()

# Load a machine learning model (if unspecified, defaults to 'model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt')
separator.load_model()

# Perform the separation on specific audio files without reloading the model
output_files = separator.separate('audio1.wav')

print(f"Separation complete! Output file(s): {' '.join(output_files)}")

I get the following error:

File ~/miniconda3/envs/conenv_speech_outlier/lib/python3.12/site-packages/audio_separator/separator/architectures/mdxc_separator.py:188, in MDXCSeparator.overlap_add(self, result, x, weights, start, length)
    186 x = x.to(result.device)
    187 weights = weights.to(result.device)
--> 188 result[..., start : start + length] += x[..., :length] * weights[:length]
    189 return result

RuntimeError: The size of tensor a (238140) must match the size of tensor b (352800) at non-singleton dimension 1

I took a look at the dimensions:

result[..., start : start + length]:  torch.Size([2, 2, 114660])
x[..., :length]:  torch.Size([2, 238140])
weights[:length]:  torch.Size([352800])
@beveradb
Collaborator

Hmm, that's strange - but I've seen this before with unusual input files (e.g. a different sample rate or particularly large files).

Can you try with another model, and with another input audio file?
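For reference, switching to a different model looks roughly like this. This is a sketch based on the project README: the model_filename keyword and the example ONNX filename come from the library's documented usage and supported-model list (and may differ between versions), and 'audio2.wav' is a placeholder for a second test file.

from audio_separator.separator import Separator

separator = Separator()

# Load a specific (non-default) model; the filename below is only an example -
# see Separator().list_supported_model_files() for the available options.
separator.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")

# Separate a different input file to rule out a file-specific problem
output_files = separator.separate("audio2.wav")
print(output_files)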

@Oefuli
Author

Oefuli commented Aug 16, 2024

Thank you very much: It works with other models (I used the same input file).

Here is the metadata for my file:
{'_duration_str': '5.400 s', 'channels': 1, 'duration': 5.4, 'endian': 'FILE', 'format': 'WAV', 'format_info': 'WAV (Microsoft)', 'frames': 119070, 'samplerate': 22050, 'sections': 1, 'subtype': 'PCM_16', 'subtype_info': 'Signed 16 bit PCM', 'size_bytes': 238218}

I tried all the models from

Separator().list_supported_model_files()

together with my audio file; a sketch of such a sweep is shown after the lists below.
The following models did not work:

  - '**17_HP-Wind_Inst-UVR.pth**'
        Error: Audio buffer is not finite everywhere
  - '**model_bs_roformer_ep_317_sdr_12.9755.yaml**'
        Error: result[..., start : start + length]: torch.Size([2, 2, 114660])
               x[..., :length]: torch.Size([2, 238140])
               weights[:length]: torch.Size([352800])
               The size of tensor a (238140) must match the size of tensor b (352800) at non-singleton dimension 1
  - '**model_bs_roformer_ep_368_sdr_12.9628.yaml**'
        Error: result[..., start : start + length]: torch.Size([2, 2, 114660])
               x[..., :length]: torch.Size([2, 238140])
               weights[:length]: torch.Size([352800])
               The size of tensor a (238140) must match the size of tensor b (352800) at non-singleton dimension 1
  - '**model_bs_roformer_ep_937_sdr_10.5309.yaml**'
        Error: result[..., start : start + length]: torch.Size([2, 2, 23492])
               x[..., :length]: torch.Size([2, 238080])
               weights[:length]: torch.Size([261632])
               The size of tensor a (238080) must match the size of tensor b (261632) at non-singleton dimension 1
  - '**model_mel_band_roformer_ep_3005_sdr_11.4360.yaml**'
        Error: result[..., start : start + length]: torch.Size([2, 2, 114660])
               x[..., :length]: torch.Size([2, 238140])
               weights[:length]: torch.Size([352800])
               The size of tensor a (238140) must match the size of tensor b (352800) at non-singleton dimension 1

For all of the following model files I get:

Unsupported Model File: parameters for MD5 hash XYZ could not be found in UVR model data file for MDX or VR arch.

(where XYZ is the hash for the respective file)

  - 'f7e0c4bc-ba3fe64a.th'
  - 'd12395a8-e57c48e6.th'
  - '92cfc3b6-ef3bcb9c.th'
  - '04573f0d-f3cf25b2.th'
  - '955717e8-8726e21a.th'
  - '75fc33f5-1941ce65.th'
  - '5c90dfd2-34c22ccb.th'
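For illustration, the sweep described above could be scripted roughly as follows. This is a sketch only: it assumes list_supported_model_files() returns a nested dict whose leaf strings include the model filenames (the exact structure varies between audio-separator versions), and 'audio1.wav' stands in for the actual test file.

from audio_separator.separator import Separator

MODEL_EXTENSIONS = (".pth", ".onnx", ".ckpt", ".yaml", ".th")

def model_filenames(node):
    # Recursively collect leaf strings that look like model files.
    if isinstance(node, str):
        if node.endswith(MODEL_EXTENSIONS):
            yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from model_filenames(value)

separator = Separator()
failures = {}
for filename in model_filenames(separator.list_supported_model_files()):
    try:
        separator.load_model(model_filename=filename)
        separator.separate("audio1.wav")
    except Exception as exc:  # record the error and continue with the next model
        failures[filename] = str(exc)

for filename, error in failures.items():
    print(f"{filename}: {error}")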

My aim is to separate speech from various background noises (music, noise, dogs barking, chattering, etc.). Which models have you had the best experience with?

Many thanks and best regards

@beveradb
Collaborator

Gotcha; it sounds like the implementations of the RoFormer architectures don't work with your input file for some reason.

Here are my recommendations for the models I personally use:
#82 (comment)

@CHFR-wide

CHFR-wide commented Aug 26, 2024

The default model setting seems to give me that issue on audio files that are too short. After running a bit of ffmpeg to pad the audio files to at least 10 seconds (see the sketch below), I was able to get it working.

Is your issue perhaps of the same nature?
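For illustration, here is a minimal Python sketch of that workaround (using soundfile and numpy rather than ffmpeg; the 10-second minimum is the value mentioned above, and the file names are placeholders):

import numpy as np
import soundfile as sf

def pad_to_min_duration(in_path, out_path, min_seconds=10.0):
    # Append trailing silence so the clip is at least min_seconds long.
    audio, sr = sf.read(in_path)
    min_samples = int(min_seconds * sr)
    if len(audio) < min_samples:
        pad = min_samples - len(audio)
        if audio.ndim == 1:  # mono: pad the single axis
            audio = np.pad(audio, (0, pad))
        else:  # multi-channel: pad frames, leave channels untouched
            audio = np.pad(audio, ((0, pad), (0, 0)))
    sf.write(out_path, audio, sr)

pad_to_min_duration("audio1.wav", "audio1_padded.wav")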

@JackismyShephard

The default model setting seems to give me that issue on audio files that are too short. After running a bit of ffmpeg to pad the audio files to at least 10 seconds, I was able to get it working.

Is your issue perhaps of the same nature?

I have observed similar problems with MDXNet as well.

@Lixi20
Contributor

Lixi20 commented Aug 29, 2024

I also encountered the same problem: Audio buffer is not finite everywhere.
The model used is: UVR-DeEcho-DeReverb.pth,
and the parameters are as follows:
de_reverb_separator = Separator(
    model_file_dir="/home/geek/.cache/audio-separator-models",
    output_dir="/home/geek/download/audio",
    output_format="wav",
    normalization_threshold=0.9,
    sample_rate=44100,
    vr_params={
        "batch_size": 16,
        "window_size": 512,
        "aggression": 5,
        "enable_tta": "True",
        "enable_post_process": "False",
        "post_process_threshold": 0.2,
        "high_end_process": "False",
    },
)

@Lixi20
Contributor

Lixi20 commented Aug 29, 2024


I saw a fix for "Audio buffer is not finite everywhere" in ultimatevocalremovergui; I don't know if it helps you here @beveradb

@Lixi20
Contributor

Lixi20 commented Aug 29, 2024

This is my personal modification as a replacement; I hope it helps you!
[screenshot of the proposed fix]

@beveradb
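The screenshot itself is not preserved here. For context, the usual way to clear an "Audio buffer is not finite everywhere" error is to replace non-finite samples before the result is written out; a sketch of that idea (not necessarily the exact change that shipped in audio-separator 0.19.2) is:

import numpy as np

def make_finite(audio):
    # Replace NaN/Inf samples with zeros so downstream writers accept the buffer.
    if not np.isfinite(audio).all():
        audio = np.nan_to_num(audio, nan=0.0, posinf=0.0, neginf=0.0)
    return audio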

@beveradb
Collaborator

Thank you so much @Lixi20 - this fix is now live in audio-separator version 0.19.2 🎉
