
mdxc_separator.py: RuntimeError #96

Closed
Oefuli opened this issue Aug 9, 2024 · 9 comments

@Oefuli

Oefuli commented Aug 9, 2024

I used your code from the documentation: https://github.com/nomadkaraoke/python-audio-separator/blob/main/README.md#as-a-dependency-in-a-python-project

from audio_separator.separator import Separator

# Initialize the Separator class (with optional configuration properties, below)
separator = Separator()

# Load a machine learning model (if unspecified, defaults to 'model_mel_band_roformer_ep_3005_sdr_11.4360.ckpt')
separator.load_model()

# Perform the separation on specific audio files without reloading the model
output_files = separator.separate('audio1.wav')

print(f"Separation complete! Output file(s): {' '.join(output_files)}")

I get the following error:

File ~/miniconda3/envs/conenv_speech_outlier/lib/python3.12/site-packages/audio_separator/separator/architectures/mdxc_separator.py:188, in MDXCSeparator.overlap_add(self, result, x, weights, start, length)
    186 x = x.to(result.device)
    187 weights = weights.to(result.device)
--> 188 result[..., start : start + length] += x[..., :length] * weights[:length]
    189 return result

RuntimeError: The size of tensor a (238140) must match the size of tensor b (352800) at non-singleton dimension 1

I took a look at the dimensions:

result[..., start : start + length]:  torch.Size([2, 2, 114660])
x[..., :length]:  torch.Size([2, 238140])
weights[:length]:  torch.Size([352800])
@beveradb
Collaborator

Hmm, that's strange - but I've seen this before with unusual input files (e.g. a different sample rate or particularly large files).

Can you try with another model, and with another input audio file?
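For reference, switching to a different model looks roughly like this. This is a sketch based on the project README: the model_filename keyword and the example ONNX filename come from the library's documented usage and supported-model list (and may differ between versions), and 'audio2.wav' is a placeholder for a second test file.

from audio_separator.separator import Separator

separator = Separator()

# Load a specific (non-default) model; the filename below is only an example -
# see Separator().list_supported_model_files() for the available options.
separator.load_model(model_filename="UVR-MDX-NET-Inst_HQ_3.onnx")

# Separate a different input file to rule out a file-specific problem
output_files = separator.separate("audio2.wav")
print(output_files)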

@Oefuli
Author

Oefuli commented Aug 16, 2024

Thank you very much: It works with other models (I used the same input file).

Here is the metadata for my file:
{'_duration_str': '5.400 s', 'channels': 1, 'duration': 5.4, 'endian': 'FILE', 'format': 'WAV', 'format_info': 'WAV (Microsoft)', 'frames': 119070, 'samplerate': 22050, 'sections': 1, 'subtype': 'PCM_16', 'subtype_info': 'Signed 16 bit PCM', 'size_bytes': 238218}

I tried all the models from

Separator().list_supported_model_files()

together with my audio file; a sketch of such a sweep is shown after the lists below.
The following models did not work:

  - '**17_HP-Wind_Inst-UVR.pth**'
        Error: Audio buffer is not finite everywhere
  - '**model_bs_roformer_ep_317_sdr_12.9755.yaml**'
        Error: result[..., start : start + length]: torch.Size([2, 2, 114660])
               x[..., :length]: torch.Size([2, 238140])
               weights[:length]: torch.Size([352800])
               The size of tensor a (238140) must match the size of tensor b (352800) at non-singleton dimension 1
  - '**model_bs_roformer_ep_368_sdr_12.9628.yaml**'
        Error: result[..., start : start + length]: torch.Size([2, 2, 114660])
               x[..., :length]: torch.Size([2, 238140])
               weights[:length]: torch.Size([352800])
               The size of tensor a (238140) must match the size of tensor b (352800) at non-singleton dimension 1
  - '**model_bs_roformer_ep_937_sdr_10.5309.yaml**'
        Error: result[..., start : start + length]: torch.Size([2, 2, 23492])
               x[..., :length]: torch.Size([2, 238080])
               weights[:length]: torch.Size([261632])
               The size of tensor a (238080) must match the size of tensor b (261632) at non-singleton dimension 1
  - '**model_mel_band_roformer_ep_3005_sdr_11.4360.yaml**'
        Error: result[..., start : start + length]: torch.Size([2, 2, 114660])
               x[..., :length]: torch.Size([2, 238140])
               weights[:length]: torch.Size([352800])
               The size of tensor a (238140) must match the size of tensor b (352800) at non-singleton dimension 1

For all of the following model files I get:

Unsupported Model File: parameters for MD5 hash XYZ could not be found in UVR model data file for MDX or VR arch.

(where XYZ is the hash for the respective file)

  - 'f7e0c4bc-ba3fe64a.th'
  - 'd12395a8-e57c48e6.th'
  - '92cfc3b6-ef3bcb9c.th'
  - '04573f0d-f3cf25b2.th'
  - '955717e8-8726e21a.th'
  - '75fc33f5-1941ce65.th'
  - '5c90dfd2-34c22ccb.th'
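For illustration, the sweep described above could be scripted roughly as follows. This is a sketch only: it assumes list_supported_model_files() returns a nested dict whose leaf strings include the model filenames (the exact structure varies between audio-separator versions), and 'audio1.wav' stands in for the actual test file.

from audio_separator.separator import Separator

MODEL_EXTENSIONS = (".pth", ".onnx", ".ckpt", ".yaml", ".th")

def model_filenames(node):
    # Recursively collect leaf strings that look like model files.
    if isinstance(node, str):
        if node.endswith(MODEL_EXTENSIONS):
            yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from model_filenames(value)

separator = Separator()
failures = {}
for filename in model_filenames(separator.list_supported_model_files()):
    try:
        separator.load_model(model_filename=filename)
        separator.separate("audio1.wav")
    except Exception as exc:  # record the error and continue with the next model
        failures[filename] = str(exc)

for filename, error in failures.items():
    print(f"{filename}: {error}")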

My aim is to separate speech from various background noises (music, noise, dogs barking, chattering, etc.). Which models have you had the best experience with?

Many thanks and best regards

@beveradb
Collaborator

Gotcha; it sounds like the implementations of the RoFormer architectures don't work with your input file for some reason.

Here are my recommendations for the models I personally use:
#82 (comment)

@CHFR-wide

CHFR-wide commented Aug 26, 2024

The default model setting seems to give me that issue on audio files that are too short. After running a bit of ffmpeg to pad the audio files to at least 10 seconds (see the sketch below), I was able to get it working.

Is your issue perhaps of the same nature?
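For illustration, here is a minimal Python sketch of that workaround (using soundfile and numpy rather than ffmpeg; the 10-second minimum is the value mentioned above, and the file names are placeholders):

import numpy as np
import soundfile as sf

def pad_to_min_duration(in_path, out_path, min_seconds=10.0):
    # Append trailing silence so the clip is at least min_seconds long.
    audio, sr = sf.read(in_path)
    min_samples = int(min_seconds * sr)
    if len(audio) < min_samples:
        pad = min_samples - len(audio)
        if audio.ndim == 1:  # mono: pad the single axis
            audio = np.pad(audio, (0, pad))
        else:  # multi-channel: pad frames, leave channels untouched
            audio = np.pad(audio, ((0, pad), (0, 0)))
    sf.write(out_path, audio, sr)

pad_to_min_duration("audio1.wav", "audio1_padded.wav")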

@JackismyShephard

The default model setting seems to give me that issue on audio files that are too short. After running a bit of ffmpeg to pad the audio files to at least 10 seconds, I was able to get it working.

Is your issue perhaps of the same nature?

I have observed similar problems with MDXNet as well.

@Lixi20
Contributor

Lixi20 commented Aug 29, 2024

I also encountered the same problem: Audio buffer is not finite everywhere.
The model used is: UVR-DeEcho-DeReverb.pth,
and the parameters are as follows:
de_reverb_separator = Separator(
    model_file_dir="/home/geek/.cache/audio-separator-models",
    output_dir="/home/geek/download/audio",
    output_format="wav",
    normalization_threshold=0.9,
    sample_rate=44100,
    vr_params={
        "batch_size": 16,
        "window_size": 512,
        "aggression": 5,
        "enable_tta": "True",
        "enable_post_process": "False",
        "post_process_threshold": 0.2,
        "high_end_process": "False",
    },
)

@Lixi20
Contributor

Lixi20 commented Aug 29, 2024


I saw a fix for "Audio buffer is not finite everywhere" in ultimatevocalremovergui; I don't know if it helps you here @beveradb

@Lixi20
Contributor

Lixi20 commented Aug 29, 2024

This is my personal modification as a replacement; I hope it helps you!
[screenshot of the proposed fix]

@beveradb
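The screenshot itself is not preserved here. For context, the usual way to clear an "Audio buffer is not finite everywhere" error is to replace non-finite samples before the result is written out; a sketch of that idea (not necessarily the exact change that shipped in audio-separator 0.19.2) is:

import numpy as np

def make_finite(audio):
    # Replace NaN/Inf samples with zeros so downstream writers accept the buffer.
    if not np.isfinite(audio).all():
        audio = np.nan_to_num(audio, nan=0.0, posinf=0.0, neginf=0.0)
    return audio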

@beveradb
Collaborator

Thank you so much @Lixi20 - this fix is now live in audio-separator version 0.19.2 🎉
