MusicECAN: Automatic Recorded Music Denoising Network with Efficient Channel Attention

Abstract

In this work, we address the long-standing problem of automatic recorded music denoising. Previous audio denoising work has focused primarily on speech rather than music, neglecting the scenario of amateur music recording. To this end, we first propose MusicECAN, an automatic recorded music denoising network designed to enhance the quality of recorded music. The novel architecture comprises two key components, namely a feature learning module and a noise filtering module, which model, refine and denoise the noisy input efficiently and effectively. Specifically, in order to capture sufficient noisy music information, an ECA-U-SAM based feature learning module is designed by introducing an Efficient Channel Attention (ECA) mechanism into a traditional U-Net with a Supervised Attention Module (SAM). To train MusicECAN, we collect M&N, a dataset containing various recordings of clean music and noise. By combining different pairs of clean and noise recordings, we can effectively simulate possible music performance environments with different background noise. Extensive quantitative and qualitative comparisons demonstrate that MusicECAN outperforms state-of-the-art audio denoising methods.

Demo Video

Watch the demo video.

Listen to more denoising examples.

M&N: A music dataset for denoising music recordings in the wild

We introduce M&N, a dataset that meets the requirements of music denoising for recordings in the wild. The dataset comprises various videos and recordings of clean music and noise, assembled from free sound effects websites and the existing cross-modal audio generation dataset FAIR-PLAY. For video data, we separate the visual and audio tracks of the video. We anticipate that the dataset will be useful for the denoising task and will also serve as ground truth for evaluating performance.

For music data, we collect a total of 3.43 hours of clean music recordings in WAV format with a sampling rate of 44.1 kHz, a bit depth of 16 bits, and a single (mono) channel. There are 9 categories of music recordings: piano, drum kit, harp, cello, Chinese lute, trumpet, Chinese zither, multi-instrument and song.

Download clean music recordings metadata

Download clean music recordings except from FAIR-PLAY

Note that each clean music recording provided here is 10 seconds long, so that researchers can crop the recordings to other lengths, such as five seconds, as needed.
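
As a minimal sketch of the cropping mentioned above, the following uses only Python's standard `wave` module to copy a segment of a WAV file into a new file. The function name `crop_wav` and the file names in the usage note are hypothetical and are not part of this repository.

```python
import wave


def crop_wav(src_path, dst_path, start_s, duration_s):
    """Copy `duration_s` seconds of audio, starting at `start_s`, into a new WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        start_frame = int(start_s * rate)
        n_frames = int(duration_s * rate)
        if start_frame + n_frames > src.getnframes():
            raise ValueError("requested segment runs past the end of the file")
        src.setpos(start_frame)
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Reuse the source's channel count, sample width and sampling rate.
        dst.setparams(params)
        # writeframes also corrects the frame count stored in the header.
        dst.writeframes(frames)
    return n_frames
```

For example, `crop_wav("clean_piano.wav", "clean_piano_5s.wav", 0.0, 5.0)` would keep the first five seconds of a hypothetical 10-second clip.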

For noise data, we collect a total of 1000 seconds of noise recordings in WAV format with a sampling rate of 44.1 kHz. According to audio content, the noise data is divided into five categories:

Electrical noise: Recordings of electrical circuit noise such as clicking, hissing, and crackling caused by irregularities in the storage medium. This kind of noise often occurs when the user records music in a relatively quiet room while the device is malfunctioning.
Crowd noise: Recordings of crowd chatter, cheering, children's laughter, etc. Such noise is common when users record live music in crowded venues such as shopping malls, theaters, or plazas.
Weather noise: Recordings of rain, wind, thunder and other weather sounds. These noises are common when users record music in places without sound insulation.
Traffic noise: Recordings of vehicle start-up sounds, road traffic, motorcycles, etc., used to simulate traffic noise when the user records audio near a road.
Stationary noise: Recordings of random noise for which the probability that the noise voltage lies within any given interval does not change with time, such as white noise.
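
To make the definition of stationary noise concrete, here is a small illustrative sketch that synthesizes Gaussian white noise and measures its variance over consecutive windows. It is not how the M&N noise recordings were produced (those are collected recordings); the helper names, the seed, and the value of `sigma` are assumptions chosen for illustration.

```python
import random


def white_noise(n_samples, sigma=0.1, seed=0):
    """Generate zero-mean Gaussian white noise as a list of floats."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]


def variance(samples):
    """Population variance of a list of samples."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / len(samples)


def window_variances(samples, window):
    """Variance of each consecutive, non-overlapping window of `window` samples."""
    return [
        variance(samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]


if __name__ == "__main__":
    noise = white_noise(40000, sigma=0.1)
    for k, v in enumerate(window_variances(noise, 10000)):
        print(f"window {k}: variance = {v:.5f}")
```

For a stationary source, the per-window variances should all stay close to `sigma ** 2`. For the other categories, such as crowd or traffic noise, these statistics would generally drift over time.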

Download noise recordings metadata

Download noise recordings

Evaluation

Prepare datasets.

Please set the paths to your clean performance music files and your noise files at the end of makedataset.py, then run the following command.

python makedataset.py
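
This README does not describe how makedataset.py combines clean and noise recordings. As a rough illustration of one common approach, the sketch below adds a noise clip to a clean clip at a chosen signal-to-noise ratio (SNR). Mixing at a target SNR, the function names, and the looping of short noise clips are all assumptions here, not the repository's confirmed behavior.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise such that the mixture has the requested SNR in dB."""
    noise = list(noise)
    if len(noise) < len(clean):
        # Loop the noise clip until it covers the whole clean clip.
        reps = -(-len(clean) // len(noise))  # ceiling division
        noise = noise * reps
    noise = noise[:len(clean)]

    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise clip is silent; cannot scale it to a target SNR")

    # SNR_dB = 20 * log10(rms(clean) / rms(gain * noise)), solved for gain.
    gain = rms(clean) / (noise_rms * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]
```

Pairing each clean clip with noise from several categories and at several SNR values is one way to simulate the different recording environments described above.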

Evaluation.

python test_eca.py

A set of pretrained weights can be found at experiments.

Acknowledgement

We borrowed a lot of code from "A two-stage U-Net for high-fidelity denoising of historical recordings". Thanks to the authors for their great work. Please also cite their work if you use this code.

