-
Our team evaluated several ensemble methods (feature concatenation, averaging, and fusion) on three models: Hubert, Wav2vec2, and Torchcrepe.
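
To make the difference between two of these ensemble methods concrete, here is a minimal NumPy sketch of concatenation and averaging. The function names and toy shapes are illustrative assumptions, not the GURA implementation, and the fusion method is not shown because its details are not given here.

```python
import numpy as np


def concat_embeddings(embeddings):
    """Concatenate per-model embeddings along the feature axis."""
    return np.concatenate(embeddings, axis=-1)


def average_embeddings(embeddings):
    """Element-wise mean of per-model embeddings (shapes must match)."""
    shapes = {e.shape for e in embeddings}
    if len(shapes) != 1:
        raise ValueError(f"cannot average embeddings with shapes {shapes}")
    return np.mean(np.stack(embeddings, axis=0), axis=0)


# Toy stand-ins for three models' embeddings: batch of 2 clips, 8 features each.
rng = np.random.default_rng(0)
emb_a = rng.standard_normal((2, 8))
emb_b = rng.standard_normal((2, 8))
emb_c = rng.standard_normal((2, 8))

concat = concat_embeddings([emb_a, emb_b, emb_c])  # features are stacked side by side
avg = average_embeddings([emb_a, emb_b, emb_c])    # feature size is unchanged
```

Concatenation keeps every model's features at the cost of a larger embedding, while averaging keeps the size fixed but only works when the embeddings share the same shape.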
-
We use two methods to relate our scene-embedding and timestamp-embedding models. In `fusion_cat_xwc_time`, the embeddings within each fixed time interval are averaged, and the interval averages are then concatenated. In the other models, we simply average the three models' (Hubert, Wav2vec2, Torchcrepe) embeddings.
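
One possible reading of the interval scheme is sketched below: the timestamp embeddings are split into a fixed number of consecutive intervals, each interval is averaged over time, and the averages are concatenated into one scene embedding. The function name, the `num_intervals` parameter, and the use of equal-sized splits are assumptions made for illustration; the actual `fusion_cat_xwc_time` code may differ.

```python
import numpy as np


def interval_pooled_scene_embedding(timestamp_emb, num_intervals):
    """Average timestamp embeddings per interval, then concatenate.

    timestamp_emb: array of shape (T, D), one D-dim embedding per timestamp.
    Returns an array of shape (num_intervals * D,).
    """
    if timestamp_emb.shape[0] < num_intervals:
        raise ValueError("need at least one timestamp per interval")
    # Split the time axis into consecutive, near-equal chunks (hypothetical choice).
    chunks = np.array_split(timestamp_emb, num_intervals, axis=0)
    # Mean over time inside each chunk, then join the chunk means end to end.
    return np.concatenate([chunk.mean(axis=0) for chunk in chunks])


# Toy input: 6 timestamps, 2 features each.
frames = np.arange(12, dtype=float).reshape(6, 2)
pooled = interval_pooled_scene_embedding(frames, num_intervals=3)
print(pooled)  # [ 1.  2.  5.  6.  9. 10.]
```

Compared with a single mean over all timestamps, this keeps coarse temporal order in the scene embedding, at the cost of an embedding that is `num_intervals` times larger.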
-
The pretrained models used are:
- facebook/hubert-large-ll60k
- facebook/hubert-xlarge-ll60k
- facebook/wav2vec2-large-960h-lv60-self
- torchcrepe
```shell
pip install \
git+https://github.com/tony10101105/HEAR-2021-NeurIPS-Challenge---NTU.git
```

```python
# In Python code:
from GURA import fusion_wav2vec2
from GURA import cat_wc
# ...
```
- Python: 3.8
- CUDA: 11.4
- torch: 1.9.1+cu111
- 4.11.3
- 0.0.15