This repository has been archived by the owner on Feb 26, 2024. It is now read-only.

atoultaro/podcast_highlight


Podcast highlight extraction

Installation

The repo was developed on a MacBook with an Apple M1 chip. Python 3.8, most packages, and the virtual environment are managed with Conda. The package list is in requirements.txt.

Problem

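sentencepiece may fail to import on Apple M1. A workaround is to install cmake natively for arm64 with Homebrew, then reinstall sentencepiece without using pip's cache: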

arch -arm64 brew install cmake
pip install --no-cache-dir sentencepiece
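After reinstalling, a quick check can confirm that the interpreter runs natively on arm64 and that sentencepiece now imports. This is a minimal sketch; `check_environment` is a hypothetical helper, not part of the repo.

```python
import importlib
import platform
import sys


def check_environment() -> dict:
    """Report the Python version, CPU architecture, and whether sentencepiece imports."""
    try:
        importlib.import_module("sentencepiece")
        sentencepiece_ok = True
    except (ImportError, OSError):
        # Raised when the package is missing or its compiled extension fails to load.
        sentencepiece_ok = False
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "machine": platform.machine(),  # "arm64" for a native Apple Silicon interpreter
        "sentencepiece": sentencepiece_ok,
    }


if __name__ == "__main__":
    print(check_environment())
```

A "machine" value of "x86_64" on an M1 laptop usually means the interpreter is running under Rosetta rather than natively.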

Data

Sound files are stored under the ./sound directory.
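The README does not say which audio formats live in ./sound, so the sketch below assumes a few common extensions and collects matching files recursively. `list_sound_files` and `AUDIO_EXTENSIONS` are hypothetical names used for illustration.

```python
from pathlib import Path
from typing import List, Sequence

# Assumed audio extensions; the README does not state which formats ./sound holds.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".flac", ".m4a")


def list_sound_files(
    root: str = "./sound", extensions: Sequence[str] = AUDIO_EXTENSIONS
) -> List[Path]:
    """Return audio files under `root` (searched recursively), sorted by path."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() in wanted
    )
```

Matching is case-insensitive on the extension, so a file named `episode.MP3` is picked up as well.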

Step 1: Speaker segmentation (find_speaker_segment.py)

Speakers are segmented using SpeechBrain's ECAPA-TDNN speaker embeddings from the Hugging Face model "spkrec-ecapa-voxceleb". See https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb
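The model card loads this model with `EncoderClassifier.from_hparams` and embeds audio with `encode_batch`. The sketch below assumes those embeddings have already been computed, one per fixed-length window, and shows one simple way to turn them into speaker segments: start a new segment wherever two consecutive windows are less similar (by cosine similarity) than a threshold. This change-point heuristic, the window length, and `threshold` are assumptions; find_speaker_segment.py may use a different method.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_by_speaker_change(
    embeddings: Sequence[Sequence[float]],
    window_sec: float,
    threshold: float = 0.5,  # hypothetical; tune on real data
) -> List[Tuple[float, float]]:
    """Group consecutive fixed-length windows into (start_sec, end_sec) segments.

    A new segment starts wherever a window's embedding is less similar
    than `threshold` to the embedding of the window before it.
    """
    if not embeddings:
        return []
    segments = []
    start = 0
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            segments.append((start * window_sec, i * window_sec))
            start = i
    segments.append((start * window_sec, len(embeddings) * window_sec))
    return segments
```

With real ECAPA embeddings (192-dimensional vectors) the same function applies unchanged; only the threshold needs tuning.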

Step 2: Audio tagging with AudioSet labels (assign_audioset_labels.py)
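The README does not describe the tagger itself. Assuming an AudioSet-trained model has already produced one score per class for each segment, labels can be assigned by keeping the top-scoring classes above a cut-off, and music presence (used in Step 3) read off from the AudioSet "Music" class. `assign_labels`, `music_present`, `threshold`, and `top_k` are hypothetical.

```python
from typing import List, Sequence


def assign_labels(
    segment_scores: Sequence[Sequence[float]],
    label_names: Sequence[str],
    threshold: float = 0.3,  # hypothetical score cut-off
    top_k: int = 3,  # hypothetical number of labels kept per segment
) -> List[List[str]]:
    """For each segment, keep up to `top_k` labels scoring >= `threshold`, best first."""
    assigned = []
    for scores in segment_scores:
        ranked = sorted(zip(label_names, scores), key=lambda pair: pair[1], reverse=True)
        assigned.append([name for name, score in ranked[:top_k] if score >= threshold])
    return assigned


def music_present(segment_labels: Sequence[Sequence[str]], music_label: str = "Music") -> List[bool]:
    """Flag segments whose assigned labels include the AudioSet "Music" class."""
    return [music_label in labels for labels in segment_labels]
```

For example, with classes `["Speech", "Music", "Silence"]`, a segment scored `[0.6, 0.7, 0.05]` is tagged `["Music", "Speech"]` and flagged as containing music.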

Step 3: Find highlights by combining sentence embeddings (BERT embeddings) with music presence
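The README does not say how the two signals are combined. One possible combination, sketched here as an assumption: score each segment by the cosine similarity of its sentence embedding to the episode's mean embedding (a centrality proxy for "representative" content), then add a weighted music term. The embeddings are assumed to be precomputed, one per segment. `highlight_scores`, `best_segment`, and `music_weight` are hypothetical, and whether music should raise or lower the score is left to the caller.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def highlight_scores(
    sentence_embeddings: Sequence[Sequence[float]],
    has_music: Sequence[bool],
    music_weight: float,  # hypothetical: > 0 favors music, < 0 penalizes it
) -> List[float]:
    """Score each segment by similarity to the episode centroid plus a music term."""
    if len(sentence_embeddings) != len(has_music):
        raise ValueError("need one music flag per segment")
    if not sentence_embeddings:
        return []
    n = len(sentence_embeddings)
    dim = len(sentence_embeddings[0])
    centroid = [sum(e[d] for e in sentence_embeddings) / n for d in range(dim)]
    return [
        _cosine(e, centroid) + (music_weight if music else 0.0)
        for e, music in zip(sentence_embeddings, has_music)
    ]


def best_segment(scores: Sequence[float]) -> int:
    """Index of the highest-scoring segment."""
    return max(range(len(scores)), key=scores.__getitem__)
```

Centrality is only one heuristic for picking a highlight; ranking by similarity to a query or to the episode title would slot into the same structure.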

About

Extract highlights from podcasts
