
bboycoi/End-to-End-Deep-Neural-Network-ASR


Project Overview

This project builds a deep neural network that functions as an end-to-end automatic speech recognition (ASR) pipeline!

This is an ongoing project; we are adding a language model to the pipeline.

ASR Pipeline

Project Instructions

The notebook vui.ipynb contains the main procedure. It is self-explanatory and is a good place to start.

Local Setup

You should run this project with GPU acceleration for best performance.

  1. Install TensorFlow.

    • Option 1: To install TensorFlow with GPU support, follow the guide to install the necessary NVIDIA software on your system. If you are using an EC2 GPU instance, you can skip this step and only need to install the tensorflow-gpu package:
    pip install tensorflow-gpu==1.1.0
    
    • Option 2: To install TensorFlow with CPU support only:
    pip install tensorflow==1.1.0
    
  2. Install a few required packages.

    pip install -r requirements.txt
    
  3. Switch the Keras backend to TensorFlow.

    • Linux or Mac:
    KERAS_BACKEND=tensorflow python -c "from keras import backend"
    
  4. Obtain the libav package.

    • Linux:
    sudo apt-get install libav-tools
    
    Alternatively, install ffmpeg and then install the libav-tools package from the .deb file:
    sudo apt install ffmpeg
    wget http://launchpadlibrarian.net/348889634/libav-tools_3.4.1-1_all.deb
    sudo dpkg -i libav-tools_3.4.1-1_all.deb
    
  5. Obtain the appropriate dataset and convert all FLAC files to WAV format. This works with data directories organized like LibriSpeech:
    data_directory/group/speaker/[file_id1.wav, file_id2.wav, ..., speaker.trans.txt]
    where each line of speaker.trans.txt has the form "file_id transcription".

    • Linux or Mac:
    mv flac_to_wav.sh $data_folder$
    cd $data_folder$
    ./flac_to_wav.sh
    
  6. Create JSON files corresponding to the train and validation datasets.

    cd ..
    python create_desc_json.py $data_folder$ train_corpus.json
    python create_desc_json.py $data_folder$ valid_corpus.json
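The `KERAS_BACKEND=tensorflow` command used to switch the backend only affects that single process. Keras 2.x also reads its backend from `~/.keras/keras.json` (creating the file on first import if it is missing), and the environment variable overrides that file. If `keras.json` already exists with a different backend, the command will not change it. The standard-library sketch below sets the `backend` field directly. The function name `set_keras_backend` is hypothetical, and the default path assumes the standard `~/.keras` location.

```python
import json
import os


def set_keras_backend(backend="tensorflow", config_path=None):
    """Persist the backend choice in Keras' keras.json (default: ~/.keras/keras.json).

    Hypothetical helper: only the "backend" key is changed; any other keys
    already in the file (e.g. floatx, epsilon) are preserved.
    """
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")

    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)

    config["backend"] = backend

    config_dir = os.path.dirname(config_path)
    if config_dir:
        os.makedirs(config_dir, exist_ok=True)
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

Editing the file once makes the choice stick for later notebook sessions, so you do not need to export `KERAS_BACKEND` each time.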
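The contents of `flac_to_wav.sh` are not shown in this README. As a rough Python equivalent, the sketch below walks a LibriSpeech-style tree and converts each `.flac` file into a `.wav` file next to it, using whichever of `avconv` (from libav) or `ffmpeg` is on `PATH`. The function names are hypothetical. The plain `tool -i input output` call, which lets the tool infer the WAV format from the extension and keeps its default sample rate and channels, is an assumption about how the conversion should be done.

```python
import os
import shutil
import subprocess


def plan_conversions(data_dir):
    """Return (flac_path, wav_path) pairs for every .flac file without a .wav yet."""
    pairs = []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if name.lower().endswith(".flac"):
                src = os.path.join(root, name)
                dst = os.path.splitext(src)[0] + ".wav"
                if not os.path.exists(dst):  # skip files converted on an earlier run
                    pairs.append((src, dst))
    return pairs


def convert_all(data_dir, tool=None):
    """Convert all pending .flac files with avconv or ffmpeg; returns the count."""
    tool = tool or shutil.which("avconv") or shutil.which("ffmpeg")
    if tool is None:
        raise RuntimeError("neither avconv nor ffmpeg was found on PATH")
    pairs = plan_conversions(data_dir)
    for src, dst in pairs:
        # '-i' names the input file; the output format follows the .wav extension.
        subprocess.run([tool, "-i", src, dst], check=True)
    return len(pairs)
```

Keeping `plan_conversions` separate from `convert_all` lets you inspect the file list before running the converter over a large corpus.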
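The README does not document the output format of `create_desc_json.py`. Here is a minimal sketch of what such a script might do, assuming one JSON object per line with hypothetical fields `key` (WAV path), `duration` (seconds) and `text` (transcription). It reads every `*.trans.txt` file in the tree, finds the matching WAV file for each `file_id`, and reads the duration from the WAV header with the standard-library `wave` module. Check the field names against the real script before relying on them.

```python
import json
import os
import wave


def read_transcripts(trans_path):
    """Parse a LibriSpeech-style transcript file: one 'file_id transcription' per line."""
    transcripts = {}
    with open(trans_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                file_id, _, text = line.partition(" ")
                transcripts[file_id] = text
    return transcripts


def wav_duration(wav_path):
    """Length of a WAV file in seconds, computed from its header."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes() / float(w.getframerate())


def build_descriptions(data_dir):
    """Yield one description dict per utterance that has both a transcript and a WAV file."""
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if not name.endswith(".trans.txt"):
                continue
            for file_id, text in read_transcripts(os.path.join(root, name)).items():
                wav_path = os.path.join(root, file_id + ".wav")
                if os.path.exists(wav_path):
                    yield {"key": wav_path, "duration": wav_duration(wav_path), "text": text}


def write_desc_json(data_dir, output_path):
    """Write one JSON object per line (field names are assumptions); returns the count."""
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for desc in build_descriptions(data_dir):
            out.write(json.dumps(desc) + "\n")
            count += 1
    return count
```

Run it once per split, for example `write_desc_json("path/to/train_split", "train_corpus.json")` (placeholder path). Storing durations lets a training script sort or filter utterances by length without reopening every audio file.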

TODO!

(1) Add a Language Model to the Decoder

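Incorporating a language model can greatly improve the decoding step, because the decoder can then prefer word sequences that are plausible in the language over acoustically similar but unlikely ones.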
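One common way to add a language model without retraining the acoustic model is n-best rescoring. The decoder proposes several candidate transcriptions with acoustic log-probabilities, and each candidate is re-ranked by acoustic log-probability + alpha * LM log-probability + beta * word count. The sketch below uses a toy word bigram model with add-one smoothing. The README does not say which language model or decoder the project will use. The weights `alpha` and `beta`, the `(text, log_prob)` n-best format, and the class and function names are all hypothetical. A practical system would more likely use a larger n-gram or neural LM inside a beam search.

```python
import math
from collections import Counter


class BigramLM:
    """Toy word bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each token precedes another
        self.bigrams = Counter()
        for sentence in sentences:
            words = ["<s>"] + sentence.split() + ["</s>"]
            self.context_counts.update(words[:-1])
            self.bigrams.update(zip(words[:-1], words[1:]))
        # Possible next tokens: every training word plus the end marker.
        self.vocab_size = len({w for s in sentences for w in s.split()}) + 1

    def log_prob(self, sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(words[:-1], words[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.context_counts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


def rescore(nbest, lm, alpha=0.5, beta=1.0):
    """Return the text maximising acoustic + alpha * LM + beta * word count."""
    def score(item):
        text, acoustic_log_prob = item
        return acoustic_log_prob + alpha * lm.log_prob(text) + beta * len(text.split())
    return max(nbest, key=score)[0]


if __name__ == "__main__":
    lm = BigramLM(["the cat sat", "the cat ran", "a dog sat"])
    nbest = [("the cap sat", -4.0), ("the cat sat", -4.2)]
    print(rescore(nbest, lm))  # -> the cat sat
```

With `alpha=0` the acoustically best candidate ("the cap sat") always wins. Raising `alpha` lets the LM override small acoustic differences. The `beta` term offsets the LM's bias toward shorter transcriptions, since every extra word adds a negative log-probability.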

(2) Try out Different Audio Features

Train a network that uses raw audio waveforms!
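Training on raw waveforms means giving the network the sample values directly instead of precomputed spectral features. The standard-library sketch below covers only the input side. It reads a WAV file into floats in [-1, 1) and slices the signal into overlapping frames. The restriction to 16-bit mono PCM, the function names, and the frame sizes in the usage note are assumptions, and the model that would consume these frames is not shown.

```python
import array
import sys
import wave


def read_wav_samples(path):
    """Read a 16-bit mono PCM WAV file; return (samples scaled to [-1, 1), sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in samples], rate


def frame_signal(samples, frame_len, hop):
    """Split a waveform into frames of frame_len samples whose starts are hop samples apart."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

At a 16 kHz sample rate, `frame_signal(samples, 400, 160)` yields 25 ms frames every 10 ms, a common choice for speech. A raw-waveform network could instead learn its own framing, for example with strided 1-D convolutions over the full signal.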

