GitHub - bboycoi/End-to-End-Deep-Neural-Network-ASR: An End-to-End Deep Neural Network ASR

Project Overview

This project builds a deep neural network that functions as an end-to-end automatic speech recognition (ASR) pipeline.

This is an ongoing project; we are adding a language model to the pipeline.

Project Instructions

The notebook vui.ipynb contains the main procedure. It is self-explanatory and is a good place to start.

Local Setup

You should run this project with GPU acceleration for best performance.

Install TensorFlow.
- Option 1: To install TensorFlow with GPU support, follow the guide to install the necessary NVIDIA software on your system. If you are using an EC2 GPU instance, you can skip this step and only need to install the tensorflow-gpu package:
```
pip install tensorflow-gpu==1.1.0
```
- Option 2: To install TensorFlow with CPU support only:
```
pip install tensorflow==1.1.0
```
Install the required packages.

pip install -r requirements.txt

Switch Keras backend to TensorFlow.

Linux or Mac:

KERAS_BACKEND=tensorflow python -c "from keras import backend"
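
As an alternative to setting the environment variable on every run, the backend can be stored in the Keras configuration file. The sketch below assumes a multi-backend Keras release that reads `~/.keras/keras.json`; the helper name `set_keras_backend` is hypothetical and not part of this repository. Note that the `KERAS_BACKEND` environment variable, when set, overrides the value in the file.

```python
import json
import os


def set_keras_backend(backend="tensorflow", config_path=None):
    """Store the backend name in a Keras JSON config file and return the config.

    Assumption: Keras reads its settings from ~/.keras/keras.json.
    """
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")

    # Keep any settings that are already in the file (floatx, epsilon, ...).
    config = {}
    if os.path.exists(config_path):
        with open(config_path) as f:
            config = json.load(f)

    config["backend"] = backend

    # Create the config directory if it does not exist yet.
    config_dir = os.path.dirname(config_path)
    if config_dir:
        os.makedirs(config_dir, exist_ok=True)

    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

Calling `set_keras_backend()` with no arguments writes to the default location; passing `config_path` is useful for trying the helper without touching the real configuration.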

Obtain the libav package.
- Linux: `sudo apt-get install libav-tools` or `sudo apt install ffmpeg`. To install the libav-tools package manually (a requirement to run avahi), run `wget http://launchpadlibrarian.net/348889634/libav-tools_3.4.1-1_all.deb` and then `sudo dpkg -i libav-tools_3.4.1-1_all.deb`.
Obtain the appropriate dataset, and convert all flac files to wav format. This works with data directories that are organized like LibriSpeech: `data_directory/group/speaker/[file_id1.wav, file_id2.wav, ..., speaker.trans.txt]`, where each line of `speaker.trans.txt` has the form `file_id transcription`.
- Linux or Mac:
```
mv flac_to_wav.sh $data_folder$
cd $data_folder$
./flac_to_wav.sh
```
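
To make the expected directory layout concrete, here is a minimal sketch of how the `speaker.trans.txt` files could be located and parsed. It assumes only the layout described above (one `file_id transcription` pair per line); the function names are hypothetical and are not taken from this repository's scripts.

```python
import os


def read_transcriptions(trans_path):
    """Map each file_id in a transcription file to its transcription text."""
    transcripts = {}
    with open(trans_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # The first whitespace-separated token is the file id;
            # everything after it is the transcription.
            parts = line.split(None, 1)
            file_id = parts[0]
            text = parts[1] if len(parts) > 1 else ""
            transcripts[file_id] = text
    return transcripts


def find_transcription_files(data_directory):
    """Yield the path of every *.trans.txt file below data_directory."""
    for root, _dirs, files in os.walk(data_directory):
        for name in sorted(files):
            if name.endswith(".trans.txt"):
                yield os.path.join(root, name)
```

Running `read_transcriptions` over each path from `find_transcription_files` is a quick way to check that a dataset follows the layout before starting a long training run.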
Create JSON files corresponding to the train and validation datasets.

cd ..
python create_desc_json.py $data_folder$ train_corpus.json
python create_desc_json.py $data_folder$ valid_corpus.json
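
For orientation, the following is a rough sketch of what a corpus description step like this might do: walk the data directory, pair each wav file with its transcription, and write one JSON object per line. The field names `key`, `duration`, and `text`, and the function names, are assumptions made for illustration; check `create_desc_json.py` for the format the project actually uses.

```python
import json
import os
import wave


def wav_duration_seconds(wav_path):
    """Return the length of a wav file in seconds."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes() / float(w.getframerate())


def build_corpus_description(data_directory, output_path):
    """Write one JSON line per utterance and return the list of entries."""
    entries = []
    for root, _dirs, files in os.walk(data_directory):
        for name in sorted(files):
            if not name.endswith(".trans.txt"):
                continue
            with open(os.path.join(root, name)) as f:
                for line in f:
                    parts = line.strip().split(None, 1)
                    if len(parts) < 2:
                        continue  # skip blank or malformed lines
                    file_id, text = parts
                    wav_path = os.path.join(root, file_id + ".wav")
                    if not os.path.isfile(wav_path):
                        continue  # transcription without a converted wav file
                    entries.append({
                        "key": wav_path,                            # assumed field name
                        "duration": wav_duration_seconds(wav_path),  # assumed field name
                        "text": text,                               # assumed field name
                    })
    with open(output_path, "w") as out:
        for entry in entries:
            out.write(json.dumps(entry) + "\n")
    return entries
```

Storing the duration alongside each utterance makes it easy to filter out very long recordings or sort batches by length later, without reopening the audio files.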

TODO!

(1) Add a Language Model to the Decoder

The performance of the decoding step can be greatly enhanced by incorporating a language model.
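
One common way to incorporate a language model is to rescore candidate transcriptions: each candidate's acoustic score is combined with a weighted language-model score, and the best total wins. The sketch below illustrates the idea with a toy word-bigram model using add-one style smoothing. It is not this project's decoder; the function names, the candidate format, and the `lm_weight` value are all assumptions for illustration.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count unigrams and bigrams, with <s> marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def lm_log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under the add-one smoothed bigram counts."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((bigrams[(prev, word)] + 1.0) / (unigrams[prev] + vocab_size))
    return total


def rescore(candidates, unigrams, bigrams, lm_weight=0.5):
    """Pick the best text from (text, acoustic_log_prob) pairs.

    The combined score is acoustic_log_prob + lm_weight * lm_log_prob.
    """
    def combined(candidate):
        text, acoustic_log_prob = candidate
        return acoustic_log_prob + lm_weight * lm_log_prob(text, unigrams, bigrams)

    return max(candidates, key=combined)[0]


# Toy usage: the acoustic model slightly prefers the misspelled candidate,
# but the language model has never seen "cat sad" and tips the balance.
unigrams, bigrams = train_bigram_counts(["the cat sat", "the cat ran", "the dog sat"])
candidates = [("the cat sat", -5.0), ("the cat sad", -4.8)]
best = rescore(candidates, unigrams, bigrams)
```

In practice the candidates would come from a beam search over the network's output, and the weight would be tuned on the validation set.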

(2) Try out Different Audio Features

Train a network that uses raw audio waveforms!

