ChineseTonesDetector

Machine learning project to help practice correct Mandarin Chinese tone pronunciation.

The Tone Perfect dataset from Michigan State University is used for training. It includes the full catalog of monosyllabic sounds in Mandarin Chinese in all four tones, spoken by six native Mandarin speakers. The collection comprises about 10k samples with a total duration of approximately 2 hours. To create a more diverse and realistic training set, the original dataset is augmented. The audio samples are then converted to Mel spectrograms, which serve as input for image classification algorithms.
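
The augmentation and spectrogram steps can be sketched as follows. This is a minimal NumPy-only illustration, not the code used in this repository (which installs librosa for audio processing). The function names, the 20 dB noise level, and the parameters `n_fft`, `hop`, and `n_mels` are assumptions chosen for the example, and a synthetic rising-pitch tone stands in for a real recording.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Mel scale formula.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def add_noise(y, snr_db, rng):
    """Augmentation: add white Gaussian noise at a target SNR in dB."""
    noise_power = np.mean(y ** 2) / (10.0 ** (snr_db / 10.0))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)

def mel_spectrogram_db(y, sr, n_fft=1024, hop=256, n_mels=64):
    """Log-scaled Mel spectrogram, shape (n_mels, n_frames)."""
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)                 # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # power spectrum per frame
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T   # project onto Mel bands
    return 10.0 * np.log10(np.maximum(mel, 1e-10))

# Synthetic stand-in for a recording: pitch rising from 150 Hz to 250 Hz in 0.5 s.
sr = 16000
t = np.arange(int(0.5 * sr)) / sr
f0 = 150.0 + 200.0 * t
y = np.sin(2.0 * np.pi * np.cumsum(f0) / sr)

rng = np.random.default_rng(0)
noisy = add_noise(y, snr_db=20.0, rng=rng)
spec = mel_spectrogram_db(noisy, sr)
print(spec.shape)
```

The resulting 2D array can be rendered or resized to a fixed image resolution before it is passed to an image classifier.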

A CNN and a fine-tuned Vision Transformer (based on Google's vit-base-patch16-224) model are trained. Both achieve an accuracy of >99.9% on a statistically independent test dataset. More details and deployed models for inference can be found here: https://pingulino.vercel.app/
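
For reference, test-set accuracy and a per-tone confusion matrix can be computed as sketched below. The label lists are made-up placeholders for illustration only, not results from this project, and the helper names are hypothetical.

```python
from collections import Counter

TONES = (1, 2, 3, 4)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted tone matches the true tone."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred):
    """Rows are true tones, columns are predicted tones."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in TONES] for t in TONES]

# Placeholder labels (NOT project results): one tone-2 sample is predicted as tone 3.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 1, 2, 3, 3, 3, 4, 4]

acc = accuracy(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred)
print(acc)
for row in matrix:
    print(row)
```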

Example spectrograms

Train ML

Set up environment

conda create --name chineseTones_env python=3.8
conda activate chineseTones_env
pip install jupyter
pip install requests numpy matplotlib librosa pandas seaborn tensorflow boto3
pip install gTTS
pip install soundfile
pip install tensorflow-macos   # macOS only
pip install tensorflow-metal   # macOS only (GPU acceleration via Metal)

pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip install transformers
pip install -U huggingface_hub
pip install accelerate -U
pip install tensorboard
pip install peft

Download data samples

cd prepareData
python downloadTonesData.py

Notebook to explore data samples

cd prepareData
jupyter notebook analyzeTones.ipynb

Train CNN model

cd trainML
python trainModel.py --addNoise --augmentData --epochs=10 --nHiddenLayers=3 --image_resolution=128 --batch_size=64 --modelName=tfModelTones_v8
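
The flags in the command above could be parsed along these lines. This is a hypothetical sketch of the command-line interface based only on the flag names shown here; the actual `trainModel.py` may define them differently, and the defaults and help texts are assumptions.

```python
import argparse

def build_parser():
    # Flag names are taken from the command above; defaults are assumed.
    parser = argparse.ArgumentParser(description="Train a CNN tone classifier (sketch).")
    parser.add_argument("--addNoise", action="store_true",
                        help="add background noise to the training audio")
    parser.add_argument("--augmentData", action="store_true",
                        help="apply data augmentation to the training set")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--nHiddenLayers", type=int, default=3)
    parser.add_argument("--image_resolution", type=int, default=128,
                        help="side length of the square spectrogram image in pixels")
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--modelName", type=str, default="tfModelTones")
    return parser

command = ("--addNoise --augmentData --epochs=10 --nHiddenLayers=3 "
           "--image_resolution=128 --batch_size=64 --modelName=tfModelTones_v8")
args = build_parser().parse_args(command.split())
print(args)
```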

Train fine-tuned ViT model

cd trainML
python fineTuneModel.py --addNoise --augmentData --unfreezeLastBaseLayer --epochs=10 --batch_size=64 --modelName=fineTunedModelTones_v1
python fineTuneModel.py --addNoise --augmentData --epochs=1 --batch_size=64 --modelName=fineTunedModelTonesLora_v1 --applyLora
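
The second command presumably uses LoRA (low-rank adaptation) via the `peft` package installed above. The sketch below is not the project's code; it illustrates the underlying idea with NumPy: the pretrained weight matrix stays frozen and only a low-rank update is trained. The hidden size 768 matches ViT-base, while the rank `r` and scaling `alpha` are assumed values.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Linear layer with a LoRA update: x W^T + (alpha / r) * x A^T B^T."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d, r, alpha = 768, 8, 16.0                 # r and alpha are assumptions

W = rng.normal(size=(d, d))                # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d))    # trainable, small random init
B = np.zeros((d, r))                       # trainable, zero init: no change at start

x = rng.normal(size=(4, d))
y_start = lora_forward(x, W, A, B, alpha)

full_params = W.size                       # parameters updated by full fine-tuning
lora_params = A.size + B.size              # parameters updated by LoRA
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained output, and after training the update can be merged into a single matrix `W + (alpha / r) * B @ A`.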

ML API (local)

Set up environment

pip install librosa Flask flask-cors pydub
brew install ffmpeg   # macOS only, if ffmpeg is not already installed
pip install torch torchvision torchaudio
pip install transformers

Run

cd flaskAPI
python spectrum.py

ML API (AWS EC2 Ubuntu)

Set up environment

sudo apt-get update
sudo apt-get install ffmpeg libavcodec-extra
sudo apt install emacs
sudo apt install tmux
pip install requests numpy matplotlib pandas seaborn boto3
pip install librosa
TMPDIR=~/tmp/ pip install tensorflow
pip install Flask flask-cors pydub
pip install gunicorn

Get a local copy of the base ViT model

sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/google/vit-base-patch16-224

Run

gunicorn --workers 3 --bind 0.0.0.0:5000 spectrum:app
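
The same settings can also be kept in a Gunicorn configuration file, which is plain Python. The file below is a sketch that is not part of this repository; it mirrors the command above and assumes a Gunicorn version recent enough to support the `wsgi_app` setting (20.1 or later).

```python
# gunicorn.conf.py (hypothetical file, mirrors the command-line flags above)
bind = "0.0.0.0:5000"      # listen on all interfaces, port 5000
workers = 3                # number of worker processes
wsgi_app = "spectrum:app"  # module:variable of the Flask application
```

With this file in the directory that contains `spectrum.py`, the server can be started with `gunicorn -c gunicorn.conf.py`.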

Generate vocabulary list for anki

pip install pypinyin genanki sentencepiece openai

Website

Set up environment

npx create-react-app website
cd website
npm install recordrtc react-audio-player react-router-dom
npm install @mui/material @emotion/react @emotion/styled

Run

cd website
npm start
