Machine learning project to help practice correct Mandarin Chinese tone pronunciation.
The models are trained on the Tone Perfect dataset from Michigan State University. It covers the full catalog of monosyllabic sounds in Mandarin Chinese in all four tones, spoken by six native Mandarin speakers. The collection comprises about 10k samples with a total duration of approximately 2 hours. To make the training set more diverse and realistic, the original dataset is augmented. The audio samples are then converted to Mel spectrograms, which serve as input for image classification algorithms.
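The augmentation and Mel spectrogram steps described above can be sketched roughly as follows. This is a minimal NumPy-only illustration, not the project's actual pipeline (which installs librosa): the helper names `add_noise`, `mel_filterbank`, and `log_mel_spectrogram`, the 20 dB signal-to-noise ratio, the FFT size, the hop length, and the number of mel bands are all assumptions chosen for the example, and a synthetic 220 Hz tone stands in for a recorded syllable.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the requested SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def hz_to_mel(hz):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64):
    """Return a mel spectrogram in dB, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T   # (n_mels, n_frames)
    return 10.0 * np.log10(np.maximum(mel, 1e-10))


# Synthetic stand-in for one audio sample: a 1-second, 220 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 220.0 * t)

rng = np.random.default_rng(0)
noisy = add_noise(clean, snr_db=20.0, rng=rng)   # augmentation step
spec = log_mel_spectrogram(noisy, sr)            # image-like classifier input
print(spec.shape)
```

The resulting 2-D array can be treated like a grayscale image, which is what makes image classifiers such as a CNN or a Vision Transformer applicable to audio.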
Two models are trained: a CNN and a Vision Transformer fine-tuned from Google's vit-base-patch16-224. Both reach an accuracy above 99.9% on an independent held-out test set. More details and the deployed models for inference can be found here: https://pingulino.vercel.app/
conda create --name chineseTones_env python=3.8
conda activate chineseTones_env
pip install jupyter
pip install requests numpy matplotlib librosa pandas seaborn tensorflow boto3
pip install gTTS
pip install soundfile
pip install tensorflow-macos
pip install tensorflow-metal
pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip install transformers
pip install -U huggingface_hub
pip install accelerate -U
pip install tensorboard
pip install peft
cd prepareData
python downloadTonesData.py
cd prepareData
jupyter notebook analyzeTones.ipynb
cd trainML
python trainModel.py --addNoise --augmentData --epochs=10 --nHiddenLayers=3 --image_resolution=128 --batch_size=64 --modelName=tfModelTones_v8
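For readers unfamiliar with the flags in the training command above, the sketch below shows how such a command line could be parsed with `argparse`. It is a hypothetical reconstruction: the real parser in `trainModel.py` is not shown here, so the defaults and help texts are assumptions inferred from the flag names only.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the training command."""
    parser = argparse.ArgumentParser(description="Train a tone classifier (sketch).")
    # Boolean switches: present on the command line means True.
    parser.add_argument("--addNoise", action="store_true",
                        help="assumed: mix noise into the audio samples")
    parser.add_argument("--augmentData", action="store_true",
                        help="assumed: apply data augmentation")
    # Numeric hyperparameters; the defaults are placeholders, not the script's values.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--nHiddenLayers", type=int, default=1)
    parser.add_argument("--image_resolution", type=int, default=128)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--modelName", type=str, default="model")
    return parser


# Parse the same arguments that appear in the command above.
args = build_parser().parse_args([
    "--addNoise", "--augmentData", "--epochs=10", "--nHiddenLayers=3",
    "--image_resolution=128", "--batch_size=64", "--modelName=tfModelTones_v8",
])
print(vars(args))
```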
cd trainML
python fineTuneModel.py --addNoise --augmentData --unfreezeLastBaseLayer --epochs=10 --batch_size=64 --modelName=fineTunedModelTones_v1
python fineTuneModel.py --addNoise --augmentData --epochs=1 --batch_size=64 --modelName=fineTunedModelTonesLora_v1 --applyLora
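The `--applyLora` variant above presumably enables LoRA (low-rank adaptation) through the `peft` package installed earlier. The NumPy sketch below illustrates the general idea only, not the project's configuration: the rank `r = 8` and scaling `alpha = 16` are assumed values, the weight matrix is a random stand-in, and 768 is the hidden size of ViT-base.

```python
import numpy as np

d = 768      # hidden size of ViT-base
r = 8        # LoRA rank (assumed for illustration)
alpha = 16   # LoRA scaling factor (assumed for illustration)

rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))               # frozen pretrained weight (random stand-in)
A = rng.normal(scale=0.01, size=(r, d))   # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection, zero-initialized


def lora_forward(x):
    """Frozen linear layer plus the scaled low-rank update (alpha / r) * B @ A."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


x = rng.normal(size=(4, d))
y = lora_forward(x)

full_params = W.size             # parameters updated by full fine-tuning of this layer
lora_params = A.size + B.size    # parameters updated by LoRA for this layer
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained layer exactly; training then only updates `A` and `B`, a small fraction of the parameters in `W`.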
pip install librosa Flask flask-cors pydub
brew install ffmpeg  # optional: only needed if ffmpeg is not already installed
pip install torch torchvision torchaudio
pip install transformers
cd flaskAPI
python spectrum.py
sudo apt-get update
sudo apt-get install ffmpeg libavcodec-extra
sudo apt install emacs
sudo apt install tmux
pip install requests numpy matplotlib pandas seaborn boto3
pip install librosa
TMPDIR=~/tmp/ pip install tensorflow
pip install Flask flask-cors pydub
pip install gunicorn
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/google/vit-base-patch16-224
gunicorn --workers 3 --bind 0.0.0.0:5000 spectrum:app
pip install pypinyin genanki sentencepiece openai
npx create-react-app website
npm install recordrtc react-audio-player react-router-dom
npm install @mui/material @emotion/react @emotion/styled
cd frontend
npm start