Steps to reproduce:
- Clone the repository with shared task data
git clone https://github.com/sigtyp/ST2024.git
- Install requirements (in a virtual environment)
pip install -r requirements.txt
- Convert data for training
python convert_lemmatisation.py
python convert_mlm.py
python convert_mlm_decomposed.py
python convert_tagging.py
- Train custom tokenizers
- Train the models
- Make predictions
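
The conversion step above can be run as one batch. The following is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_conversions` are hypothetical, and it assumes that the four scripts live in `workdir`, take no arguments, and can be run in the order listed. It covers only the data conversion step.

```python
import subprocess
import sys
from pathlib import Path

# Script names come from the steps above; the order is an assumption.
CONVERSION_SCRIPTS = [
    "convert_lemmatisation.py",
    "convert_mlm.py",
    "convert_mlm_decomposed.py",
    "convert_tagging.py",
]


def build_commands(scripts=CONVERSION_SCRIPTS, python=sys.executable):
    """Return one argv list per conversion script."""
    return [[python, script] for script in scripts]


def run_conversions(workdir=".", runner=subprocess.run):
    """Run each conversion script in turn, stopping at the first failure."""
    for cmd in build_commands():
        script = Path(workdir) / cmd[1]
        if not script.exists():
            raise FileNotFoundError(f"missing conversion script: {script}")
        # check=True makes subprocess.run raise CalledProcessError
        # when a script exits with a non-zero status.
        runner(cmd, cwd=workdir, check=True)
```

Calling `run_conversions()` from the directory that contains the scripts, with the virtual environment active, is roughly equivalent to typing the four `python convert_*.py` commands by hand, except that it stops as soon as one of them fails.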