Fine-tuning BETO for emoji prediction
🤗 huggingface.co/ccarvajal/beto-emoji
Running the model requires PyTorch, whose installation depends on your system and on whether a GPU is available, as well as the transformers library. For the remaining dependencies, run
pip install -r requirements.txt
Training details and a usage example are shown at github.com/camilocarvajalreyes/beto-emoji. A deeper analysis of this and other models on the full dataset can be found at github.com/furrutiav/data-mining-2022. This model was used in a project for the CC5205 Data Mining course.
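A minimal inference sketch, assuming the checkpoint exposes the standard transformers sequence-classification head (the function name and example tweet below are illustrative, not from the repository):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

def predict_emoji(text, model_name="ccarvajal/beto-emoji"):
    """Return the index of the highest-scoring emoji class for a tweet.

    Sketch only: downloads the checkpoint on first call and assumes the
    standard sequence-classification head.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

# Usage (requires network access to fetch the checkpoint):
# predict_emoji("que viva españa")
```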
The Multilingual Emoji Prediction dataset (Barbieri et al., 2018) consists of tweets in English and Spanish that originally contained a single emoji, which is then used as the label. The test and trial sets can be downloaded here, but the train set must be retrieved with a Twitter crawler. The goal is to predict, from the text alone, the emoji that originally appeared in the tweet, out of a fixed set of possible emojis: 20 for English and 19 for Spanish.
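Framed as single-label classification, prediction reduces to an argmax over the fixed emoji classes. A self-contained sketch (the label subset below is purely illustrative; the real index-to-emoji mapping ships with the dataset):

```python
import numpy as np

# Illustrative subset of emoji labels -- the real Spanish mapping has 19 classes.
LABELS = ["❤", "😍", "😂", "💕", "😊"]

def predict_label(logits):
    # The model emits one score per emoji class; prediction is the argmax.
    return LABELS[int(np.argmax(logits))]

scores = np.array([0.1, 2.3, 0.4, 1.1, 0.2])
print(predict_label(scores))  # → 😍
```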
Training parameters:
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)
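These arguments are then passed to a Trainer. A hedged wiring sketch, assuming BETO's public base checkpoint (dccuchile/bert-base-spanish-wwm-cased) and already-tokenised datasets (the function name and dataset variables are assumptions, not from the repository):

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

def build_trainer(train_dataset, eval_dataset, num_labels=19):
    """Wire the training arguments above into a Trainer.

    Sketch only: the base checkpoint name and dataset arguments are
    assumptions; 19 labels corresponds to the Spanish emoji set.
    """
    model = AutoModelForSequenceClassification.from_pretrained(
        "dccuchile/bert-base-spanish-wwm-cased", num_labels=num_labels
    )
    training_args = TrainingArguments(
        output_dir="./results",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=5,
        weight_decay=0.01,
    )
    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )

# Usage (requires network access and tokenised datasets):
# trainer = build_trainer(train_ds, eval_ds)
# trainer.train()
```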