
# beto-emoji

Fine-tuning BETO for emoji prediction

## HuggingFace

🤗 huggingface.co/ccarvajal/beto-emoji

## Installation

The model requires pytorch, whose installation depends on your system and on whether a GPU is available, as well as the transformers library. For the remaining dependencies, run:

```
pip install -r requirements.txt
```
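For example, a CPU-only setup might look like the following; this is an illustrative sketch, since the right pytorch install command varies by platform and CUDA version (see pytorch.org):

```shell
# CPU-only example; on a GPU machine, pick the torch build matching your CUDA version
pip install torch transformers
pip install -r requirements.txt
```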

## Repository

Details of training and a usage example are shown in github.com/camilocarvajalreyes/beto-emoji. A deeper analysis of this and other models on the full dataset can be found in github.com/furrutiav/data-mining-2022. We used this model in a project for the CC5205 Data Mining course.

## Notebooks

## Reproducibility

The Multilingual Emoji Prediction dataset (Barbieri et al. 2018, SemEval-2018 Task 2) consists of tweets in English and Spanish that originally contained a single emoji, which is then used as the label. Test and trial sets can be downloaded here, but the train set needs to be retrieved with a Twitter crawler. The goal is to predict, from the tweet's text alone, the emoji it originally contained, out of a fixed set of 20 possible emojis for English and 19 for Spanish.
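The prediction step reduces to picking the highest-probability class among the fixed emoji set. A minimal self-contained sketch, assuming hypothetical logits from the classification head and an abbreviated, illustrative label list (the actual Spanish task has 19 emojis in a specific order):

```python
import math

# Illustrative labels only; the real SemEval-2018 Spanish set has 19 emojis
EMOJI_LABELS = ["❤", "😍", "😂", "💕", "😊"]

def predict_emoji(logits):
    """Softmax over class logits, then return the top emoji and its probability."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOJI_LABELS[best], probs[best]

# Hypothetical logits, e.g. from the fine-tuned BETO classification head
emoji, prob = predict_emoji([0.1, 2.3, -0.5, 0.4, 1.1])  # → ("😍", ~0.62)
```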

Training parameters:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)
```
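For context, these arguments would typically be wired into a `Trainer` roughly as below. This is a hedged configuration sketch, not the repository's actual training script: the BETO checkpoint name is its public HuggingFace identifier, and `train_dataset`/`eval_dataset` are assumed to be already-tokenized datasets.

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
)

# BETO's public checkpoint; num_labels=19 matches the Spanish emoji set
model_name = "dccuchile/bert-base-spanish-wwm-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=19
)

trainer = Trainer(
    model=model,
    args=training_args,          # the TrainingArguments shown above
    train_dataset=train_dataset,  # assumed: tokenized emoji-prediction train split
    eval_dataset=eval_dataset,    # assumed: tokenized trial/validation split
)
trainer.train()
```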