Saving and Loading Fine Tuned Model #34

Open
maggieelkin opened this issue Jan 10, 2022 · 0 comments

I'm not sure whether I'm saving or loading the fine-tuned model incorrectly. After fine-tuning, I run an evaluation script, and the accuracy before and after loading the fine-tuned checkpoint is exactly the same. I first saw this with my own data; as a check, I tried to replicate the example here: https://bio-transformers.readthedocs.io/en/latest/tutorial/finetuning.html

Training Script:

import biodatasets
import numpy as np
from biotransformers import BioTransformers
import ray

data = biodatasets.load_dataset("swissProt")
X, y = data.to_npy_arrays(input_names=["sequence"])
X = X[0]

# Train on short sequences (fewer than 200 amino acids)
length = np.array(list(map(len, X))) < 200
train_seq = X[length][:10000]
val_seq = X[length][10000:15000]

ray.init()
bio_trans = BioTransformers("esm1_t6_43M_UR50S", num_gpus=4)


bio_trans.finetune(
    train_seq,
    validation_sequences=val_seq,
    lr=1.0e-5,
    warmup_init_lr=1e-7,
    toks_per_batch=2000,
    epochs=20,
    acc_batch_size=256,
    warmup_updates=1024,
    accelerator="ddp",
    checkpoint=None,
    save_last_checkpoint=False,
    amp_level=None,
)

After running it, a logs directory is created containing hparams.yaml (which is empty), metrics.csv, and a checkpoints folder with the last checkpoint (epoch=19-step=39.ckpt).
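For reference, the contents of the logs directory can be double-checked with a small helper like the one below. This is only a sketch using the standard library: the function names `find_checkpoints` and `describe_run` are made up for illustration, and the only assumption about the layout is that checkpoints end in `.ckpt` and hyperparameters are written to `hparams.yaml` somewhere under the log root.

```python
from pathlib import Path


def find_checkpoints(log_root="logs"):
    """Return all .ckpt files under log_root, oldest first by modification time."""
    root = Path(log_root)
    return sorted(root.rglob("*.ckpt"), key=lambda p: p.stat().st_mtime)


def describe_run(log_root="logs"):
    """Summarize checkpoint files and flag hparams.yaml files that are empty."""
    root = Path(log_root)
    return {
        "checkpoints": [str(p) for p in find_checkpoints(root)],
        "empty_hparams": [
            str(p) for p in root.rglob("hparams.yaml") if p.stat().st_size == 0
        ],
    }
```

Calling `describe_run("logs")` after training should list the checkpoint path that the evaluation script loads, which at least rules out a typo in the path.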

Then I run the evaluation script:

import biodatasets
import numpy as np
from biotransformers import BioTransformers
import ray

data = biodatasets.load_dataset("swissProt")
X, y = data.to_npy_arrays(input_names=["sequence"])
X = X[0]

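# Keep sequences shorter than 200 AA, as in the training script.
# Evaluate on sequences that were not used for training or validation
# (the variable is still called train_seq, but this is held-out data).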
length = np.array(list(map(len, X))) < 200
train_seq = X[length][15000:20000]


ray.init()
bio_trans = BioTransformers("esm1_t6_43M_UR50S", num_gpus=4)
acc_before = bio_trans.compute_accuracy(train_seq, batch_size=32)
print(f"Accuracy before finetuning : {acc_before}")


bio_trans.load_model("logs/finetune_masked/version_0/checkpoints/epoch=19-step=39.ckpt")
acc_after = bio_trans.compute_accuracy(train_seq, batch_size=32)
print(f"Accuracy after finetuning : {acc_after}")

Which outputs:

Accuracy before finetuning : 0.3469025194644928
Accuracy after finetuning : 0.3469025194644928

Am I saving or loading incorrectly?
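One way to narrow this down might be to compare the weights stored in the checkpoint against the pretrained weights directly, independent of `load_model`. The sketch below is not from the bio-transformers docs; `compare_state_dicts` is a hypothetical helper that works on any two name-to-array mappings. It assumes the `.ckpt` file is a PyTorch Lightning checkpoint, in which case something like `torch.load(path, map_location="cpu")["state_dict"]` should give such a mapping (CPU tensors convert with `np.asarray`).

```python
import numpy as np


def compare_state_dicts(before, after, atol=0.0):
    """Compare two mappings of parameter name -> array-like.

    Returns four sorted lists of names:
    (changed, unchanged, only_in_before, only_in_after).
    """
    before_keys, after_keys = set(before), set(after)
    changed, unchanged = [], []
    for name in sorted(before_keys & after_keys):
        a = np.asarray(before[name])
        b = np.asarray(after[name])
        # Different shapes count as changed; otherwise compare values.
        same = a.shape == b.shape and np.allclose(a, b, rtol=0.0, atol=atol)
        (unchanged if same else changed).append(name)
    return (
        changed,
        unchanged,
        sorted(before_keys - after_keys),
        sorted(after_keys - before_keys),
    )


# Toy example with made-up parameter names.
pretrained = {"embed.weight": np.zeros((2, 3)), "head.bias": np.ones(3)}
finetuned = {"embed.weight": np.zeros((2, 3)) + 0.1, "model.head.bias": np.ones(3)}
changed, unchanged, only_pre, only_ft = compare_state_dicts(pretrained, finetuned)
print(changed, unchanged, only_pre, only_ft)
# ['embed.weight'] [] ['head.bias'] ['model.head.bias']
```

If `changed` is empty, the checkpoint would hold the same weights as the pretrained model, which points at training or saving. If the two key sets barely overlap (for example, every checkpoint key carries an extra prefix such as `model.`), a non-strict `load_state_dict` would skip them without raising, which would point at loading. Whether either of these applies here is a guess on my part.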
