I'm not sure whether I'm saving or loading the fine-tuned model incorrectly. After fine-tuning and running an evaluation script, the accuracy before loading the model and after loading it is exactly the same. I've tried this with my own data, but as a check I attempted to replicate the example here: https://bio-transformers.readthedocs.io/en/latest/tutorial/finetuning.html
Training Script:
import biodatasets
import numpy as np
from biotransformers import BioTransformers
import ray
data = biodatasets.load_dataset("swissProt")
X, y = data.to_npy_arrays(input_names=["sequence"])
X = X[0]
# Train on short sequences (fewer than 200 amino acids)
length = np.array(list(map(len, X))) < 200
train_seq = X[length][:10000]
val_seq = X[length][10000:15000]
ray.init()
bio_trans = BioTransformers("esm1_t6_43M_UR50S", num_gpus=4)
bio_trans.finetune(
    train_seq,
    validation_sequences=val_seq,
    lr=1.0e-5,
    warmup_init_lr=1e-7,
    toks_per_batch=2000,
    epochs=20,
    acc_batch_size=256,
    warmup_updates=1024,
    accelerator="ddp",
    checkpoint=None,
    save_last_checkpoint=False,
    amp_level=None,
)
After running it, a logs directory is created containing hparams.yaml (which is empty), metrics.csv, and a checkpoints folder holding the last checkpoint (epoch=19-step=39.ckpt).
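One thing the checkpoint name hints at: step=39 suggests only about 40 optimizer updates happened across the 20 epochs, far fewer than the warmup_updates=1024 passed to finetune. The sketch below estimates the learning rate reached by that point. It assumes a plain linear warmup from warmup_init_lr to lr, which the source does not confirm, and `warmup_lr` is a hypothetical helper, not part of bio-transformers.

```python
def warmup_lr(step: int, init_lr: float, peak_lr: float, warmup_updates: int) -> float:
    """Learning rate under an ASSUMED linear warmup (hypothetical helper).

    Any decay after the warmup phase is not modeled; the rate is simply
    held at peak_lr once warmup_updates is reached.
    """
    progress = min(step, warmup_updates) / warmup_updates
    return init_lr + (peak_lr - init_lr) * progress


# Values taken from the finetune() call above; step 39 comes from the checkpoint name.
lr_at_last_step = warmup_lr(step=39, init_lr=1e-7, peak_lr=1.0e-5, warmup_updates=1024)
print(f"{lr_at_last_step:.3e}")
```

If that assumption holds, the rate at the last saved step is still below 5e-7, so the fine-tuned weights may differ only slightly from the pretrained ones. That alone would not explain an accuracy that is identical to every printed digit, though.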
Then I run the evaluation script:
import biodatasets
import numpy as np
from biotransformers import BioTransformers
import ray
data = biodatasets.load_dataset("swissProt")
X, y = data.to_npy_arrays(input_names=["sequence"])
X = X[0]
# Keep sequences shorter than 200 AA.
# Evaluate on sequences that were not used for training.
length = np.array(list(map(len, X))) < 200
train_seq = X[length][15000:20000]
ray.init()
bio_trans = BioTransformers("esm1_t6_43M_UR50S", num_gpus=4)
acc_before = bio_trans.compute_accuracy(train_seq, batch_size=32)
print(f"Accuracy before finetuning : {acc_before}")
bio_trans.load_model("logs/finetune_masked/version_0/checkpoints/epoch=19-step=39.ckpt")
acc_after = bio_trans.compute_accuracy(train_seq, batch_size=32)
print(f"Accuracy after finetuning : {acc_after}")
Which outputs:
Accuracy before finetuning : 0.3469025194644928
Accuracy after finetuning : 0.3469025194644928
Am I saving or loading incorrectly?
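A way to narrow this down might be to compare the weights themselves rather than the accuracy. The sketch below is a minimal, library-agnostic check: given two mappings from parameter name to flattened values, it lists the names that differ. `changed_keys` and the toy data are hypothetical. With a PyTorch Lightning checkpoint, the "after" mapping could presumably be built from `torch.load(path, map_location="cpu")["state_dict"]`; how to get the "before" weights out of a BioTransformers object is not shown in this issue, so that part is left open.

```python
from typing import Dict, List, Sequence


def changed_keys(
    before: Dict[str, Sequence[float]],
    after: Dict[str, Sequence[float]],
    tol: float = 0.0,
) -> List[str]:
    """Return names whose values differ by more than tol, or that exist on one side only."""
    changed = []
    for name in sorted(set(before) | set(after)):
        if name not in before or name not in after:
            # A key missing on one side often means a prefix mismatch between
            # the checkpoint and the model (for example "model." + name).
            changed.append(name)
            continue
        a, b = before[name], after[name]
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            changed.append(name)
    return changed


# Toy data standing in for flattened pretrained and fine-tuned weights.
pretrained = {"layer.weight": [0.10, 0.20], "layer.bias": [0.0]}
finetuned = {"layer.weight": [0.10, 0.25], "layer.bias": [0.0]}
print(changed_keys(pretrained, finetuned))  # ['layer.weight']
```

An empty list for the real weights would point at the saving side (the checkpoint holds unchanged weights). A non-empty list combined with identical accuracy would point at the loading side.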