Replies: 1 comment
-
(Or, if the per-species rescale is being done in "dataset units", would it be enough to recompute the global rescale and the average numbers from the statistics of the new dataset, so long as the same set of atomic types is used, or something similar?)
-
We're investigating using a pretrained Allegro/NequIP model to continue training with new training data, and we were wondering whether such a thing is possible and, if so, what hazards may be present.
A few things stand out: some aspects of the model are initialized from statistics of the initial dataset and recorded in the config (`avg_num_neighbors` and `avg_num_atoms` come to mind). Likewise, the `PerSpeciesRescale` and `RescaleEnergyEtc` builders clearly use quantities obtained by statistical analysis of the initial training dataset, which makes sense given how the paper describes the normalization.
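For reference, these dataset-derived quantities show up as config keys along these lines (a sketch based on the example configs shipped with nequip; key names and defaults may differ between versions, so treat this as illustrative rather than authoritative):

```yaml
# Dataset-dependent settings (names taken from nequip's example configs):
avg_num_neighbors: auto   # computed from the training dataset at model build time
per_species_rescale_shifts: dataset_per_atom_total_energy_mean
per_species_rescale_scales: dataset_forces_rms
global_rescale_scale: dataset_forces_rms
```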
But what if the full training set is unknown at model creation? For example, suppose we have 1000 molecules with known energies and train a model, then load that model and add 1000 new molecules to the training set for additional training. Aside from the "normal" caveats (catastrophic forgetting, etc.), how would this "initialization" be affected?
Or is this really impossible, so that the only reasonable path forward is training a new model from scratch, with the combined data as a full "initial training set", each time we add data?
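To make the question concrete: the kind of statistics involved could, in principle, be recomputed from the new data. A toy sketch (hypothetical, not the actual nequip implementation; the dict-of-configurations format here is invented for illustration):

```python
import math

# Hypothetical sketch of the dataset statistics that builders like
# RescaleEnergyEtc derive at model creation: the per-atom total energy
# mean (used for energy shifts) and the RMS of force components (used
# as a scale). This is NOT nequip code, just the underlying arithmetic.

def dataset_statistics(configs):
    """configs: list of dicts with 'energy' (float), 'n_atoms' (int),
    and 'forces' (flat list of force components)."""
    # Mean of total energy per atom across configurations.
    per_atom_e = [c["energy"] / c["n_atoms"] for c in configs]
    e_mean = sum(per_atom_e) / len(per_atom_e)

    # Root-mean-square over all force components in the dataset.
    comps = [f for c in configs for f in c["forces"]]
    f_rms = math.sqrt(sum(f * f for f in comps) / len(comps))
    return e_mean, f_rms

# Toy "dataset" of two configurations:
configs = [
    {"energy": -10.0, "n_atoms": 2, "forces": [0.1, -0.1, 0.0]},
    {"energy": -21.0, "n_atoms": 3, "forces": [0.2, 0.0, -0.2]},
]
e_mean, f_rms = dataset_statistics(configs)  # e_mean = -6.0
```

Whether it is safe to plug recomputed values into an already-trained model is exactly the open question: the trained weights were fit against the old shifts and scales, so changing them changes what the network's outputs mean.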