Active learning with unparametrized ACE potential #72

Open
siamak-attarian opened this issue Aug 6, 2024 · 2 comments

Comments

@siamak-attarian

Hello,

At the beginning of fitting with "pacemaker", a "target_potential.yaml" file is generated. It is an unparametrized ACE potential, but it has the same hyperparameters as the final fitted potential. Does it make sense to use this potential for active sampling with "pace_activeset"? I guess that for active sampling with D-optimality only the ACE features are required, and these should be accessible from "target_potential.yaml".

I'm trying to get a representative set of structures from a dataset of 10,000 structures. First, I randomly select 100 structures, write an "input.yaml" file with the desired hyperparameters, and run "pacemaker" (from pyace) with the "--no-fit" flag to get "target_potential.yaml". Then I run "pace_activeset" on the initial 100 random structures with "target_potential.yaml" to get the active set. Next, I loop over the whole dataset with a Python script that reads "calc.results['gamma']" from the ACE calculator in ASE to get the extrapolation grade of each structure. I repeat this procedure until I reach roughly 1000 structures that represent the whole dataset.
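The loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the pacemaker implementation: the random per-atom feature arrays stand in for the B-basis projections computed from "target_potential.yaml", the per-structure grade is assumed to be the maximum of the per-atom values (as one would take from "calc.results['gamma']"), the active set is rebuilt with a plain MaxVol iteration in place of "pace_activeset", and `gamma_threshold` and `batch` are made-up parameters.

```python
import numpy as np


def maxvol_rows(B, tol=1.01, max_iter=1000):
    """Pick B.shape[1] rows of B with (locally) maximal |det| via MaxVol swaps."""
    # Greedy start: pivoted Gram-Schmidt picks linearly independent rows.
    R = B.astype(float).copy()
    idx = []
    for _ in range(B.shape[1]):
        i = int(np.argmax(np.linalg.norm(R, axis=1)))
        idx.append(i)
        q = R[i] / np.linalg.norm(R[i])
        R -= np.outer(R @ q, q)
    # MaxVol: swapping row i into slot j multiplies |det| by |C[i, j]| > 1.
    for _ in range(max_iter):
        C = B @ np.linalg.inv(B[idx])
        i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        if abs(C[i, j]) <= tol:
            break
        idx[j] = int(i)
    return idx


def structure_gammas(pool, A):
    """Hypothetical per-structure grade: max over atoms of max_j |(b A^-1)_j|."""
    A_inv = np.linalg.inv(A)
    return np.array([np.abs(atoms @ A_inv).max() for atoms in pool])


def active_learning(pool, n_init=100, n_target=1000, gamma_threshold=1.5,
                    batch=100, seed=0):
    """Grow a training set on a fixed (unfitted) basis up to n_target structures."""
    rng = np.random.default_rng(seed)
    selected = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    while len(selected) < n_target:
        B = np.vstack([pool[i] for i in selected])   # all atoms of the current set
        A = B[maxvol_rows(B)]                         # active set (square matrix)
        gammas = structure_gammas(pool, A)
        gammas[selected] = 0.0                        # skip structures already chosen
        order = np.argsort(-gammas)
        new = [int(i) for i in order if gammas[i] > gamma_threshold]
        if not new:
            break  # nothing in the pool extrapolates beyond the threshold
        selected += new[: min(batch, n_target - len(selected))]
    return selected
```

Taking the highest-gamma structures in one batch can add several near-duplicates at once, so a selection step that also accounts for representativeness is a natural refinement of this sketch.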

Does this make sense? I'm using "target_potential.yaml" instead of fitting a whole new potential at each active learning cycle, to accelerate the whole process.

Thanks

@yury-lysogorskiy
Member

Hi,

  • You can use pacemaker --dry-run (or -dr) to generate target_potential.yaml; --no-fit will also work.
  • For active sampling with D-optimality, only the ACE features are required, and they are accessible from target_potential.yaml. Note, however, that the radial function coefficients (crad) are also updated during fitting. In a freshly initialized potential, crad is initialized with Kronecker deltas: crad_nlk = δ_nk. The resulting basis can be considered suboptimal, but it can still be used for structure selection. I believe it is better than random selection anyway.
  • You are exactly right about the active set: it can (approximately) be considered a "cumulative maximum", so the whole procedure looks good.
  • You can use pace_select to select a (sub)optimal set of structures given the basis (+ active set).
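One way to read the "cumulative maximum" remark is sketched below, as an interpretation rather than a description of how pace_activeset works internally. With a MaxVol-type active set A, each row b of features has an extrapolation grade max_j |(b A⁻¹)_j|, and the active set can be updated from the previous active set plus the new data alone. Because the update is warm-started from the old active set and every swap increases |det A|, the "volume" never decreases; older data is not rechecked, which is why the property holds only approximately. All names and the tolerance `tol` are hypothetical.

```python
import numpy as np


def maxvol(B, start, tol=1.01, max_iter=1000):
    """MaxVol row swaps on B, warm-started from the row indices `start`."""
    idx = list(start)
    for _ in range(max_iter):
        C = B @ np.linalg.inv(B[idx])        # coefficients w.r.t. the active rows
        i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        if abs(C[i, j]) <= tol:
            break
        idx[j] = int(i)                      # |det| grows by the factor |C[i, j]|
    return idx


def update_active_set(A, new_rows, tol=1.01):
    """Hypothetical update: only the old active set A and the new rows are used."""
    B = np.vstack([A, new_rows])
    return B[maxvol(B, start=range(A.shape[0]), tol=tol)]


def gamma(rows, A):
    """Extrapolation grade of each row: max_j |(b A^-1)_j|."""
    return np.abs(rows @ np.linalg.inv(A)).max(axis=1)


rng = np.random.default_rng(0)
X0, X1 = rng.normal(size=(100, 6)), 2.0 * rng.normal(size=(100, 6))
A0 = X0[maxvol(X0, start=range(6))]   # active set from the first batch
A1 = update_active_set(A0, X1)         # cumulative update with the second batch
```

After the update, every row of the new batch has a grade of at most `tol` with respect to A1, while rows with grade above 1 relative to A0 were exactly the ones able to enlarge the active set.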

Here is a rough scheme of the procedure. I used this to explain the procedure to another group, but your variation is also correct:
PACE_AL_diag.pdf

  1. Start from the initial collection of structures (for example, Trajectory 0).
  2. Generate the initial version of the potential (i.e., target_potential.yaml).
  3. Use pace_select + init_potential.yaml (the initial potential from step 2) to select the most representative structures from the initial collection. You can also select only the top-N structures with pace_select to limit the dataset size.
  4. Fit the potential on these structures to get the Gen0.yaml potential. Using the unfitted initial potential, as you suggested, is also fine.
  5. Generate the corresponding active set Gen0.asi.
  6. Then, using Gen0.yaml/Gen0.asi, compute the extrapolation grade gamma for the larger pool of candidate structures (Trajectory 1) to get subset 1, and use pace_select + Gen0.yaml/Gen0.asi to select the most representative structures again.
  7. Join datasets and repeat.
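Steps 3 and 6 rely on pace_select to pick the most representative structures. Its exact algorithm is not described in this thread, so the following only sketches the same idea with a swapped-in technique: greedy D-optimal design, which repeatedly adds the structure that most increases log det of the regularized information matrix Σᵢ BᵢᵀBᵢ + reg·I built from per-atom basis projections Bᵢ. The function name, the feature arrays, and `reg` are hypothetical.

```python
import numpy as np


def greedy_d_optimal(structures, n_select, reg=1e-6):
    """Greedily pick structures that most increase log det(sum B_i^T B_i + reg*I).

    structures: list of (n_atoms, m) arrays of hypothetical per-atom features.
    """
    m = structures[0].shape[1]
    info = reg * np.eye(m)  # regularized information matrix of the chosen set
    chosen, remaining = [], list(range(len(structures)))
    for _ in range(min(n_select, len(structures))):
        base = np.linalg.slogdet(info)[1]
        gains = [np.linalg.slogdet(info + structures[i].T @ structures[i])[1] - base
                 for i in remaining]
        best = remaining.pop(int(np.argmax(gains)))  # structure index with max gain
        chosen.append(best)
        info += structures[best].T @ structures[best]
    return chosen
```

A duplicated structure adds little new volume, so the greedy rule prefers structures that cover unexplored directions of the feature space, which is the property the top-N selection is meant to provide.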

In this scheme, the potential is updated (Gen0, Gen1, ...) in order to update the radial functions (via crad) and obtain a more optimal, representative B-basis. Your suggested scheme is just another approximation.
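To see why an unfitted potential has a fixed radial basis, assume (as a sketch) that each radial function is a linear combination of primitive functions, R_nl(r) = Σ_k crad_nlk g_k(r). With crad_nlk = δ_nk, R_nl reduces to g_n for every l, so nothing is adapted to the data until fitting updates crad. The primitive functions and cutoff below are hypothetical placeholders, not pacemaker's actual radial basis.

```python
import math

import numpy as np

R_CUT = 5.0  # hypothetical cutoff radius

# Hypothetical primitive radial functions g_k(r); placeholders only.
basis_fns = [lambda r, k=k: math.cos(k * math.pi * r / R_CUT) for k in range(4)]


def delta_crad(n_max, l_max, k_max):
    """Freshly initialized coefficients: crad[n, l, k] = delta_{nk} for every l."""
    crad = np.zeros((n_max, l_max + 1, k_max))
    for n in range(min(n_max, k_max)):
        crad[n, :, n] = 1.0
    return crad


def radial_functions(r, crad, fns):
    """R[n, l] = sum_k crad[n, l, k] * g_k(r)."""
    g = np.array([fn(r) for fn in fns])
    return np.einsum("nlk,k->nl", crad, g)


R = radial_functions(1.3, delta_crad(4, 2, 4), basis_fns)
# Every column l of R equals [g_0(1.3), ..., g_3(1.3)]: no l-dependence yet.
```

Once fitting changes crad, R_nl becomes a data-adapted mixture of the g_k and can differ between l channels, which is the basis update the Gen0/Gen1/... refits provide.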

Best,
Yury

@siamak-attarian
Author

Thank you so much for this great explanation!
