Active learning with unparametrized ACE potential #72

siamak-attarian · 2024-08-06T19:31:48Z

Hello,

At the start of a fit, "pacemaker" generates a "target_potential.yaml" file, which is an unparametrized ACE potential with the same hyperparameters as the final fitted potential. Does it make sense to use that potential for active sampling with "pace_activeset"? I guess that active sampling with D-optimality requires only the ACE features, which should be accessible from "target_potential.yaml".

I'm trying to get a representative set of structures from a dataset of 10,000 structures. I first select 100 structures at random, write an "input.yaml" file with the desired hyperparameters, and run "pyace" with the "--no-fit" flag to get "target_potential.yaml". I then run "pace_activeset" on the initial 100 random structures with "target_potential.yaml" to get the active set, and loop over the whole dataset with a Python script that reads "calc.results['gamma']" from the ACE calculator in ASE to get the extrapolation grade of each structure. I continue this procedure until I have roughly 1000 structures that represent the whole dataset.

Does this make sense? I'm using "target_potential.yaml" instead of fitting a whole new potential at each active learning cycle, to accelerate the whole process.

Thanks

yury-lysogorskiy · 2024-08-06T21:25:44Z

Hi,

You can use pacemaker --dry-run (or -dr) to generate target_potential.yaml. --no-fit will also work.
For active sampling with D-optimality, only the ACE features are required, and these are accessible from target_potential.yaml. Note, however, that the radial function coefficients (crad) are also updated during fitting. In a freshly initialized potential, they are set to Kronecker deltas: crad_nlk = delta_nk. This can be considered a suboptimal basis set, but it can still be used for structure selection. I believe it is better than random selection anyway.
You are exactly right about the active set - it can be (approximately) considered as "cumulative maximum", so the whole procedure looks good.
You can use pace_select to select a (sub)optimal set of structures given the basis (+ active set).
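
To make the D-optimality criterion concrete, here is a minimal NumPy sketch. It is not pacemaker's implementation: the random matrix stands in for ACE feature vectors, and the function names (initial_rows, maxvol, extrapolation_grade) are illustrative. It picks a square active set with a simple greedy MaxVol row swap and computes the extrapolation grade as gamma = max_j |(b A^-1)_j|, so anything spanned by the active set has gamma close to 1 or below, and a feature vector far outside it has gamma well above 1.

```python
import numpy as np


def initial_rows(features):
    """Pick one pivot row per column by Gaussian elimination (a nonsingular start)."""
    work = np.array(features, dtype=float)
    rows = []
    for col in range(work.shape[1]):
        pivot = int(np.argmax(np.abs(work[:, col])))
        rows.append(pivot)
        # Eliminate this column from every row; the pivot row becomes all zeros.
        work -= np.outer(work[:, col] / work[pivot, col], work[pivot])
    return rows


def maxvol(features, tol=1.001, max_iter=1000):
    """Greedy MaxVol: choose n rows of an (N, n) matrix with locally maximal |det|."""
    rows = initial_rows(features)
    for _ in range(max_iter):
        # Row i of coeffs expresses feature vector i in terms of the active rows.
        coeffs = features @ np.linalg.inv(features[rows])
        i, j = np.unravel_index(np.argmax(np.abs(coeffs)), coeffs.shape)
        if abs(coeffs[i, j]) <= tol:
            break
        # Swapping active row j for row i multiplies |det| by |coeffs[i, j]| > 1.
        rows[j] = int(i)
    return np.array(rows)


def extrapolation_grade(features, active_set):
    """gamma for each row: largest |coefficient| in the active-set expansion."""
    return np.max(np.abs(features @ np.linalg.inv(active_set)), axis=1)


rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))   # stand-in for 200 feature vectors, 5 basis functions
rows = maxvol(features)
active_set = features[rows]
gamma = extrapolation_grade(features, active_set)
# A vector 100 times larger than an active-set row is a clear extrapolation.
outlier_gamma = extrapolation_grade(100.0 * active_set[:1], active_set)[0]
```

Because the active set only ever swaps in rows that enlarge the determinant, it behaves like the "cumulative maximum" mentioned above: adding data can only keep or extend the region with gamma <= 1.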

Here is a rough scheme of the procedure. I used this to explain the procedure to another group, but your variation is also correct:
PACE_AL_diag.pdf

Start from the initial collection of structures (for example, Trajectory 0).
Generate the initial version of the potential (i.e., target_potential.yaml).
Use pace_select + init_potential.yaml to select the most representative structures from the initial collection of structures. You can also select the top-N structures with pace_select to limit dataset size.
Fit the potential on these structures to get the Gen0.yaml potential. Using the unfitted initial potential, as you suggested, is also fine.
Generate the corresponding active set, Gen0.asi.
Then, using Gen0.yaml and Gen0.asi, compute the extrapolation grade gamma for the larger pool of candidate structures (Traj 1) to get subset 1, and use pace_select with Gen0.yaml and Gen0.asi again to select the most representative structures from it.
Join datasets and repeat.
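
The loop above can be sketched in plain Python. This is only an outline under stated assumptions: build_active_set and grade are hypothetical hooks (in a real run they would correspond to generating the active set and reading gamma from the ACE calculator), and gamma_threshold, batch_size and max_size are illustrative parameters, not pacemaker options. The toy demo uses a single scalar feature per structure, for which the active set is simply the largest |feature| seen so far and gamma = |x| / |active|.

```python
import random


def select_representative(pool, seed_ids, build_active_set, grade,
                          gamma_threshold=1.0, batch_size=10, max_size=1000):
    """Grow a subset of pool until no remaining structure extrapolates.

    build_active_set(structures) -> active set (hypothetical hook)
    grade(structure, active_set) -> extrapolation grade gamma (hypothetical hook)
    """
    selected = list(seed_ids)
    while len(selected) < max_size:
        active_set = build_active_set([pool[i] for i in selected])
        chosen = set(selected)
        # Grade every structure that is not selected yet, largest gamma first.
        graded = sorted(
            ((grade(pool[i], active_set), i)
             for i in range(len(pool)) if i not in chosen),
            reverse=True,
        )
        extrapolating = [(g, i) for g, i in graded if g > gamma_threshold]
        if not extrapolating:
            break  # the current active set already covers the whole pool
        room = max_size - len(selected)
        selected.extend(i for _, i in extrapolating[:min(batch_size, room)])
    return selected


# Toy demo: one scalar feature per "structure".
rng = random.Random(0)
pool = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
seed_ids = rng.sample(range(len(pool)), 100)
selected = select_representative(
    pool, seed_ids,
    build_active_set=lambda structures: max(abs(x) for x in structures),
    grade=lambda x, active: abs(x) / active,
)
```

Adding only a small batch of the highest-gamma structures per cycle keeps the subset compact, since a few additions often bring many other candidates below the threshold.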

In this scheme, the potential is refit at each generation (Gen0, Gen1, ...) in order to update the radial functions (via crad) and obtain a more optimal, more representative B-basis. Your suggested scheme is just another approximation.
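
Written out, and assuming the usual ACE radial expansion in which each radial function is a linear combination of fixed radial basis functions g_k(r) (species indices omitted here), the delta initialization means that every radial function starts as a single basis function:

```latex
R_{nl}(r) = \sum_{k} c_{nlk}\, g_k(r),
\qquad
c_{nlk} = \delta_{nk}
\;\Longrightarrow\;
R_{nl}(r) = g_n(r)
```

Fitting then mixes the g_k through crad, which is why the B-basis of a fitted potential can differ from that of target_potential.yaml.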

Best,
Yury

siamak-attarian · 2024-08-06T21:50:42Z

Thank you so much for this great explanation!
