Selection

class mlptrain.training.selection.AbsDiffE(e_thresh: float = 0.1)

Bases: SelectionMethod

__call__(configuration, mlp, **kwargs) → None

Evaluate the true and predicted energies, used to determine if this configuration should be selected.

Parameters: method_name – Name of the reference method to use

__init__(e_thresh: float = 0.1): Selection method based on the absolute difference between the true and predicted total energies.

property n_backtrack: int: Number of backtracking steps that this selection method should evaluate if the value is ‘too_large’

property select: bool: 10 E_T > |E_predicted - E_true| > E_T

property too_large: bool: |E_predicted - E_true| > 10*E_T

class mlptrain.training.selection.AtomicEnvDistance(descriptor, pca: bool = False, distance_metric: str = 'euclidean', n_neighbors: int = 15)

Bases: SelectionMethod

__init__(descriptor, pca: bool = False, distance_metric: str = 'euclidean', n_neighbors: int = 15)

Selection criterion based on whether the configuration is identified as an outlier by the outlier_identifier function.

Parameters:

descriptor – descriptor used to represent the structures
pca – whether to apply dimensionality reduction by PCA. Because the selected distance_metric may suffer from the curse of dimensionality, a PCA step can be applied before calculating the local outlier factor (LOF), which helps maintain performance in a high-dimensional descriptor space.

For the other arguments, please see the details in the outlier_identifier function.

property check: bool: Should we keep checking configurations in the MLP-MD trajectory until the first configuration that will be selected by the selector is found?

property n_backtrack: int: Number of backtracking steps that this selection method should evaluate if the value is ‘too_large’

property select: bool: Should this configuration be selected?

property too_large: bool: Is the error/discrepancy too large to be selected?

class mlptrain.training.selection.AtomicEnvSimilarity(descriptor, threshold: float = 0.999)

Bases: SelectionMethod

__call__(configuration: mlptrain.Configuration, mlp: MLPotential, **kwargs) → None: Evaluate the selection criteria

__init__(descriptor, threshold: float = 0.999): Selection criteria based on the maximum distance between any of the training set and a new configuration. Evaluated based on the similarity SOAP kernel vector (K*) between a new configuration and prior training data

descriptor: Call the descriptor instance with user-defined parameters, eg. SoapDescriptor = SoapDescriptor(average=”outer”, r_cut=6.0, n_max=8, l_max=8) selector = AtomicEnvSimilarity(descriptor=SoapDescriptor, threshold=0.95)

property n_backtrack: int: Number of backtracking steps that this selection method should evaluate if the value is ‘too_large’

property select: bool: Determine if this configuration should be selected, based on the minimum similarity between it and all of the training data

property too_large: bool: Is the error/discrepancy too large to be selected?

class mlptrain.training.selection.SelectionMethod

Bases: ABC

Active learning selection method

NOTE: Should execute in serial

abstract __call__(configuration: mlptrain.Configuration, mlp: MLPotential, **kwargs) → None: Evaluate the selector

__init__(): A selection method should determine whether its configuration should be selected during active learning

property check: bool: Should we keep checking configurations in the MLP-MD trajectory until the first configuration that will be selected by the selector is found?

copy() → SelectionMethod

abstract property n_backtrack: int: Number of backtracking steps that this selection method should evaluate if the value is ‘too_large’

abstract property select: bool: Should this configuration be selected?

abstract property too_large: bool: Is the error/discrepancy too large to be selected?
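
The abstract members listed for SelectionMethod suggest the shape a custom selector would take. The sketch below reproduces that shape with a standalone stand-in base class so that it runs without mlptrain. SelectionMethodSketch and ErrorWindowSelector are hypothetical names, the configuration is a plain dict carrying a precomputed error, and the n_backtrack value of 10 is an arbitrary choice for the example.

```python
from abc import ABC, abstractmethod


class SelectionMethodSketch(ABC):
    """Stand-in mirroring the documented interface; not the mlptrain class."""

    @abstractmethod
    def __call__(self, configuration, mlp, **kwargs) -> None:
        """Evaluate the selector on one configuration."""

    @property
    @abstractmethod
    def select(self) -> bool:
        """Should this configuration be selected?"""

    @property
    @abstractmethod
    def too_large(self) -> bool:
        """Is the error/discrepancy too large to be selected?"""

    @property
    @abstractmethod
    def n_backtrack(self) -> int:
        """Number of backtracking steps to evaluate if too_large."""


class ErrorWindowSelector(SelectionMethodSketch):
    """Hypothetical selector: accept errors inside the window (low, high)."""

    def __init__(self, low: float, high: float, n_backtrack: int = 10):
        self._low, self._high = low, high
        self._n_backtrack = n_backtrack
        self._error = 0.0

    def __call__(self, configuration, mlp, **kwargs) -> None:
        # Assumption: the configuration is a dict with a precomputed 'error'
        self._error = abs(configuration["error"])

    @property
    def select(self) -> bool:
        return self._low < self._error < self._high

    @property
    def too_large(self) -> bool:
        return self._error > self._high

    @property
    def n_backtrack(self) -> int:
        return self._n_backtrack


selector = ErrorWindowSelector(low=0.1, high=1.0)
selector({"error": 0.5}, mlp=None)
print(selector.select, selector.too_large)
```

The call stores state and the properties read it afterwards, which matches the documented pattern of __call__ returning None while select and too_large are queried separately.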