Python

This list is incomplete. You can help by expanding it!

Atoms and cells

Some base functionality for working with structures and collections thereof in librascal

class rascal.neighbourlist.structure_manager.AtomsList(frames, nl_options, start=None, length=None, managers=None)[source]

A container for the neighbourlist and representation data associated with a list of atomic structures.

This is a wrapper class for the StructureManagerCollection that have between precompiled on the C++ side.

nl_options

Parameters for each layer of the wrapped structure manager. Parameters can be specified for these layers: center, neighbourlist and strict.

Type: dict

managers

C++ object from rascal that holds the neighbourlist and the data associated with representations.

Type: StructureManagerCollection

rascal.neighbourlist.structure_manager.mask_center_atoms_by_id(frame, id_select=None, id_blacklist=None)[source]

Mask the centers (center-select) of an ASE atoms object, by index

Parameters

frame (ase.Atoms) – Atomic structure to mask
id_select (list of int) – List of atom IDs to select
id_blacklist (list of int) – List of atom IDs to exclude

Returns

Return type

None (the Atoms object is modified directly)

Notes

The default is to select all atoms. If id_select is provided, select only those atoms. If only id_blacklist is provided, select all atoms except those in the blacklist. If both are provided, atoms are first selected based on id_select and then excluded based on id_blacklist. If the atoms object already has a mask, then id_select is applied first using the or operation, then id_blacklist is applied using the and not operation (so the order of precedence is: blacklist, selection, previous mask).

This logic allows this function to be combined with mask_center_atoms_by_species to allow mixed species/id-based masking.

rascal.neighbourlist.structure_manager.mask_center_atoms_by_species(frame, species_select=[], species_blacklist=[])[source]

Mask the centers of an ASE atoms object, by atomic species

Parameters

frame (ase.Atoms) – Atomic structure to mask
species_select (list of int or str) – List of atomic numbers, or species symbols, to select. Should be of consistent type across list.
species_blacklist (list of int or str) – List of atomic numbers, or species symbols, to exclude. Should be of consistent type across list.

Returns

Return type

None (the Atoms object is modified directly)

Notes

The default is to select all atoms. If species_select is provided, select only those atoms whose species is in the list. If only species_blacklist is provided, select all atoms except those whose species is in the blacklist. If both are provided, atoms are first selected based on species_select and then excluded based on species_blacklist. If the atoms object already has a mask, then species_select is applied first using the or operation, then species_blacklist is applied using the and not operation (so the order of precedence is: blacklist, selection, previous mask).

This logic allows this function to be combined with mask_center_atoms_by_id to allow mixed species/id-based masking.

Representations

Representations are the primary classes in librascal used to compute structural representations (features, descriptors) from a list of atoms.

class rascal.representations.SortedCoulombMatrix(cutoff, sorting_algorithm='row_norm', size=10, central_decay=- 1, interaction_cutoff=10, interaction_decay=- 1)[source]

Computes the Sorted Coulomb matrix representation [1].

cutoff

Type: float

central_decay

The distance over which the the coulomb interaction decays from full to none.

Type: float

interaction_cutoff

The distance between two non-central atom, where the coulomb interaction element will be zero.

Type: float

interaction_decay

The distance over which the the coulomb interaction decays from full to none.

Type: float

size

Larger or equal to the maximum number of neighbour an atom has in the structure.

Type: int

transform(frames)[source]: Compute the representation for a list of ase.Atoms object.

1: Rupp, M., Tkatchenko, A., Müller, K.-R., & von Lilienfeld, O. A. (2011). Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Physical Review Letters, 108(5), 58301. https://doi.org/10.1103/PhysRevLett.108.058301

transform(frames)[source]

Compute the representation.

Parameters: frames (list(ase.Atoms) or AtomsList) – List of atomic structures.
Returns: AtomsList
Return type: Object containing the representation

update_hyperparameters(**hypers)[source]

Store the given dict of hyperparameters

Also updates the internal json-like representation

class rascal.representations.SphericalInvariants(interaction_cutoff, cutoff_smooth_width, max_radial, max_angular, gaussian_sigma_type, gaussian_sigma_constant=0.3, cutoff_function_type='ShiftedCosine', soap_type='PowerSpectrum', inversion_symmetry=True, radial_basis='GTO', normalize=True, optimization=None, optimization_args=None, expansion_by_species_method='environment wise', global_species=None, compute_gradients=False, cutoff_function_parameters={}, coefficient_subselection=None)[source]

Computes a SphericalInvariants representation, i.e. the SOAP power spectrum.

interaction_cutoff

Maximum pairwise distance for atoms to be considered in expansion

Type: float

cutoff_smooth_width

The distance over which the the interaction is smoothed to zero

Type: float

max_radial

Number of radial basis functions

Type: int

max_angular

Highest angular momentum number (l) in the expansion

Type: int

gaussian_sigma_type

How the Gaussian atom sigmas (smearing widths) are allowed to vary. Only fixed smearing width (‘Constant’) are implemented.

Type: str

gaussian_sigma_constant

Specifies the atomic Gaussian widths, in the case where they’re fixed.

Type: float

cutoff_function_type

Choose the type of smooth cutoff function used to define the local environment. Can be either ‘ShiftedCosine’ or ‘RadialScaling’.

If ‘ShiftedCosine’, the functional form of the switching function is:

\[\begin{split}sc(r) = \begin{cases} 1 &r < r_c - sw,\\ 0.5 + 0.5 \cos(\pi * (r - r_c + sw) / sw) &r_c - sw < r <= r_c, \\ 0 &r_c < r, \end{cases}\end{split}\]

where \(r_c\) is the interaction_cutoff and \(sw\) is the cutoff_smooth_width.

If ‘RadialScaling’, the functional form of the switching function is as expressed in equation 21 of https://doi.org/10.1039/c8cp05921g:

\[rs(r) = sc(r) u(r),\]

where

\[\begin{split}u(r) = \begin{cases} \frac{1}{(r/r_0)^m} &\text{if c=0,}\\ 1 &\text{if m=0,} \\ \frac{c}{c+(r/r_0)^m} &\text{else}, \end{cases}\end{split}\]

where \(c\) is the rate, \(r_0\) is the scale, \(m\) is the exponent.

Type: string

soap_type

Specifies the type of representation to be computed (“RadialSpectrum”, “PowerSpectrum” and “BiSpectrum”).

Type: string

inversion_symmetry

Specifies whether inversion invariance should be enforced. (Only relevant for BiSpectrum.)

Type: boolean

radial_basis

Specifies the type of radial basis R_n to be computed (“GTO” for Gaussian typed orbitals and “DVR” discrete variable representation using Gaussian-Legendre quadrature rule)

Type: string

normalize

Whether to normalize so that the kernel between identical environments is 1. Default and highly recommended: True.

Type: boolean

optimization

Optional arguments for optimization of the computation of spherical expansion coefficients. “Spline” and “RadialDimReduction” are available.

Spline: Enables cubic splining for the radial basis functions.

accuracyfloat
accuracy of the cubic spline

RadialDimReduction: Projection matrices to optimize radial basis,

requires Spline to be set

projection_matricesdict: Contains or each species a list of projection matrices for each angular channel. A projection matrix for an angular channel has the shape (max_radial, expanded_max_radial). A number of expanded_max_radial radial basis are computed to be then projected to max_radial radial basis. The projected radial basis is then splined for each species and angular channel

Default settings is using spline

Type: dict, default None

expansion_by_species_method

Specifies the how the species key of the invariant are set-up. Possible values: ‘environment wise’, ‘user defined’, ‘structure wise’. The descriptor is computed for each atomic enviroment and it is indexed using tuples of atomic species that are present within the environment. This index is by definition sparse since a species tuple will be non zero only if the atomic species are present inside the environment. ‘environment wise’ means that each environmental representation will only contain the minimal set of species tuples needed by each atomic environment. ‘structure wise’ means that within a structure the species tuples will be the same for each environment coefficients. ‘user defined’ uses global_species to set-up the species tuples.

These different settings correspond to different trade-off between the memory efficiency of the invariants and the computational efficiency of the kernel computation. When computing a kernel using ‘environment wise’ setting does not allow for efficent matrix matrix multiplications which is ensured when ‘user defined’ is used. ‘structure wise’ is a balance between the memory footprint and the use of matrix matrix products.

Note that the sparsity of the gradient coefficients and their use to build kernels does not allow for clear efficiency gains so their sparsity is kept irrespective of expansion_by_species_method.

Type: string

global_species

list of species (specified with their atomic number) to use to set-up the species key of the invariant. It should contain all the species present in the structure for which invariants will be computed

Type: list of int

compute_gradients

control the computation of the representation’s gradients w.r.t. atomic positions.

Type: bool

cutoff_function_parameters

Additional parameters for the cutoff function. if cutoff_function_type == ‘RadialScaling’ then it should have the form

dict(rate=...,
     scale=...,
     exponent=...)

where ... should be replaced by the desired positive float.

Type: dict

coefficient_subselection

if None then all the coefficients are computed following max_radial, max_angular and the atomic species present. if soap_type == 'PowerSpectrum' and it has the form {'a': [...], 'b': [...], 'n1': [...], 'n2': [...], 'l': [...]} where ‘a’ and ‘b’ are lists of atomic species, ‘n1’ and ‘n2’ are lists of radial expension indices and ‘l’ is the list of spherical expansion index. Each of these lists have the same size and their ith element refers to one PowerSpectrum coefficient that will be computed. utils.FPSFilter and utils.CURFilter with act_on set to feature output such dictionary.

Type: list or None

transform(frames)[source]: Compute the representation for a list of ase.Atoms object.

soap: Bartók, Kondor, and Csányi, “On representing chemical environments”, Phys. Rev. B. 87(18), p. 184115 http://link.aps.org/doi/10.1103/PhysRevB.87.184115

get_keys(species)[source]: return the proper list of keys used to build the representation

get_num_coefficients(n_species=1)[source]

Return the number of coefficients in the spherical invariants

(this is the descriptor size per atomic centre)

transform(frames)[source]

Compute the representation.

Parameters: frames (list(ase.Atoms) or AtomsList) – List of atomic structures.
Returns: AtomsList
Return type: Object containing the representation

update_hyperparameters(**hypers)[source]

Store the given dict of hyperparameters

Also updates the internal json-like representation

class rascal.representations.SphericalCovariants(interaction_cutoff, cutoff_smooth_width, max_radial, max_angular, gaussian_sigma_type, gaussian_sigma_constant=0.3, cutoff_function_type='ShiftedCosine', normalize=True, radial_basis='GTO', optimization=None, optimization_args=None, soap_type='LambdaSpectrum', inversion_symmetry=True, covariant_lambda=0, cutoff_function_parameters={})[source]

Computes a SphericalCovariants representation, i.e. lambda spectrum.

interaction_cutoff

Maximum pairwise distance for atoms to be considered in expansion

Type: float

cutoff_smooth_width

The distance over which the the interaction is smoothed to zero

Type: float

max_radial

Number of radial basis functions

Type: int

max_angular

Highest angular momentum number (l) in the expansion

Type: int

gaussian_sigma_type

How the Gaussian atom sigmas (smearing widths) are allowed to vary. Only fixed smearing width (‘Constant’) are implemented.

Type: str

gaussian_sigma_constant

Specifies the atomic Gaussian widths, in the case where they’re fixed.

Type: float

cutoff_function_type

Choose the type of smooth cutoff function used to define the local environment. Can be either ‘ShiftedCosine’ or ‘RadialScaling’.

If ‘ShiftedCosine’, the functional form of the switching function is:

\[\begin{split}sc(r) = \begin{cases} 1 &r < r_c - sw,\\ 0.5 + 0.5 \cos(\pi * (r - r_c + sw) / sw) &r_c - sw < r <= r_c, \\ 0 &r_c < r, \end{cases}\end{split}\]

where \(r_c\) is the interaction_cutoff and \(sw\) is the cutoff_smooth_width.

If ‘RadialScaling’, the functional form of the switching function is as expressed in equation 21 of https://doi.org/10.1039/c8cp05921g:

\[rs(r) = sc(r) u(r),\]

where

\[\begin{split}u(r) = \begin{cases} \frac{1}{(r/r_0)^m} &\text{if c=0,}\\ 1 &\text{if m=0,} \\ \frac{c}{c+(r/r_0)^m} &\text{else}, \end{cases}\end{split}\]

where \(c\) is the rate, \(r_0\) is the scale, \(m\) is the exponent.

Type: string

normalize

Whether to normalize so that the kernel between identical environments is 1. Default and highly recommended: True.

Type: boolean

radial_basis

Specifies the type of radial basis R_n to be computed (“GTO” for Gaussian typed orbitals and “DVR” discrete variable representation using Gauss-Legendre quadrature rule)

Type: string

soap_type

Specifies the type of representation to be computed.

Type: string

inversion_symmetry

Specifies whether inversion invariance should be enforced.

Type: boolean

covariant_lambda

Order of the lambda spectrum.

Type: int

cutoff_function_parameters

Additional parameters for the cutoff function. if cutoff_function_type == ‘RadialScaling’ then it should have the form

dict(rate=...,
     scale=...,
     exponent=...)

where ... should be replaced by the desired positive float.

Type: dict

optimization

Optional arguments for optimization of the computation of spherical expansion coefficients. “Spline” and “RadialDimReduction” are available.

Spline: Enables cubic splining for the radial basis functions.

accuracyfloat
accuracy of the cubic spline

RadialDimReduction: Projection matrices to optimize radial basis,

requires Spline to be set

projection_matricesdict: Contains or each species a list of projection matrices for each angular channel. A projection matrix for an angular channel has the shape (max_radial, expanded_max_radial). A number of expanded_max_radial radial basis are computed to be then projected to max_radial radial basis. The projected radial basis is then splined for each species and angular channel

Default settings is using spline

Type: dict, default None

transform(frames)[source]: Compute the representation for a list of ase.Atoms object.

lambda-soap: Grisafi, A., Wilkins, D. M., Csányi, G., & Ceriotti, M.

(2018). Symmetry-Adapted Machine Learning for Tensorial Properties of Atomistic Systems. Physical Review Letters, 120(3), 036002. https://doi.org/10.1103/PhysRevLett.120.036002

get_keys(species)[source]: return the proper list of keys used to build the representation

get_num_coefficients(n_species=1)[source]

Return the number of coefficients in the representation

(this is the descriptor size per atomic centre)

transform(frames)[source]

Compute the representation.

Parameters: frames (list(ase.Atoms) or AtomsList) – List of atomic structures.
Returns: AtomsList
Return type: Object containing the representation

update_hyperparameters(**hypers)[source]

Store the given dict of hyperparameters

Also updates the internal json-like representation

Models

Also included is an optimized implementation of kernel ridge regression (KRR, also equivalent to Gaussian approximation potentials aka GAP). Both fitting and evaluating of models is implemented; the evaluation in particular is optimized for use in MD simulations.

class rascal.models.Kernel(representation, name='Cosine', kernel_type='Full', target_type='Structure', **kwargs)[source]

Initialize the kernel with the given representation and parameters

Parameters

representation (Calculator) – Representation calculator associated with the kernel
name (string) – Type of kernel, ‘Cosine’ (aka dot-product) is the default and (currently) only option.
target_type (string) – Type of target (prediction) properties, must be either ‘Atom’ (the kernel is between atomic environments) or ‘Structure’ (the kernel is summed over atoms in a structure), which is the default
kernel_type (string) – Type of kernel method, either ‘Full’ (computing exact covariance matrix) or ‘Sparse’ (computing GAP 2 like kernel for sparse kernel methods like Subset of Regressors)

Notes

In the following we refer to the training samples with ‘N’ and, in the case of sparse kernels [1]_, we refer to the pseudo points with ‘M’. So a kernel between the training samples and the pseudo points is ‘KNM’. For more information on sparse kernels see rascal.models.krr.train_gap_model().

1: Joaquin Quiñonero-Candela, Carl Edward Rasmussen; A Unifying View of Sparse Approximate Gaussian Process Regression, 6(Dec):1939–1959, 2005. http://www.jmlr.org/papers/v6/quinonero-candela05a.html
2: Ceriotti, M., Willatt, M. J., & Csányi, G. (2018).

Machine Learning of Atomic-Scale Properties Based on Physical Principles. In Handbook of Materials Modeling (pp. 1–27). Springer, Cham. https://doi.org/10.1007/978-3-319-42913-7_68-1

class rascal.models.KRR(weights, kernel, X_train, self_contributions, description='KRR potential model', units=None)[source]

Kernel Ridge Regression model. Only supports sparse GPR training for the moment.

Parameters

weights (np.array) – weights of the model
kernel (Kernel) – kernel class used to train the model
X_train (SparsePoints) – reference samples used for the training
self_contributions (dictionary) – map atomic number to the property baseline, e.g. isolated atoms energies when the model has been trained on total energies.
description (string) – User-defined string used to describe the model for future reference
units (dict) – Energy and length units used by the model (default: eV and Å (aka AA), same as used in ASE)

predict(managers, KNM=None)[source]

Predict properties associated with the atomic structures in managers.

Parameters

managers (AtomsList) – list of atomic structures with already computed features compatible with representation in kernel
KNM (np.array, optional) – precomputed sparse kernel matrix

Returns

predictions

Return type

np.array

predict_forces(managers, KNM=None)[source]

Predict negative gradients w.r.t atomic positions, e.g. forces, associated with the atomic structures in managers.

Parameters

managers (AtomsList) – list of atomic structures with already computed features compatible with representation in kernel
KNM (np.array, optional) – precomputed sparse kernel matrix

Returns

predictions

Return type

np.array

predict_stress(managers, KNM=None)[source]

Predict gradients w.r.t cell parameters, e.g. stress, associated with the atomic structures in managers. The stress is returned using the Voigt order: xx, yy, zz, yz, xz, xy.

Parameters

managers (AtomsList) – list of atomic structures with already computed features compatible with representation in kernel
KNM (np.array, optional) – precomputed sparse kernel matrix

Returns

predictions

Return type

np.array

rascal.models.compute_KNM(frames, X_sparse, kernel, soap)[source]

Compute GAP kernel of the (new) structures against the sparse points

Parameters

frames – New structures to compute kernel for
representation – RepresentationCalculator to use for the structures
X_sparse – Sparse points to compute kernels against
kernel – Kernel object to use

Returns

K_NM – Summed total-energy kernel stacked with the atom-position gradient of the kernel

Return type

np.array

Notes

This function can take quite a long time to run. To get a progress bar, you can wrap the frames parameter in a [tqdm] object like this:

from tqdm.notebook import tqdm # for Jupyter
#from tqdm import tqdm # on the command line
K_NM = compute_KNM(
    tqdm(frames, desc="compute KNM", leave=False),
    X_sparse,
    kernel,
    soap
)

tqdm: https://github.com/tqdm/tqdm

Filters

Filters are used mainly to select rows and columns from feature (or kernel) matrices in order to reduce their dimensionality and make the fitting problem tractable, or just more efficient. The following filters, implemented in scikit-cosmo (outbound), are available:

class rascal.utils.FPSFilter(representation, Nselect, act_on='sample per species', selector_args={}, **kwargs)[source]

get_fps_distances()[source]

Return the Hausdorff distances over the course of selection

This may be a useful (rough) indicator for choosing how many points to select, as a small distance generally indicates that the selected point is close to the existing set of selected points and therefore probably does not add much additional information.

Returns either an array of Hausdorff distances, or a species-indexed dict of arrays (for the “sample per species” mode).

class rascal.utils.CURFilter(representation, Nselect, act_on='sample per species', selector_args={}, **kwargs)[source]

Both inherit the interface of the following base class:

class rascal.utils.filter.Filter(representation, Nselect, selector, act_on='sample per species')[source]

A super class for filtering representations based upon a standard sample or feature selection class.

This is mainly a wrapper around selectors (implemented e.g. in scikit-cosmo) that handles the semantic-index transformations required after selection.

Parameters

representation (Calculator) – Representation calculator associated with the kernel
Nselect (int) – number of points to select. If act_on=’sample per species’ then it should be a dictionary mapping atom type to the number of samples, e.g. Nselect = {1:200,6:100,8:50}.
selector (selector to use for filtering. The selector should) – have a fit function, which when called will select from the input matrix the desired features / samples and a get_support function which takes parameters indices and ordered, and returns a list of selection indices, in the order that they were selected, when indices=True and ordered=True.
act_on (string) – Select how to apply the selection. Can be either of ‘sample’, ‘sample per species’,’feature’. Default ‘sample per species’. Note that for ‘feature’ mode only the SphericalInvariants representation is supported.

IO

Utilities for loading and saving Rascal objects (especially models)

rascal.utils.dump_obj(fn, instance, version='0.1')[source]

Save a python object that inherits from the BaseIO class

Parameters

fn (string) – path to save instance
instance (class) – python object that inherits from the BaseIO class
version (string, optional) – serialization version to use, by default CURRENT_VERSION

Raises

RuntimeError – When instance does not inherit from BaseIO

rascal.utils.load_obj(fn)[source]

Load a python object from a file

Parameters: fn (string) – path to the file describing the saved object
Returns
Return type: python class that inherits from BaseIO

class rascal.utils.BaseIO[source]

Interface of a Python class serializable by to_dict()

It corresponds to 3 methods:

_get_init_params is expected to return a dictionary containing all the

parameters used by the __init__() methods.

_get_data is expected to return a dictionary containing all the data

that is not set by the initialization of the class.

_set_data is expected to set the data that has been extracted by _get_data

The underlying c++ objects are not pickle-able so deepcopy does not work out of the box. This class provides an override of the __deepcopy__() function so that classes that inherit from this base class can be deepcopied.