Python

This list is incomplete. You can help by expanding it!

Atoms and cells

Some base functionality for working with structures and collections thereof in librascal

class rascal.neighbourlist.structure_manager.AtomsList(frames, nl_options, start=None, length=None, managers=None)[source]

A container for the neighbourlist and representation data associated with a list of atomic structures.

This is a wrapper class for the StructureManagerCollection that have between precompiled on the C++ side.

nl_options

Parameters for each layer of the wrapped structure manager. Parameters can be specified for these layers: center, neighbourlist and strict.

Type

dict

managers

C++ object from rascal that holds the neighbourlist and the data associated with representations.

Type

StructureManagerCollection

rascal.neighbourlist.structure_manager.mask_center_atoms_by_id(frame, id_select=None, id_blacklist=None)[source]

Mask the centers (center-select) of an ASE atoms object, by index

Parameters
  • frame (ase.Atoms) – Atomic structure to mask

  • id_select (list of int) – List of atom IDs to select

  • id_blacklist (list of int) – List of atom IDs to exclude

Returns

Return type

None (the Atoms object is modified directly)

Notes

The default is to select all atoms. If id_select is provided, select only those atoms. If only id_blacklist is provided, select all atoms except those in the blacklist. If both are provided, atoms are first selected based on id_select and then excluded based on id_blacklist. If the atoms object already has a mask, then id_select is applied first using the or operation, then id_blacklist is applied using the and not operation (so the order of precedence is: blacklist, selection, previous mask).

This logic allows this function to be combined with mask_center_atoms_by_species to allow mixed species/id-based masking.

rascal.neighbourlist.structure_manager.mask_center_atoms_by_species(frame, species_select=[], species_blacklist=[])[source]

Mask the centers of an ASE atoms object, by atomic species

Parameters
  • frame (ase.Atoms) – Atomic structure to mask

  • species_select (list of int or str) – List of atomic numbers, or species symbols, to select. Should be of consistent type across list.

  • species_blacklist (list of int or str) – List of atomic numbers, or species symbols, to exclude. Should be of consistent type across list.

Returns

Return type

None (the Atoms object is modified directly)

Notes

The default is to select all atoms. If species_select is provided, select only those atoms whose species is in the list. If only species_blacklist is provided, select all atoms except those whose species is in the blacklist. If both are provided, atoms are first selected based on species_select and then excluded based on species_blacklist. If the atoms object already has a mask, then species_select is applied first using the or operation, then species_blacklist is applied using the and not operation (so the order of precedence is: blacklist, selection, previous mask).

This logic allows this function to be combined with mask_center_atoms_by_id to allow mixed species/id-based masking.

Representations

Representations are the primary classes in librascal used to compute structural representations (features, descriptors) from a list of atoms.

class rascal.representations.SortedCoulombMatrix(cutoff, sorting_algorithm='row_norm', size=10, central_decay=- 1, interaction_cutoff=10, interaction_decay=- 1)[source]

Computes the Sorted Coulomb matrix representation [1].

cutoff
Type

float

central_decay

The distance over which the the coulomb interaction decays from full to none.

Type

float

interaction_cutoff

The distance between two non-central atom, where the coulomb interaction element will be zero.

Type

float

interaction_decay

The distance over which the the coulomb interaction decays from full to none.

Type

float

size

Larger or equal to the maximum number of neighbour an atom has in the structure.

Type

int

transform(frames)[source]

Compute the representation for a list of ase.Atoms object.

1

Rupp, M., Tkatchenko, A., Müller, K.-R., & von Lilienfeld, O. A. (2011). Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Physical Review Letters, 108(5), 58301. https://doi.org/10.1103/PhysRevLett.108.058301

transform(frames)[source]

Compute the representation.

Parameters

frames (list(ase.Atoms) or AtomsList) – List of atomic structures.

Returns

AtomsList

Return type

Object containing the representation

update_hyperparameters(**hypers)[source]

Store the given dict of hyperparameters

Also updates the internal json-like representation

class rascal.representations.SphericalInvariants(interaction_cutoff, cutoff_smooth_width, max_radial, max_angular, gaussian_sigma_type, gaussian_sigma_constant=0.3, cutoff_function_type='ShiftedCosine', soap_type='PowerSpectrum', inversion_symmetry=True, radial_basis='GTO', normalize=True, optimization=None, optimization_args=None, expansion_by_species_method='environment wise', global_species=None, compute_gradients=False, cutoff_function_parameters={}, coefficient_subselection=None)[source]

Computes a SphericalInvariants representation, i.e. the SOAP power spectrum.

interaction_cutoff

Maximum pairwise distance for atoms to be considered in expansion

Type

float

cutoff_smooth_width

The distance over which the the interaction is smoothed to zero

Type

float

max_radial

Number of radial basis functions

Type

int

max_angular

Highest angular momentum number (l) in the expansion

Type

int

gaussian_sigma_type

How the Gaussian atom sigmas (smearing widths) are allowed to vary. Only fixed smearing width (‘Constant’) are implemented.

Type

str

gaussian_sigma_constant

Specifies the atomic Gaussian widths, in the case where they’re fixed.

Type

float

cutoff_function_type

Choose the type of smooth cutoff function used to define the local environment. Can be either ‘ShiftedCosine’ or ‘RadialScaling’.

If ‘ShiftedCosine’, the functional form of the switching function is:

\[\begin{split}sc(r) = \begin{cases} 1 &r < r_c - sw,\\ 0.5 + 0.5 \cos(\pi * (r - r_c + sw) / sw) &r_c - sw < r <= r_c, \\ 0 &r_c < r, \end{cases}\end{split}\]

where \(r_c\) is the interaction_cutoff and \(sw\) is the cutoff_smooth_width.

If ‘RadialScaling’, the functional form of the switching function is as expressed in equation 21 of https://doi.org/10.1039/c8cp05921g:

\[rs(r) = sc(r) u(r),\]

where

\[\begin{split}u(r) = \begin{cases} \frac{1}{(r/r_0)^m} &\text{if c=0,}\\ 1 &\text{if m=0,} \\ \frac{c}{c+(r/r_0)^m} &\text{else}, \end{cases}\end{split}\]

where \(c\) is the rate, \(r_0\) is the scale, \(m\) is the exponent.

Type

string

soap_type

Specifies the type of representation to be computed (“RadialSpectrum”, “PowerSpectrum” and “BiSpectrum”).

Type

string

inversion_symmetry

Specifies whether inversion invariance should be enforced. (Only relevant for BiSpectrum.)

Type

boolean

radial_basis

Specifies the type of radial basis R_n to be computed (“GTO” for Gaussian typed orbitals and “DVR” discrete variable representation using Gaussian-Legendre quadrature rule)

Type

string

normalize

Whether to normalize so that the kernel between identical environments is 1. Default and highly recommended: True.

Type

boolean

optimization

Optional arguments for optimization of the computation of spherical expansion coefficients. “Spline” and “RadialDimReduction” are available.

Spline: Enables cubic splining for the radial basis functions.

accuracyfloat

accuracy of the cubic spline

RadialDimReduction: Projection matrices to optimize radial basis,

requires Spline to be set

projection_matricesdict

Contains or each species a list of projection matrices for each angular channel. A projection matrix for an angular channel has the shape (max_radial, expanded_max_radial). A number of expanded_max_radial radial basis are computed to be then projected to max_radial radial basis. The projected radial basis is then splined for each species and angular channel

Default settings is using spline

Type

dict, default None

expansion_by_species_method

Specifies the how the species key of the invariant are set-up. Possible values: ‘environment wise’, ‘user defined’, ‘structure wise’. The descriptor is computed for each atomic enviroment and it is indexed using tuples of atomic species that are present within the environment. This index is by definition sparse since a species tuple will be non zero only if the atomic species are present inside the environment. ‘environment wise’ means that each environmental representation will only contain the minimal set of species tuples needed by each atomic environment. ‘structure wise’ means that within a structure the species tuples will be the same for each environment coefficients. ‘user defined’ uses global_species to set-up the species tuples.

These different settings correspond to different trade-off between the memory efficiency of the invariants and the computational efficiency of the kernel computation. When computing a kernel using ‘environment wise’ setting does not allow for efficent matrix matrix multiplications which is ensured when ‘user defined’ is used. ‘structure wise’ is a balance between the memory footprint and the use of matrix matrix products.

Note that the sparsity of the gradient coefficients and their use to build kernels does not allow for clear efficiency gains so their sparsity is kept irrespective of expansion_by_species_method.

Type

string

global_species

list of species (specified with their atomic number) to use to set-up the species key of the invariant. It should contain all the species present in the structure for which invariants will be computed

Type

list of int

compute_gradients

control the computation of the representation’s gradients w.r.t. atomic positions.

Type

bool

cutoff_function_parameters

Additional parameters for the cutoff function. if cutoff_function_type == ‘RadialScaling’ then it should have the form

dict(rate=...,
     scale=...,
     exponent=...)

where ... should be replaced by the desired positive float.

Type

dict

coefficient_subselection

if None then all the coefficients are computed following max_radial, max_angular and the atomic species present. if soap_type == 'PowerSpectrum' and it has the form {'a': [...], 'b': [...], 'n1': [...], 'n2': [...], 'l': [...]} where ‘a’ and ‘b’ are lists of atomic species, ‘n1’ and ‘n2’ are lists of radial expension indices and ‘l’ is the list of spherical expansion index. Each of these lists have the same size and their ith element refers to one PowerSpectrum coefficient that will be computed. utils.FPSFilter and utils.CURFilter with act_on set to feature output such dictionary.

Type

list or None

transform(frames)[source]

Compute the representation for a list of ase.Atoms object.

soap

Bartók, Kondor, and Csányi, “On representing chemical environments”, Phys. Rev. B. 87(18), p. 184115 http://link.aps.org/doi/10.1103/PhysRevB.87.184115

get_keys(species)[source]

return the proper list of keys used to build the representation

get_num_coefficients(n_species=1)[source]

Return the number of coefficients in the spherical invariants

(this is the descriptor size per atomic centre)

transform(frames)[source]

Compute the representation.

Parameters

frames (list(ase.Atoms) or AtomsList) – List of atomic structures.

Returns

AtomsList

Return type

Object containing the representation

update_hyperparameters(**hypers)[source]

Store the given dict of hyperparameters

Also updates the internal json-like representation

class rascal.representations.SphericalCovariants(interaction_cutoff, cutoff_smooth_width, max_radial, max_angular, gaussian_sigma_type, gaussian_sigma_constant=0.3, cutoff_function_type='ShiftedCosine', normalize=True, radial_basis='GTO', optimization=None, optimization_args=None, soap_type='LambdaSpectrum', inversion_symmetry=True, covariant_lambda=0, cutoff_function_parameters={})[source]

Computes a SphericalCovariants representation, i.e. lambda spectrum.

interaction_cutoff

Maximum pairwise distance for atoms to be considered in expansion

Type

float

cutoff_smooth_width

The distance over which the the interaction is smoothed to zero

Type

float

max_radial

Number of radial basis functions

Type

int

max_angular

Highest angular momentum number (l) in the expansion

Type

int

gaussian_sigma_type

How the Gaussian atom sigmas (smearing widths) are allowed to vary. Only fixed smearing width (‘Constant’) are implemented.

Type

str

gaussian_sigma_constant

Specifies the atomic Gaussian widths, in the case where they’re fixed.

Type

float

cutoff_function_type

Choose the type of smooth cutoff function used to define the local environment. Can be either ‘ShiftedCosine’ or ‘RadialScaling’.

If ‘ShiftedCosine’, the functional form of the switching function is:

\[\begin{split}sc(r) = \begin{cases} 1 &r < r_c - sw,\\ 0.5 + 0.5 \cos(\pi * (r - r_c + sw) / sw) &r_c - sw < r <= r_c, \\ 0 &r_c < r, \end{cases}\end{split}\]

where \(r_c\) is the interaction_cutoff and \(sw\) is the cutoff_smooth_width.

If ‘RadialScaling’, the functional form of the switching function is as expressed in equation 21 of https://doi.org/10.1039/c8cp05921g:

\[rs(r) = sc(r) u(r),\]

where

\[\begin{split}u(r) = \begin{cases} \frac{1}{(r/r_0)^m} &\text{if c=0,}\\ 1 &\text{if m=0,} \\ \frac{c}{c+(r/r_0)^m} &\text{else}, \end{cases}\end{split}\]

where \(c\) is the rate, \(r_0\) is the scale, \(m\) is the exponent.

Type

string

normalize

Whether to normalize so that the kernel between identical environments is 1. Default and highly recommended: True.

Type

boolean

radial_basis

Specifies the type of radial basis R_n to be computed (“GTO” for Gaussian typed orbitals and “DVR” discrete variable representation using Gauss-Legendre quadrature rule)

Type

string

soap_type

Specifies the type of representation to be computed.

Type

string

inversion_symmetry

Specifies whether inversion invariance should be enforced.

Type

boolean

covariant_lambda

Order of the lambda spectrum.

Type

int

cutoff_function_parameters

Additional parameters for the cutoff function. if cutoff_function_type == ‘RadialScaling’ then it should have the form

dict(rate=...,
     scale=...,
     exponent=...)

where ... should be replaced by the desired positive float.

Type

dict

optimization

Optional arguments for optimization of the computation of spherical expansion coefficients. “Spline” and “RadialDimReduction” are available.

Spline: Enables cubic splining for the radial basis functions.

accuracyfloat

accuracy of the cubic spline

RadialDimReduction: Projection matrices to optimize radial basis,

requires Spline to be set

projection_matricesdict

Contains or each species a list of projection matrices for each angular channel. A projection matrix for an angular channel has the shape (max_radial, expanded_max_radial). A number of expanded_max_radial radial basis are computed to be then projected to max_radial radial basis. The projected radial basis is then splined for each species and angular channel

Default settings is using spline

Type

dict, default None

transform(frames)[source]

Compute the representation for a list of ase.Atoms object.

lambda-soap

Grisafi, A., Wilkins, D. M., Csányi, G., & Ceriotti, M.

(2018). Symmetry-Adapted Machine Learning for Tensorial Properties of Atomistic Systems. Physical Review Letters, 120(3), 036002. https://doi.org/10.1103/PhysRevLett.120.036002

get_keys(species)[source]

return the proper list of keys used to build the representation

get_num_coefficients(n_species=1)[source]

Return the number of coefficients in the representation

(this is the descriptor size per atomic centre)

transform(frames)[source]

Compute the representation.

Parameters

frames (list(ase.Atoms) or AtomsList) – List of atomic structures.

Returns

AtomsList

Return type

Object containing the representation

update_hyperparameters(**hypers)[source]

Store the given dict of hyperparameters

Also updates the internal json-like representation

Models

Also included is an optimized implementation of kernel ridge regression (KRR, also equivalent to Gaussian approximation potentials aka GAP). Both fitting and evaluating of models is implemented; the evaluation in particular is optimized for use in MD simulations.

class rascal.models.Kernel(representation, name='Cosine', kernel_type='Full', target_type='Structure', **kwargs)[source]

Initialize the kernel with the given representation and parameters

Parameters
  • representation (Calculator) – Representation calculator associated with the kernel

  • name (string) – Type of kernel, ‘Cosine’ (aka dot-product) is the default and (currently) only option.

  • target_type (string) – Type of target (prediction) properties, must be either ‘Atom’ (the kernel is between atomic environments) or ‘Structure’ (the kernel is summed over atoms in a structure), which is the default

  • kernel_type (string) – Type of kernel method, either ‘Full’ (computing exact covariance matrix) or ‘Sparse’ (computing GAP 2 like kernel for sparse kernel methods like Subset of Regressors)

Notes

In the following we refer to the training samples with ‘N’ and, in the case of sparse kernels [1]_, we refer to the pseudo points with ‘M’. So a kernel between the training samples and the pseudo points is ‘KNM’. For more information on sparse kernels see rascal.models.krr.train_gap_model().

1

Joaquin Quiñonero-Candela, Carl Edward Rasmussen; A Unifying View of Sparse Approximate Gaussian Process Regression, 6(Dec):1939–1959, 2005. http://www.jmlr.org/papers/v6/quinonero-candela05a.html

2

Ceriotti, M., Willatt, M. J., & Csányi, G. (2018).

Machine Learning of Atomic-Scale Properties Based on Physical Principles. In Handbook of Materials Modeling (pp. 1–27). Springer, Cham. https://doi.org/10.1007/978-3-319-42913-7_68-1

class rascal.models.KRR(weights, kernel, X_train, self_contributions, description='KRR potential model', units=None)[source]

Kernel Ridge Regression model. Only supports sparse GPR training for the moment.

Parameters
  • weights (np.array) – weights of the model

  • kernel (Kernel) – kernel class used to train the model

  • X_train (SparsePoints) – reference samples used for the training

  • self_contributions (dictionary) – map atomic number to the property baseline, e.g. isolated atoms energies when the model has been trained on total energies.

  • description (string) – User-defined string used to describe the model for future reference

  • units (dict) – Energy and length units used by the model (default: eV and Å (aka AA), same as used in ASE)

predict(managers, KNM=None)[source]

Predict properties associated with the atomic structures in managers.

Parameters
  • managers (AtomsList) – list of atomic structures with already computed features compatible with representation in kernel

  • KNM (np.array, optional) – precomputed sparse kernel matrix

Returns

predictions

Return type

np.array

predict_forces(managers, KNM=None)[source]

Predict negative gradients w.r.t atomic positions, e.g. forces, associated with the atomic structures in managers.

Parameters
  • managers (AtomsList) – list of atomic structures with already computed features compatible with representation in kernel

  • KNM (np.array, optional) – precomputed sparse kernel matrix

Returns

predictions

Return type

np.array

predict_stress(managers, KNM=None)[source]

Predict gradients w.r.t cell parameters, e.g. stress, associated with the atomic structures in managers. The stress is returned using the Voigt order: xx, yy, zz, yz, xz, xy.

Parameters
  • managers (AtomsList) – list of atomic structures with already computed features compatible with representation in kernel

  • KNM (np.array, optional) – precomputed sparse kernel matrix

Returns

predictions

Return type

np.array

rascal.models.compute_KNM(frames, X_sparse, kernel, soap)[source]

Compute GAP kernel of the (new) structures against the sparse points

Parameters
  • frames – New structures to compute kernel for

  • representation – RepresentationCalculator to use for the structures

  • X_sparse – Sparse points to compute kernels against

  • kernel – Kernel object to use

Returns

K_NM – Summed total-energy kernel stacked with the atom-position gradient of the kernel

Return type

np.array

Notes

This function can take quite a long time to run. To get a progress bar, you can wrap the frames parameter in a [tqdm] object like this:

from tqdm.notebook import tqdm # for Jupyter
#from tqdm import tqdm # on the command line
K_NM = compute_KNM(
    tqdm(frames, desc="compute KNM", leave=False),
    X_sparse,
    kernel,
    soap
)
tqdm

https://github.com/tqdm/tqdm

Filters

Filters are used mainly to select rows and columns from feature (or kernel) matrices in order to reduce their dimensionality and make the fitting problem tractable, or just more efficient. The following filters, implemented in scikit-cosmo (outbound), are available:

class rascal.utils.FPSFilter(representation, Nselect, act_on='sample per species', selector_args={}, **kwargs)[source]
get_fps_distances()[source]

Return the Hausdorff distances over the course of selection

This may be a useful (rough) indicator for choosing how many points to select, as a small distance generally indicates that the selected point is close to the existing set of selected points and therefore probably does not add much additional information.

Returns either an array of Hausdorff distances, or a species-indexed dict of arrays (for the “sample per species” mode).

class rascal.utils.CURFilter(representation, Nselect, act_on='sample per species', selector_args={}, **kwargs)[source]

Both inherit the interface of the following base class:

class rascal.utils.filter.Filter(representation, Nselect, selector, act_on='sample per species')[source]

A super class for filtering representations based upon a standard sample or feature selection class.

This is mainly a wrapper around selectors (implemented e.g. in scikit-cosmo) that handles the semantic-index transformations required after selection.

Parameters
  • representation (Calculator) – Representation calculator associated with the kernel

  • Nselect (int) – number of points to select. If act_on=’sample per species’ then it should be a dictionary mapping atom type to the number of samples, e.g. Nselect = {1:200,6:100,8:50}.

  • selector (selector to use for filtering. The selector should) – have a fit function, which when called will select from the input matrix the desired features / samples and a get_support function which takes parameters indices and ordered, and returns a list of selection indices, in the order that they were selected, when indices=True and ordered=True.

  • act_on (string) – Select how to apply the selection. Can be either of ‘sample’, ‘sample per species’,’feature’. Default ‘sample per species’. Note that for ‘feature’ mode only the SphericalInvariants representation is supported.

IO

Utilities for loading and saving Rascal objects (especially models)

rascal.utils.dump_obj(fn, instance, version='0.1')[source]

Save a python object that inherits from the BaseIO class

Parameters
  • fn (string) – path to save instance

  • instance (class) – python object that inherits from the BaseIO class

  • version (string, optional) – serialization version to use, by default CURRENT_VERSION

Raises

RuntimeError – When instance does not inherit from BaseIO

rascal.utils.load_obj(fn)[source]

Load a python object from a file

Parameters

fn (string) – path to the file describing the saved object

Returns

Return type

python class that inherits from BaseIO

class rascal.utils.BaseIO[source]

Interface of a Python class serializable by to_dict()

It corresponds to 3 methods:

  • _get_init_params is expected to return a dictionary containing all the

parameters used by the __init__() methods.

  • _get_data is expected to return a dictionary containing all the data

that is not set by the initialization of the class.

  • _set_data is expected to set the data that has been extracted by _get_data

The underlying c++ objects are not pickle-able so deepcopy does not work out of the box. This class provides an override of the __deepcopy__() function so that classes that inherit from this base class can be deepcopied.