Python
This list is incomplete. You can help by expanding it!
Atoms and cells
Some base functionality for working with structures and collections thereof in librascal.
- class rascal.neighbourlist.structure_manager.AtomsList(frames, nl_options, start=None, length=None, managers=None)[source]
A container for the neighbourlist and representation data associated with a list of atomic structures.
This is a wrapper class for the StructureManagerCollection that has been precompiled on the C++ side.
- nl_options
Parameters for each layer of the wrapped structure manager. Parameters can be specified for these layers: center, neighbourlist and strict.
- Type
dict
- managers
C++ object from rascal that holds the neighbourlist and the data associated with representations.
- Type
StructureManagerCollection
- rascal.neighbourlist.structure_manager.mask_center_atoms_by_id(frame, id_select=None, id_blacklist=None)[source]
Mask the centers (center-select) of an ASE atoms object, by index
- Parameters
frame (ase.Atoms) – Atomic structure to mask
id_select (list of int) – List of atom IDs to select
id_blacklist (list of int) – List of atom IDs to exclude
- Returns
- Return type
None (the Atoms object is modified directly)
Notes
The default is to select all atoms. If id_select is provided, select only those atoms. If only id_blacklist is provided, select all atoms except those in the blacklist. If both are provided, atoms are first selected based on id_select and then excluded based on id_blacklist. If the atoms object already has a mask, then id_select is applied first using the or operation, then id_blacklist is applied using the and not operation (so the order of precedence is: blacklist, selection, previous mask).
This logic allows this function to be combined with mask_center_atoms_by_species to allow mixed species/id-based masking.
- rascal.neighbourlist.structure_manager.mask_center_atoms_by_species(frame, species_select=[], species_blacklist=[])[source]
Mask the centers of an ASE atoms object, by atomic species
- Parameters
frame (ase.Atoms) – Atomic structure to mask
species_select (list of int or str) – List of atomic numbers, or species symbols, to select. Should be of consistent type across list.
species_blacklist (list of int or str) – List of atomic numbers, or species symbols, to exclude. Should be of consistent type across list.
- Returns
- Return type
None (the Atoms object is modified directly)
Notes
The default is to select all atoms. If species_select is provided, select only those atoms whose species is in the list. If only species_blacklist is provided, select all atoms except those whose species is in the blacklist. If both are provided, atoms are first selected based on species_select and then excluded based on species_blacklist. If the atoms object already has a mask, then species_select is applied first using the or operation, then species_blacklist is applied using the and not operation (so the order of precedence is: blacklist, selection, previous mask).
This logic allows this function to be combined with mask_center_atoms_by_id to allow mixed species/id-based masking.
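The precedence rules in the Notes above can be sketched with plain boolean arrays. This is a minimal illustration, not librascal's implementation: `update_mask` is a hypothetical helper, and the selections are passed as boolean arrays rather than as lists of IDs or species.

```python
import numpy as np

def update_mask(n_atoms, previous=None, select=None, blacklist=None):
    """Hypothetical helper: combine a previous mask, a selection and a blacklist.

    `previous`, `select` and `blacklist` are boolean arrays of length n_atoms, or None.
    """
    if previous is not None:
        mask = np.array(previous, dtype=bool)       # copy the existing mask
        if select is not None:
            mask |= np.asarray(select, dtype=bool)  # selection: OR
    elif select is not None:
        mask = np.array(select, dtype=bool)         # no previous mask: only the selection
    else:
        mask = np.ones(n_atoms, dtype=bool)         # default: all atoms selected
    if blacklist is not None:
        mask &= ~np.asarray(blacklist, dtype=bool)  # blacklist: AND NOT
    return mask

numbers = np.array([1, 1, 8, 6, 1])  # atomic numbers of a toy structure
ids = np.arange(len(numbers))

# Species-based step: keep only hydrogen centers.
by_species = update_mask(len(numbers), select=np.isin(numbers, [1]))

# Id-based step on top: add atom 3, then exclude atom 0.
mixed = update_mask(
    len(numbers),
    previous=by_species,
    select=np.isin(ids, [3]),
    blacklist=np.isin(ids, [0]),
)
```

Because the blacklist is applied last, a blacklisted atom stays excluded even if it was selected by the previous mask or by the new selection.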
Representations
Representations are the primary classes in librascal used to compute structural representations (features, descriptors) from a list of atoms.
- class rascal.representations.SortedCoulombMatrix(cutoff, sorting_algorithm='row_norm', size=10, central_decay=-1, interaction_cutoff=10, interaction_decay=-1)[source]
Computes the Sorted Coulomb matrix representation [1].
- cutoff
- Type
float
- central_decay
The distance over which the Coulomb interaction decays from full to none.
- Type
float
- interaction_cutoff
The distance between two non-central atoms at which the Coulomb interaction element becomes zero.
- Type
float
- interaction_decay
The distance over which the Coulomb interaction decays from full to none.
- Type
float
- size
Must be larger than or equal to the maximum number of neighbours any atom has in the structure.
- Type
int
- 1
Rupp, M., Tkatchenko, A., Müller, K.-R., & von Lilienfeld, O. A. (2012). Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Physical Review Letters, 108(5), 058301. https://doi.org/10.1103/PhysRevLett.108.058301
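For orientation, the following is a minimal NumPy sketch of the global sorted Coulomb matrix as defined in [1], with rows and columns ordered by decreasing row norm. It is not the librascal implementation: the class above builds the matrix per atomic environment and applies the cutoff and decay parameters, which this sketch omits. The function name and the toy geometry are illustrative assumptions.

```python
import numpy as np

def sorted_coulomb_matrix(numbers, positions):
    """Sketch of a row-norm-sorted Coulomb matrix for one molecule (no cutoffs)."""
    Z = np.asarray(numbers, dtype=float)
    R = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        C = np.outer(Z, Z) / dist           # off-diagonal: Z_i Z_j / |R_i - R_j|
    np.fill_diagonal(C, 0.5 * Z ** 2.4)     # diagonal: 0.5 Z_i^2.4
    order = np.argsort(-np.linalg.norm(C, axis=1), kind="stable")
    return C[order][:, order]               # permute rows and columns together

# Toy water-like geometry (illustrative coordinates).
M = sorted_coulomb_matrix(
    [1, 8, 1],
    [[0.76, 0.59, 0.0], [0.0, 0.0, 0.0], [-0.76, 0.59, 0.0]],
)
```

Sorting by row norm makes the result independent of the order in which the atoms are listed, up to ties between equivalent atoms.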
- class rascal.representations.SphericalInvariants(interaction_cutoff, cutoff_smooth_width, max_radial, max_angular, gaussian_sigma_type, gaussian_sigma_constant=0.3, cutoff_function_type='ShiftedCosine', soap_type='PowerSpectrum', inversion_symmetry=True, radial_basis='GTO', normalize=True, optimization=None, optimization_args=None, expansion_by_species_method='environment wise', global_species=None, compute_gradients=False, cutoff_function_parameters={}, coefficient_subselection=None)[source]
Computes a SphericalInvariants representation, i.e. the SOAP power spectrum.
- interaction_cutoff
Maximum pairwise distance for atoms to be considered in expansion
- Type
float
- cutoff_smooth_width
The distance over which the interaction is smoothed to zero
- Type
float
- max_radial
Number of radial basis functions
- Type
int
- max_angular
Highest angular momentum number (l) in the expansion
- Type
int
- gaussian_sigma_type
How the Gaussian atom sigmas (smearing widths) are allowed to vary. Only a fixed smearing width (‘Constant’) is implemented.
- Type
str
- gaussian_sigma_constant
Specifies the atomic Gaussian widths, in the case where they’re fixed.
- Type
float
- cutoff_function_type
Choose the type of smooth cutoff function used to define the local environment. Can be either ‘ShiftedCosine’ or ‘RadialScaling’.
If ‘ShiftedCosine’, the functional form of the switching function is:
\[sc(r) = \begin{cases} 1 & r < r_c - sw, \\ 0.5 + 0.5 \cos\left(\pi (r - r_c + sw) / sw\right) & r_c - sw < r \leq r_c, \\ 0 & r_c < r, \end{cases}\]
where \(r_c\) is the interaction_cutoff and \(sw\) is the cutoff_smooth_width.
If ‘RadialScaling’, the functional form of the switching function is as expressed in equation 21 of https://doi.org/10.1039/c8cp05921g:
\[rs(r) = sc(r) u(r),\]
where
\[u(r) = \begin{cases} \frac{1}{(r/r_0)^m} & \text{if } c = 0, \\ 1 & \text{if } m = 0, \\ \frac{c}{c + (r/r_0)^m} & \text{otherwise}, \end{cases}\]
where \(c\) is the rate, \(r_0\) is the scale and \(m\) is the exponent.
- Type
string
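The two switching functions can be written directly from the formulas above. The NumPy sketch below is only an illustration of those formulas, not the C++ code librascal runs; the function names are made up here, and the point r = r_c - sw is assigned to the cosine branch, where the function equals 1 anyway.

```python
import numpy as np

def shifted_cosine(r, r_c, sw):
    """sc(r): 1 inside r_c - sw, cosine switch up to r_c, 0 beyond."""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    out[r < r_c - sw] = 1.0
    smooth = (r >= r_c - sw) & (r <= r_c)
    out[smooth] = 0.5 + 0.5 * np.cos(np.pi * (r[smooth] - r_c + sw) / sw)
    return out

def radial_scaling(r, r_c, sw, rate, scale, exponent):
    """rs(r) = sc(r) * u(r), with u(r) chosen by the three cases above."""
    r = np.asarray(r, dtype=float)
    if exponent == 0:
        u = np.ones_like(r)
    elif rate == 0:
        u = 1.0 / (r / scale) ** exponent   # diverges at r = 0
    else:
        u = rate / (rate + (r / scale) ** exponent)
    return shifted_cosine(r, r_c, sw) * u

sc_values = shifted_cosine([0.0, 4.5, 4.75, 5.0, 6.0], r_c=5.0, sw=0.5)
rs_value = radial_scaling([1.0], r_c=5.0, sw=0.5, rate=1.0, scale=1.0, exponent=2)
```

With these inputs the switch is 1 up to r = 4.5, passes through 0.5 at the midpoint r = 4.75 and reaches 0 at r = 5.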
- soap_type
Specifies the type of representation to be computed (“RadialSpectrum”, “PowerSpectrum” and “BiSpectrum”).
- Type
string
- inversion_symmetry
Specifies whether inversion invariance should be enforced. (Only relevant for BiSpectrum.)
- Type
boolean
- radial_basis
Specifies the type of radial basis R_n to be computed (“GTO” for Gaussian-type orbitals and “DVR” for the discrete variable representation using the Gauss-Legendre quadrature rule)
- Type
string
- normalize
Whether to normalize so that the kernel between identical environments is 1. Default and highly recommended: True.
- Type
boolean
- optimization
Optional arguments for optimization of the computation of spherical expansion coefficients. “Spline” and “RadialDimReduction” are available.
Spline: Enables cubic splining for the radial basis functions.
- accuracy (float)
Accuracy of the cubic spline.
RadialDimReduction: Projection matrices to optimize the radial basis; requires Spline to be set.
- projection_matrices (dict)
Contains, for each species, a list of projection matrices, one for each angular channel. A projection matrix for an angular channel has the shape (max_radial, expanded_max_radial). A number of expanded_max_radial radial basis functions are computed and then projected onto max_radial radial basis functions. The projected radial basis is then splined for each species and angular channel.
The default setting uses Spline.
- Type
dict, default None
- expansion_by_species_method
Specifies how the species keys of the invariants are set up. Possible values: ‘environment wise’, ‘user defined’, ‘structure wise’. The descriptor is computed for each atomic environment and is indexed using tuples of the atomic species present within the environment. This index is sparse by definition, since the entry for a species tuple is nonzero only if those atomic species are present inside the environment. ‘environment wise’ means that each environmental representation contains only the minimal set of species tuples needed by that atomic environment. ‘structure wise’ means that, within a structure, the species tuples are the same for the coefficients of every environment. ‘user defined’ uses global_species to set up the species tuples.
These settings correspond to different trade-offs between the memory efficiency of the invariants and the computational efficiency of the kernel computation. When computing a kernel, the ‘environment wise’ setting does not allow for efficient matrix-matrix multiplications, whereas ‘user defined’ ensures them. ‘structure wise’ is a balance between the memory footprint and the use of matrix-matrix products.
Note that the sparsity of the gradient coefficients and their use to build kernels do not allow for clear efficiency gains, so their sparsity is kept irrespective of expansion_by_species_method.
- Type
string
- global_species
List of species (specified by their atomic numbers) used to set up the species keys of the invariants. It should contain all the species present in the structures for which invariants will be computed.
- Type
list of int
- compute_gradients
Controls whether the gradients of the representation with respect to atomic positions are computed.
- Type
bool
- cutoff_function_parameters
Additional parameters for the cutoff function. If cutoff_function_type == ‘RadialScaling’, it should have the form dict(rate=..., scale=..., exponent=...), where each ... should be replaced by the desired positive float.
- Type
dict
- coefficient_subselection
If None, all the coefficients are computed following max_radial, max_angular and the atomic species present. If soap_type == 'PowerSpectrum', it can instead have the form {'a': [...], 'b': [...], 'n1': [...], 'n2': [...], 'l': [...]}, where ‘a’ and ‘b’ are lists of atomic species, ‘n1’ and ‘n2’ are lists of radial expansion indices and ‘l’ is the list of spherical expansion indices. These lists all have the same size, and their i-th elements together refer to one PowerSpectrum coefficient that will be computed. utils.FPSFilter and utils.CURFilter with act_on set to ‘feature’ output such a dictionary.
- Type
list or None
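As a reading aid for the coefficient_subselection layout, the sketch below builds such a dictionary by hand and lists the coefficients it designates. The species and index values are arbitrary, and zero-based indices are an assumption; in practice the dictionary would come from a feature-selecting filter rather than being written manually.

```python
# Hand-written example of the dictionary layout described above.
# Values are arbitrary; zero-based n1, n2 and l indices are an assumption.
coefficient_subselection = {
    "a": [1, 1, 6],   # first species of each selected coefficient
    "b": [1, 6, 6],   # second species
    "n1": [0, 1, 2],  # first radial index
    "n2": [0, 0, 1],  # second radial index
    "l": [0, 1, 2],   # angular index
}

# All five lists must have the same length.
lengths = {len(values) for values in coefficient_subselection.values()}

# The i-th entries of the five lists together identify one coefficient.
keys = ("a", "b", "n1", "n2", "l")
selected = list(zip(*(coefficient_subselection[key] for key in keys)))
```

Here `selected[1]` is the tuple `(1, 6, 1, 0, 1)`, i.e. the coefficient for the species pair (1, 6) with radial indices 1 and 0 and angular index 1.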
- soap
Bartók, Kondor, and Csányi, “On representing chemical environments”, Phys. Rev. B. 87(18), p. 184115 http://link.aps.org/doi/10.1103/PhysRevB.87.184115
- get_num_coefficients(n_species=1)[source]
Return the number of coefficients in the spherical invariants
(this is the descriptor size per atomic centre)
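A hyperparameter dictionary for this class can be assembled from the constructor arguments listed above. The sketch below uses arbitrary values and estimates the power spectrum size, assuming one block of max_radial² × (max_angular + 1) coefficients per unordered species pair; `count_power_spectrum_coefficients` is a hypothetical helper, and the number reported by get_num_coefficients should be treated as authoritative if the two differ.

```python
# Keys follow the constructor arguments documented above; values are arbitrary.
hypers = dict(
    interaction_cutoff=5.0,
    cutoff_smooth_width=0.5,
    max_radial=8,
    max_angular=6,
    gaussian_sigma_type="Constant",
    gaussian_sigma_constant=0.3,
    cutoff_function_type="ShiftedCosine",
    soap_type="PowerSpectrum",
    radial_basis="GTO",
    normalize=True,
)

def count_power_spectrum_coefficients(max_radial, max_angular, n_species=1):
    """Assumed counting: unordered species pairs x radial pairs x angular channels."""
    n_pairs = n_species * (n_species + 1) // 2
    return n_pairs * max_radial ** 2 * (max_angular + 1)

n_features = count_power_spectrum_coefficients(
    hypers["max_radial"], hypers["max_angular"], n_species=2
)

# With librascal installed, the dictionary would be passed on as keyword arguments:
# from rascal.representations import SphericalInvariants
# soap = SphericalInvariants(**hypers)
```

The count grows quadratically with both the number of species and max_radial, which is the main reason the feature filters and coefficient_subselection exist.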
- class rascal.representations.SphericalCovariants(interaction_cutoff, cutoff_smooth_width, max_radial, max_angular, gaussian_sigma_type, gaussian_sigma_constant=0.3, cutoff_function_type='ShiftedCosine', normalize=True, radial_basis='GTO', optimization=None, optimization_args=None, soap_type='LambdaSpectrum', inversion_symmetry=True, covariant_lambda=0, cutoff_function_parameters={})[source]
Computes a SphericalCovariants representation, i.e. the lambda spectrum.
- interaction_cutoff
Maximum pairwise distance for atoms to be considered in expansion
- Type
float
- cutoff_smooth_width
The distance over which the interaction is smoothed to zero
- Type
float
- max_radial
Number of radial basis functions
- Type
int
- max_angular
Highest angular momentum number (l) in the expansion
- Type
int
- gaussian_sigma_type
How the Gaussian atom sigmas (smearing widths) are allowed to vary. Only a fixed smearing width (‘Constant’) is implemented.
- Type
str
- gaussian_sigma_constant
Specifies the atomic Gaussian widths, in the case where they’re fixed.
- Type
float
- cutoff_function_type
Choose the type of smooth cutoff function used to define the local environment. Can be either ‘ShiftedCosine’ or ‘RadialScaling’.
If ‘ShiftedCosine’, the functional form of the switching function is:
\[sc(r) = \begin{cases} 1 & r < r_c - sw, \\ 0.5 + 0.5 \cos\left(\pi (r - r_c + sw) / sw\right) & r_c - sw < r \leq r_c, \\ 0 & r_c < r, \end{cases}\]
where \(r_c\) is the interaction_cutoff and \(sw\) is the cutoff_smooth_width.
If ‘RadialScaling’, the functional form of the switching function is as expressed in equation 21 of https://doi.org/10.1039/c8cp05921g:
\[rs(r) = sc(r) u(r),\]
where
\[u(r) = \begin{cases} \frac{1}{(r/r_0)^m} & \text{if } c = 0, \\ 1 & \text{if } m = 0, \\ \frac{c}{c + (r/r_0)^m} & \text{otherwise}, \end{cases}\]
where \(c\) is the rate, \(r_0\) is the scale and \(m\) is the exponent.
- Type
string
- normalize
Whether to normalize so that the kernel between identical environments is 1. Default and highly recommended: True.
- Type
boolean
- radial_basis
Specifies the type of radial basis R_n to be computed (“GTO” for Gaussian-type orbitals and “DVR” for the discrete variable representation using the Gauss-Legendre quadrature rule)
- Type
string
- soap_type
Specifies the type of representation to be computed.
- Type
string
- inversion_symmetry
Specifies whether inversion invariance should be enforced.
- Type
boolean
- covariant_lambda
Order of the lambda spectrum.
- Type
int
- cutoff_function_parameters
Additional parameters for the cutoff function. If cutoff_function_type == ‘RadialScaling’, it should have the form dict(rate=..., scale=..., exponent=...), where each ... should be replaced by the desired positive float.
- Type
dict
- optimization
Optional arguments for optimization of the computation of spherical expansion coefficients. “Spline” and “RadialDimReduction” are available.
Spline: Enables cubic splining for the radial basis functions.
- accuracy (float)
Accuracy of the cubic spline.
RadialDimReduction: Projection matrices to optimize the radial basis; requires Spline to be set.
- projection_matrices (dict)
Contains, for each species, a list of projection matrices, one for each angular channel. A projection matrix for an angular channel has the shape (max_radial, expanded_max_radial). A number of expanded_max_radial radial basis functions are computed and then projected onto max_radial radial basis functions. The projected radial basis is then splined for each species and angular channel.
The default setting uses Spline.
- Type
dict, default None
- lambda-soap
Grisafi, A., Wilkins, D. M., Csányi, G., & Ceriotti, M. (2018). Symmetry-Adapted Machine Learning for Tensorial Properties of Atomistic Systems. Physical Review Letters, 120(3), 036002. https://doi.org/10.1103/PhysRevLett.120.036002
- get_num_coefficients(n_species=1)[source]
Return the number of coefficients in the representation
(this is the descriptor size per atomic centre)
Models
Also included is an optimized implementation of kernel ridge regression (KRR, which is equivalent to Gaussian approximation potentials, aka GAP). Both fitting and evaluation of models are implemented; the evaluation in particular is optimized for use in MD simulations.
- class rascal.models.Kernel(representation, name='Cosine', kernel_type='Full', target_type='Structure', **kwargs)[source]
Initialize the kernel with the given representation and parameters
- Parameters
representation (Calculator) – Representation calculator associated with the kernel
name (string) – Type of kernel, ‘Cosine’ (aka dot-product) is the default and (currently) only option.
target_type (string) – Type of target (prediction) properties, must be either ‘Atom’ (the kernel is between atomic environments) or ‘Structure’ (the kernel is summed over atoms in a structure), which is the default
kernel_type (string) – Type of kernel method, either ‘Full’ (computing the exact covariance matrix) or ‘Sparse’ (computing a GAP-like kernel [2] for sparse kernel methods such as Subset of Regressors)
Notes
In the following we refer to the training samples with ‘N’ and, in the case of sparse kernels [1], to the pseudo points with ‘M’. So a kernel between the training samples and the pseudo points is ‘KNM’. For more information on sparse kernels see rascal.models.krr.train_gap_model().
- 1
Joaquin Quiñonero-Candela, Carl Edward Rasmussen; A Unifying View of Sparse Approximate Gaussian Process Regression, Journal of Machine Learning Research, 6(Dec):1939–1959, 2005. http://www.jmlr.org/papers/v6/quinonero-candela05a.html
- 2
Ceriotti, M., Willatt, M. J., & Csányi, G. (2018). Machine Learning of Atomic-Scale Properties Based on Physical Principles. In Handbook of Materials Modeling (pp. 1–27). Springer, Cham. https://doi.org/10.1007/978-3-319-42913-7_68-1
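The difference between the ‘Atom’ and ‘Structure’ target types can be illustrated with a plain dot-product kernel on normalized per-atom features. This is a NumPy sketch under simplifying assumptions (dense features, no species handling); the function names and the `owner` arrays mapping each atom to its structure are hypothetical and not part of the librascal API.

```python
import numpy as np

def atom_kernel(X, Y):
    """Dot-product kernel between atomic environments (rows of X and Y)."""
    return X @ Y.T

def structure_kernel(X, X_owner, Y, Y_owner):
    """Atom kernel summed over the atoms belonging to each structure."""
    K = atom_kernel(X, Y)
    A = np.zeros((X_owner.max() + 1, len(X_owner)))
    A[X_owner, np.arange(len(X_owner))] = 1.0   # structure-by-atom indicator
    B = np.zeros((Y_owner.max() + 1, len(Y_owner)))
    B[Y_owner, np.arange(len(Y_owner))] = 1.0
    return A @ K @ B.T

# Three unit-norm environments: the first two in structure 0, the last in structure 1.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
owner = np.array([0, 0, 1])

K_atom = atom_kernel(X, X)
K_struct = structure_kernel(X, owner, X, owner)
```

`K_atom` has one row per atom, while `K_struct` has one row per structure, so the choice of target_type fixes the shape of the kernel matrix used for fitting.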
- class rascal.models.KRR(weights, kernel, X_train, self_contributions, description='KRR potential model', units=None)[source]
Kernel Ridge Regression model. Only supports sparse GPR training for the moment.
- Parameters
weights (np.array) – weights of the model
kernel (Kernel) – kernel class used to train the model
X_train (SparsePoints) – reference samples used for the training
self_contributions (dictionary) – map atomic number to the property baseline, e.g. isolated atoms energies when the model has been trained on total energies.
description (string) – User-defined string used to describe the model for future reference
units (dict) – Energy and length units used by the model (default: eV and Å (aka AA), same as used in ASE)
- predict(managers, KNM=None)[source]
Predict properties associated with the atomic structures in managers.
- Parameters
managers (AtomsList) – list of atomic structures with already computed features compatible with representation in kernel
KNM (np.array, optional) – precomputed sparse kernel matrix
- Returns
predictions
- Return type
np.array
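Conceptually, a sparse KRR prediction combines the kernel between the new structures and the pseudo points with the fitted weights, and adds back the per-species baseline. The sketch below assumes this form (baseline plus KNM times weights); it is an illustration with made-up numbers, `sketch_predict` is a hypothetical function, and the actual predict method may differ in detail.

```python
import numpy as np

def sketch_predict(KNM, weights, numbers_per_structure, self_contributions):
    """Assumed form of a sparse KRR prediction for structure-wise properties."""
    baseline = np.array([
        sum(self_contributions[z] for z in numbers)   # one baseline term per atom
        for numbers in numbers_per_structure
    ])
    return baseline + KNM @ weights

KNM = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2 structures x 2 pseudo points
weights = np.array([0.5, -1.0])
self_contributions = {1: -1.0, 8: -10.0}   # per-species baselines (made up)
structures = [[1, 1, 8], [1, 1]]           # atomic numbers of each structure

predictions = sketch_predict(KNM, weights, structures, self_contributions)
```

Passing a precomputed KNM, as the optional argument above allows, avoids recomputing the kernel when the same structures are evaluated repeatedly.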
- predict_forces(managers, KNM=None)[source]
Predict negative gradients w.r.t atomic positions, e.g. forces, associated with the atomic structures in managers.
- Parameters
managers (AtomsList) – list of atomic structures with already computed features compatible with representation in kernel
KNM (np.array, optional) – precomputed sparse kernel matrix
- Returns
predictions
- Return type
np.array
- predict_stress(managers, KNM=None)[source]
Predict gradients w.r.t cell parameters, e.g. stress, associated with the atomic structures in managers. The stress is returned using the Voigt order: xx, yy, zz, yz, xz, xy.
- Parameters
managers (AtomsList) – list of atomic structures with already computed features compatible with representation in kernel
KNM (np.array, optional) – precomputed sparse kernel matrix
- Returns
predictions
- Return type
np.array
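The Voigt order quoted above (xx, yy, zz, yz, xz, xy) maps a symmetric 3×3 tensor onto six components. A small NumPy sketch of the conversion in both directions, with hypothetical helper names:

```python
import numpy as np

# Index pairs in the Voigt order used above: xx, yy, zz, yz, xz, xy.
VOIGT_PAIRS = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]

def to_voigt(tensor):
    """Flatten a symmetric 3x3 tensor into its six Voigt components."""
    tensor = np.asarray(tensor, dtype=float)
    return np.array([tensor[i, j] for i, j in VOIGT_PAIRS])

def from_voigt(voigt):
    """Rebuild the symmetric 3x3 tensor from six Voigt components."""
    tensor = np.zeros((3, 3))
    for value, (i, j) in zip(voigt, VOIGT_PAIRS):
        tensor[i, j] = value
        tensor[j, i] = value
    return tensor

stress = np.array([[1.0, 6.0, 5.0],
                   [6.0, 2.0, 4.0],
                   [5.0, 4.0, 3.0]])
voigt = to_voigt(stress)
```

For this example tensor, `voigt` is `[1, 2, 3, 4, 5, 6]`, and `from_voigt(voigt)` recovers the original matrix.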
- rascal.models.compute_KNM(frames, X_sparse, kernel, soap)[source]
Compute GAP kernel of the (new) structures against the sparse points
- Parameters
frames – New structures to compute kernel for
soap – Representation calculator to use for the structures
X_sparse – Sparse points to compute kernels against
kernel – Kernel object to use
- Returns
K_NM – Summed total-energy kernel stacked with the atom-position gradient of the kernel
- Return type
np.array
Notes
This function can take quite a long time to run. To get a progress bar, you can wrap the frames parameter in a tqdm object like this:
    from tqdm.notebook import tqdm  # for Jupyter
    # from tqdm import tqdm  # on the command line
    K_NM = compute_KNM(
        tqdm(frames, desc="compute KNM", leave=False),
        X_sparse,
        kernel,
        soap,
    )
Filters
Filters are used mainly to select rows and columns from feature (or kernel) matrices in order to reduce their dimensionality and make the fitting problem tractable, or just more efficient. The following filters, implemented in scikit-cosmo, are available:
- class rascal.utils.FPSFilter(representation, Nselect, act_on='sample per species', selector_args={}, **kwargs)[source]
- get_fps_distances()[source]
Return the Hausdorff distances over the course of selection
This may be a useful (rough) indicator for choosing how many points to select, as a small distance generally indicates that the selected point is close to the existing set of selected points and therefore probably does not add much additional information.
Returns either an array of Hausdorff distances, or a species-indexed dict of arrays (for the “sample per species” mode).
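To make the role of these distances concrete, here is a minimal farthest point sampling loop in NumPy that records, at each step, the distance from the newly selected point to the already selected set. It is a didactic sketch, not the scikit-cosmo implementation; the function name and the choice of the first point are assumptions.

```python
import numpy as np

def farthest_point_sampling(X, n_select, start=0):
    """Greedy FPS on the rows of X; returns selected indices and selection distances."""
    X = np.asarray(X, dtype=float)
    selected = [start]
    # Distance of every point to the current set of selected points.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    distances = []
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))     # farthest point from the selected set
        distances.append(min_dist[nxt])    # distance at the moment of selection
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(selected), np.array(distances)

points = np.array([[0.0], [1.0], [10.0], [11.0]])
selected, distances = farthest_point_sampling(points, n_select=3)
```

In this toy set the distance drops from 11 to 1 after the second selection, which is the kind of plateau that suggests further points add little new information.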
- class rascal.utils.CURFilter(representation, Nselect, act_on='sample per species', selector_args={}, **kwargs)[source]
Both inherit the interface of the following base class:
- class rascal.utils.filter.Filter(representation, Nselect, selector, act_on='sample per species')[source]
A super class for filtering representations based upon a standard sample or feature selection class.
This is mainly a wrapper around selectors (implemented e.g. in scikit-cosmo) that handles the semantic-index transformations required after selection.
- Parameters
representation (Calculator) – Representation calculator associated with the kernel
Nselect (int) – number of points to select. If act_on=’sample per species’ then it should be a dictionary mapping atom type to the number of samples, e.g. Nselect = {1:200,6:100,8:50}.
selector – Selector to use for filtering. The selector should have a fit function, which when called selects the desired features / samples from the input matrix, and a get_support function, which takes the parameters indices and ordered and returns a list of selection indices, in the order in which they were selected, when indices=True and ordered=True.
act_on (string) – Select how to apply the selection. Can be one of ‘sample’, ‘sample per species’, ‘feature’. Default ‘sample per species’. Note that for ‘feature’ mode only the SphericalInvariants representation is supported.
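The selector interface described above (a fit function plus get_support with indices and ordered) can be illustrated with a toy class. `LargestNormSelector` is hypothetical and exists only to show the expected shape of the interface; the behaviour for indices=False (a boolean mask) and ordered=False (sorted indices) is an assumption modelled on common scikit-learn conventions.

```python
import numpy as np

class LargestNormSelector:
    """Hypothetical selector: keeps the rows of X with the largest norm."""

    def __init__(self, n_to_select):
        self.n_to_select = n_to_select

    def fit(self, X):
        norms = np.linalg.norm(np.asarray(X, dtype=float), axis=1)
        self.n_samples_ = len(norms)
        # Indices of the selected rows, in the order they were selected.
        self.selected_idx_ = np.argsort(-norms, kind="stable")[: self.n_to_select]
        return self

    def get_support(self, indices=False, ordered=False):
        if indices:
            idx = self.selected_idx_ if ordered else np.sort(self.selected_idx_)
            return idx.tolist()
        mask = np.zeros(self.n_samples_, dtype=bool)
        mask[self.selected_idx_] = True
        return mask

selector = LargestNormSelector(n_to_select=2).fit([[2.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
in_selection_order = selector.get_support(indices=True, ordered=True)
```

Here `in_selection_order` is `[2, 0]`: row 2 has the largest norm and is selected first, followed by row 0.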
IO
Utilities for loading and saving Rascal objects (especially models)
- rascal.utils.dump_obj(fn, instance, version='0.1')[source]
Save a python object that inherits from the BaseIO class
- Parameters
fn (string) – path to save instance
instance (class) – python object that inherits from the BaseIO class
version (string, optional) – serialization version to use, by default CURRENT_VERSION
- Raises
RuntimeError – When instance does not inherit from BaseIO
- rascal.utils.load_obj(fn)[source]
Load a python object from a file
- Parameters
fn (string) – path to the file describing the saved object
- Returns
- Return type
python class that inherits from BaseIO
- class rascal.utils.BaseIO[source]
Interface of a Python class serializable by to_dict()
It corresponds to 3 methods:
- _get_init_params is expected to return a dictionary containing all the parameters used by the __init__() method.
- _get_data is expected to return a dictionary containing all the data that is not set by the initialization of the class.
- _set_data is expected to set the data that has been extracted by _get_data.
The underlying C++ objects are not picklable, so deepcopy does not work out of the box. This class provides an override of the __deepcopy__() function so that classes that inherit from this base class can be deepcopied.
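The three-method pattern can be sketched without librascal. In the example below, `SketchIO` is a hypothetical stand-in for the base class (it is not rascal.utils.BaseIO) that rebuilds a copy from the init parameters and the stored data, and `Scaler` is a toy subclass.

```python
import copy

class SketchIO:
    """Hypothetical stand-in illustrating the three-method serialization pattern."""

    def __deepcopy__(self, memo):
        # Rebuild the object from its init parameters, then restore its data.
        clone = type(self)(**copy.deepcopy(self._get_init_params(), memo))
        clone._set_data(copy.deepcopy(self._get_data(), memo))
        return clone

class Scaler(SketchIO):
    def __init__(self, factor=1.0):
        self.factor = factor
        self.offset = None          # set later by fit(), not by __init__()

    def fit(self, values):
        self.offset = sum(values) / len(values)

    def _get_init_params(self):
        return dict(factor=self.factor)

    def _get_data(self):
        return dict(offset=self.offset)

    def _set_data(self, data):
        self.offset = data["offset"]

original = Scaler(factor=2.0)
original.fit([1.0, 3.0])
duplicate = copy.deepcopy(original)
```

Because the copy goes through plain dictionaries, the same two dictionaries are also what a serializer would need to write an object to disk and restore it.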