Utilities

utilities.get_all_species(structures)

getting all unique atomic species among the structures

Parameters

structures – list of ASE Atoms objects

Returns

sorted numpy array of ints with all unique species, given as atomic numbers (1 stands for H, 2 for He, and so on), in the same format as returned by the ASE method atoms_object.get_atomic_numbers()
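The behaviour described above can be sketched as follows. This is only an illustration of the expected result, not the library's implementation: StandInAtoms and unique_species_sketch are hypothetical names, and StandInAtoms mimics only the get_atomic_numbers() method of an ASE Atoms object so that the example runs without ASE installed.

```python
import numpy as np


class StandInAtoms:
    """Hypothetical stand-in exposing only get_atomic_numbers(), like ase.Atoms."""

    def __init__(self, numbers):
        self._numbers = np.asarray(numbers, dtype=int)

    def get_atomic_numbers(self):
        return self._numbers


def unique_species_sketch(structures):
    # Gather the atomic numbers of every structure, then keep the sorted unique values.
    numbers = np.concatenate([s.get_atomic_numbers() for s in structures])
    return np.unique(numbers)


water = StandInAtoms([8, 1, 1])
methane = StandInAtoms([6, 1, 1, 1, 1])
species = unique_species_sketch([water, methane])
```

Here species holds the atomic numbers of H, C and O in ascending order.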

utilities.get_compositional_features(structures, all_species)

getting compositional features suitable for linear regression, which contain the number of atoms of each species in every structure

Parameters
  • structures – list of ASE Atoms objects

  • all_species – numpy array with ints of all unique species in the dataset. If the all_species argument is the same for several calls of this function, the resulting blocks of compositional features are guaranteed to be consistent with each other

Returns

numpy array with shape [len(structures), len(all_species)] with compositional features
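A minimal sketch of what such compositional features look like, assuming each column simply counts the atoms of one entry of all_species. The function compositional_features_sketch is hypothetical and takes plain lists of atomic numbers; with real ASE objects these would come from get_atomic_numbers().

```python
import numpy as np


def compositional_features_sketch(atomic_numbers_per_structure, all_species):
    # One row per structure, one column per entry of all_species.
    features = np.zeros((len(atomic_numbers_per_structure), len(all_species)))
    for i, numbers in enumerate(atomic_numbers_per_structure):
        numbers = np.asarray(numbers)
        for j, specie in enumerate(all_species):
            # Count the atoms of this specie in structure i.
            features[i, j] = np.sum(numbers == specie)
    return features


all_species = np.array([1, 6, 8])
first = compositional_features_sketch([[8, 1, 1], [6, 1, 1, 1, 1]], all_species)
second = compositional_features_sketch([[1, 1]], all_species)
```

Because both calls share the same all_species, the columns of first and second refer to the same species in the same order, even though the second block contains only hydrogen.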

utilities.get_spherical_expansion(structures, rascal_hypers, all_species, task_size=100, num_threads=None, split_by_central_specie=True, show_progress=True)

getting spherical expansion coefficients

Parameters
  • structures – list of ASE Atoms objects

  • rascal_hypers – dictionary with parameters for librascal controlling spherical expansion

  • all_species – numpy array with ints of all unique species in the dataset. If the all_species argument is the same for several calls of this function, the resulting blocks of spherical expansion coefficients are guaranteed to be consistent with each other

  • task_size – number of structures in chunk for multiprocessing

  • num_threads – number of threads used for multiprocessing. If None, then all available threads (len(os.sched_getaffinity(0))) are used

  • split_by_central_specie – whether to group spherical expansion coefficients by central specie

  • show_progress – whether to show progress via tqdm

Returns

dictionary in which keys are elements of all_species and entries are numpy arrays with indexing [environmental index, radial basis/neighbor specie index, lambda, m] containing spherical expansion coefficients for environments around atoms of the specie indicated by the key. Coefficients are stored from the beginning of the m axis, i.e. only the [:, :, lambda, :(2 * lambda + 1)] elements are valid
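The layout of the returned arrays can be illustrated with a small stand-alone array. This sketch assumes the four-axis indexing given above and that, for each lambda, only the first 2 * lambda + 1 entries along the m axis carry data; the sizes n_envs, n_radial_times_species and l_max are made-up values for illustration.

```python
import numpy as np

n_envs, n_radial_times_species, l_max = 2, 3, 2

# Hypothetical array laid out as [environment, radial/neighbor specie, lambda, m].
coefficients = np.zeros((n_envs, n_radial_times_species, l_max + 1, 2 * l_max + 1))

valid_counts = []
for lam in range(l_max + 1):
    # For a given lambda only the first 2 * lambda + 1 entries of the m axis are meaningful.
    valid = coefficients[:, :, lam, : (2 * lam + 1)]
    valid_counts.append(valid.shape[-1])
```

The remaining entries along the m axis are padding that keeps the array rectangular.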

utilities.make_structural_features(features, structures, all_species, show_progress=True)

getting structural features suitable for linear regression which consist of sums over atomic features

Parameters
  • features – nested dictionary with atomic features. First level keys are central species, second level keys are body orders. Entries are 2-dimensional numpy arrays.

  • structures – list of ASE Atoms objects

  • all_species – numpy array with ints of all unique species in the dataset. If the all_species argument is the same for several calls of this function, the resulting blocks of structural features are guaranteed to be consistent with each other. If a given block of structures contains no atoms of some particular specie, the features dictionary still has to contain a key for this specie, holding numpy arrays with shape [0, number of features]. This is needed to place the features correctly and keep the blocks consistent.

  • show_progress – whether to show progress via tqdm

Returns

numpy array with shape [len(structures), number of structural features] with structural features
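The summation over atomic features can be sketched as below. This is an illustration under stated assumptions rather than the library's code: structural_features_sketch is a hypothetical name, structures are given as plain lists of atomic numbers, and the rows of each atomic feature array are assumed to be ordered structure by structure.

```python
import numpy as np


def structural_features_sketch(features, atomic_numbers_per_structure, all_species):
    blocks = []
    for specie in all_species:
        for body_order in sorted(features[specie].keys()):
            atomic = features[specie][body_order]  # [n_envs_of_specie, n_features]
            block = np.zeros((len(atomic_numbers_per_structure), atomic.shape[1]))
            position = 0  # running index into the rows of `atomic`
            for i, numbers in enumerate(atomic_numbers_per_structure):
                count = int(np.sum(np.asarray(numbers) == specie))
                # Sum the atomic features of all atoms of this specie in structure i.
                block[i] = atomic[position:position + count].sum(axis=0)
                position += count
            blocks.append(block)
    # Columns are ordered by specie, then by body order.
    return np.concatenate(blocks, axis=1)


all_species = [1, 8]
features = {1: {1: np.ones((4, 2))}, 8: {1: 2 * np.ones((1, 3))}}
result = structural_features_sketch(features, [[8, 1, 1], [1, 1]], all_species)

# A block without oxygen still needs the key 8, with an array of shape [0, 3].
features_no_o = {1: {1: np.ones((2, 2))}, 8: {1: np.zeros((0, 3))}}
only_h = structural_features_sketch(features_no_o, [[1, 1]], all_species)
```

The empty [0, 3] array is what lets the second call produce the same five columns as the first, with zeros in the oxygen part.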

utilities.transform_sequentially(nice, structures, rascal_hypers, all_species, block_size=500, show_progress=True)

transforming structures into structural features in chunks to reduce RAM usage

Parameters
  • nice – dictionary where keys are species and entries are nice transformers. If you want to use a single nice transformer for all environments regardless of central specie, just pass {specie : nice_single for specie in all_species}

  • structures – list of ASE Atoms objects

  • rascal_hypers – dictionary with parameters for librascal controlling spherical expansion. Should be the same as those used for fitting the nice transformers

  • all_species – numpy array with ints of all unique species in the dataset.

  • block_size – size of chunks measured in number of environments

  • show_progress – whether to show progress via tqdm

Returns

numpy array with shape [len(structures), number of structural features] with structural features
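Two of the arguments above can be illustrated without the library itself. The sketch below first builds the nice dictionary for the single-transformer case, using a placeholder object in place of a fitted transformer, and then shows one plausible way of grouping structures into blocks of at most block_size environments. The helper split_into_blocks_sketch is hypothetical, and the actual chunking strategy of transform_sequentially may differ.

```python
all_species = [1, 6, 8]
nice_single = object()  # placeholder for a fitted nice transformer
# The same transformer is shared by every central specie.
nice = {specie: nice_single for specie in all_species}


def split_into_blocks_sketch(atom_counts, block_size):
    blocks, current, current_size = [], [], 0
    for index, count in enumerate(atom_counts):
        # Start a new block if adding this structure would exceed block_size.
        if current and current_size + count > block_size:
            blocks.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += count
    if current:
        blocks.append(current)
    return blocks


# Each entry is the number of atoms (environments) in one structure.
blocks = split_into_blocks_sketch([3, 5, 2, 4, 1], block_size=6)
```

Each inner list in blocks holds the indices of the structures processed together, so peak memory scales with block_size rather than with the size of the whole dataset.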