Calculating covariants

In the previous tutorial, we calculated invariant representations of atomic environments and used them to predict energies, which are invariant properties.

When the goal is to predict covariant properties, covariants rather than invariants are required. This tutorial shows how to calculate them.

First of all, we need a fitted instance of the model, as in the previous tutorial. This is done by the following preliminaries cell; the only difference is that, since we want to calculate covariants, we should not leave the covariants branch of the last block empty.

Preliminaries

# cell to wrap in collapsible in future

# downloading dataset from https://archive.materialscloud.org/record/2020.110

!wget "https://archive.materialscloud.org/record/file?file_id=b612d8e3-58af-4374-96ba-b3551ac5d2f4&filename=methane.extxyz.gz&record_id=528" -O methane.extxyz.gz
!gunzip -k methane.extxyz.gz

import numpy as np
import ase.io
import tqdm
from nice.blocks import *
from nice.utilities import *
from matplotlib import pyplot as plt
from sklearn.linear_model import BayesianRidge

HARTREE_TO_EV = 27.211386245988
train_subset = "0:10000"  #input for ase.io.read command
test_subset = "10000:15000"  #input to ase.io.read command
environments_for_fitting = 1000  #number of environments to fit nice transformers
grid = [150, 200, 350, 500, 750, 1000, 1500, 2000, 3000, 5000, 7500,
        10000]  #for learning curve

#HYPERS for librascal spherical expansion coefficients
HYPERS = {
    'interaction_cutoff': 6.3,
    'max_radial': 5,
    'max_angular': 5,
    'gaussian_sigma_type': 'Constant',
    'gaussian_sigma_constant': 0.05,
    'cutoff_smooth_width': 0.3,
    'radial_basis': 'GTO'
}


#our model: note that the covariants branch of the last block is not left
#empty, so that covariants after the last block are computed
def get_nice():
    return StandardSequence([
        StandardBlock(ThresholdExpansioner(num_expand=150),
                      CovariantsPurifierBoth(max_take=10),
                      IndividualLambdaPCAsBoth(n_components=50),
                      ThresholdExpansioner(num_expand=300, mode='invariants'),
                      InvariantsPurifier(max_take=50),
                      InvariantsPCA(n_components=200)),
        StandardBlock(ThresholdExpansioner(num_expand=150),
                      CovariantsPurifierBoth(max_take=10),
                      IndividualLambdaPCAsBoth(n_components=10),
                      ThresholdExpansioner(num_expand=300, mode='invariants'),
                      InvariantsPurifier(max_take=50),
                      InvariantsPCA(n_components=200)),
        StandardBlock(ThresholdExpansioner(num_expand=150),
                      CovariantsPurifierBoth(max_take=10), None,
                      ThresholdExpansioner(num_expand=300, mode='invariants'),
                      InvariantsPurifier(max_take=50),
                      InvariantsPCA(n_components=200))
    ],
                            initial_scaler=InitialScaler(
                                mode='signal integral', individually=True))


train_structures = ase.io.read('methane.extxyz', index=train_subset)

test_structures = ase.io.read('methane.extxyz', index=test_subset)

all_species = get_all_species(train_structures + test_structures)

train_coefficients = get_spherical_expansion(train_structures, HYPERS,
                                             all_species)

test_coefficients = get_spherical_expansion(test_structures, HYPERS,
                                            all_species)

#individual nice transformers for each atomic species in the dataset
nice = {}
for key in train_coefficients.keys():
    nice[key] = get_nice()

for key in train_coefficients.keys():
    nice[key].fit(train_coefficients[key][:environments_for_fitting])


100%|██████████| 100/100 [00:01<00:00, 98.75it/s]
100%|██████████| 2/2 [00:00<00:00, 25.77it/s]
100%|██████████| 50/50 [00:00<00:00, 80.97it/s]
100%|██████████| 2/2 [00:00<00:00, 58.83it/s]
/home/pozdn/.local/lib/python3.8/site-packages/nice/blocks/compressors.py:216: UserWarning: Amount of provided data is less than the desired one to fit PCA. Number of components is 200, desired number of environments is 2000, actual number of environments is 1000.
  warnings.warn(("Amount of provided data is less "

Now we need to call the .transform method with return_only_invariants = False, which is the default value:

data_even, data_odd, invariants_even = nice[1].transform(train_coefficients[1])

The result consists of data_even, data_odd, and invariants_even. The first two objects contain covariants; the last one contains invariants.
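For comparison, calling .transform with return_only_invariants=True, as in the previous tutorial, is assumed here to return only the dictionary of invariants; a minimal sketch:

# invariants only (sketch; assumes return_only_invariants=True returns just the dictionary)
invariants_even_only = nice[1].transform(train_coefficients[1],
                                         return_only_invariants=True)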

In addition to the translational and rotational symmetries, there is another important one: inversion. Atomic properties usually transform in a definite way with respect to inversion; energy, in particular, is invariant under it.

In NICE, features are separated into two groups: those that are invariant with respect to inversion and those that change their sign under it. The former are called even; the latter are called odd.

Now let's take a closer look at the returned objects.

The invariants are the same kind of object as in the previous tutorial: a dictionary whose keys are body orders.

for key in invariants_even.keys():
    print(invariants_even[key].shape)

(40000, 10)
(40000, 200)
(40000, 200)
(40000, 200)

The returned covariants are those after the last block, i.e., in our case, of body order 4. (Functionality to get covariants of all body orders from StandardSequence will be added in the next version of NICE.)

Even covariants are packed into the class Data, which has two relevant fields: .covariants_ and .actual_sizes_ (getters are also to be added in the next version). The first is a np.array with the covariants themselves, indexed as [environmental_index, feature_index, lambda, m]. However, the actual number of features differs for each lambda channel, so the shape of this array does not reflect the real number of meaningful entries. Information about the actual number of features is stored in .actual_sizes_:

print(type(data_even))
print("shape of even covariants array: {}".format(data_even.covariants_.shape))
print("actual sizes of even covariants: {}".format(data_even.actual_sizes_))

shape of even covariants array: (40000, 87, 6, 11)
actual sizes of even covariants: [22 55 73 83 87 76]

It is the same for odd covariants:

print("shape of odd covariants array: {}".format(data_odd.covariants_.shape))
print("actual sizes of odd covariants: {}".format(data_odd.actual_sizes_))

shape of odd covariants array: (40000, 88, 6, 11)
actual sizes of odd covariants: [20 54 72 87 88 75]
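The last two dimensions of both arrays follow from the spherical expansion hyperparameters: there are max_angular + 1 = 6 lambda channels, and the m axis has length 2 * max_angular + 1 = 11. A quick check for the hypers used here:

# the lambda and m axes are fixed by max_angular from HYPERS
assert data_even.covariants_.shape[2] == HYPERS['max_angular'] + 1  # 6 lambda channels
assert data_even.covariants_.shape[3] == 2 * HYPERS['max_angular'] + 1  # 11 values of m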

One more point: for each lambda channel, covariant vectors have size 2 * lambda + 1, and they are stored at the beginning of the last axis. Thus, the meaningful entries for each lambda are located at [:, :, lambda, :(2 * lambda + 1)].
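As an illustration (a minimal sketch; the variables lambd, n_actual, and block below are introduced only for this example, and the choice lambda = 2 is arbitrary), the meaningful part of the even covariants for a single lambda channel can be extracted like this:

# meaningful block of the even covariants for one lambda channel
lambd = 2
n_actual = data_even.actual_sizes_[lambd]
block = data_even.covariants_[:, :n_actual, lambd, :(2 * lambd + 1)]
print(block.shape)  # -> (40000, 73, 5) for this run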

In the NICE article, another definition of parity is used: covariants are split into true and pseudo groups. Under inversion, covariants in the true group transform as (-1)^lambda, while covariants in the pseudo group transform as (-1)^(lambda + 1).

There is a special class, ParityDefinitionChanger, to switch between these definitions:

data_true, data_pseudo = ParityDefinitionChanger().transform(
    data_even, data_odd)

print(data_true.covariants_.shape)
print(data_true.actual_sizes_)

print(data_pseudo.covariants_.shape)
print(data_pseudo.actual_sizes_)

(40000, 87, 6, 11)
[22 54 73 87 87 75]
(40000, 88, 6, 11)
[20 55 72 83 88 76]
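These numbers are consistent with the two definitions: even covariants transform as +1 under inversion, which equals (-1)^lambda for even lambda, so for each lambda the true group collects the even covariants when lambda is even and the odd ones when lambda is odd (and vice versa for the pseudo group). A minimal sketch of this check, using the data_even and data_odd objects obtained above:

# sizes of the true/pseudo groups interleave the even/odd sizes per lambda
for lambd in range(6):
    if lambd % 2 == 0:
        assert data_true.actual_sizes_[lambd] == data_even.actual_sizes_[lambd]
        assert data_pseudo.actual_sizes_[lambd] == data_odd.actual_sizes_[lambd]
    else:
        assert data_true.actual_sizes_[lambd] == data_odd.actual_sizes_[lambd]
        assert data_pseudo.actual_sizes_[lambd] == data_even.actual_sizes_[lambd]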

Since this transformation is its own inverse, we can apply it once again to go back from true and pseudo covariants to even and odd ones:

data_even, data_odd = ParityDefinitionChanger().transform(
    data_true, data_pseudo)

There is one more discrepancy: the covariants defined in the NICE article are smaller by a factor of (2 * lambda + 1). Thus, the last step to achieve full agreement is the following:

# divide the meaningful block of each lambda channel by (2 * lambda + 1)
# to match the normalization used in the NICE article
for lambd in range(6):
    data_true.covariants_[:, :data_true.actual_sizes_[lambd],
                          lambd, :(2 * lambd + 1)] /= (2 * lambd + 1)
    data_pseudo.covariants_[:, :data_pseudo.actual_sizes_[lambd],
                            lambd, :(2 * lambd + 1)] /= (2 * lambd + 1)