Custom regressors into purifiers
As mentioned in the first tutorial, purifiers can accept arbitrary linear regressors from sklearn.linear_model. To feed a purifier with a custom linear regressor, two requirements must be fulfilled. First, the regressor must expose the same interface as sklearn's linear regressors, namely the fit and predict methods. Second, it must satisfy sklearn's conventions so that it can be cloned with the sklearn.base.clone function. This tutorial shows an example of such a class.
As before, let’s calculate spherical expansion coefficients for H environments:
Preliminaries
# downloading dataset from https://archive.materialscloud.org/record/2020.110
!wget "https://archive.materialscloud.org/record/file?file_id=b612d8e3-58af-4374-96ba-b3551ac5d2f4&filename=methane.extxyz.gz&record_id=528" -O methane.extxyz.gz
!gunzip -k methane.extxyz.gz
import numpy as np
import ase.io
import tqdm
from nice.blocks import *
from nice.utilities import *
from matplotlib import pyplot as plt
from sklearn.linear_model import BayesianRidge
structures = ase.io.read('methane.extxyz', index='0:1000')

HYPERS = {
    'interaction_cutoff': 6.3,
    'max_radial': 5,
    'max_angular': 5,
    'gaussian_sigma_type': 'Constant',
    'gaussian_sigma_constant': 0.05,
    'cutoff_smooth_width': 0.3,
    'radial_basis': 'GTO'
}

all_species = get_all_species(structures)
coefficients = get_spherical_expansion(structures, HYPERS, all_species)
# keep only the coefficients for hydrogen environments (atomic number 1)
coefficients = coefficients[1]
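Before moving on, it can be useful to inspect the shape of the resulting array. Assuming the usual NICE convention, the axes should be [environment, radial index n, angular index l, projection m]:

# hedged sanity check: with max_radial=5 and max_angular=5 the array should
# have shape [n_environments, 5, 6, 11] (4 H environments per CH4 structure)
print(coefficients.shape)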
Our custom class looks like this:
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import Ridge
class AdaptiveRidge:
    def __init__(self):
        pass

    def fit(self, X, y):
        # select the regularization strength by cross validation
        # on the training data
        minimum = None
        self.best_alpha_ = None
        for alpha in np.logspace(-25, 10, 300):
            regressor = Ridge(alpha=alpha, fit_intercept=False)
            predictions = cross_val_predict(regressor, X, y)
            now = np.mean((predictions - y)**2)
            if (minimum is None) or (now < minimum):
                minimum = now
                self.best_alpha_ = alpha
        # refit on the full training data with the best alpha
        self.ridge_ = Ridge(alpha=self.best_alpha_, fit_intercept=False)
        self.ridge_.fit(X, y)
        return self

    def predict(self, X):
        return self.ridge_.predict(X)

    def get_params(self, deep=True):
        # no constructor parameters, so clone receives an empty dict
        return {}

    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self
During fitting it estimates the best regularization strength by cross validation on the training data. The additional methods get_params and set_params are required by the sklearn.base.clone function; more details can be found in the sklearn documentation on developing estimators (only the section about cloning is relevant).
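As a quick sanity check, clone compatibility can be verified directly:

from sklearn.base import clone

# clone reads get_params() and builds a fresh, unfitted AdaptiveRidge;
# if this raises an exception, the class is not clone-compatible
fresh = clone(AdaptiveRidge())
print(type(fresh).__name__)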
Let’s use it:
import warnings

from scipy.linalg import LinAlgWarning

nice = StandardSequence([
    StandardBlock(ThresholdExpansioner(50), None, IndividualLambdaPCAsBoth(20),
                  ThresholdExpansioner(50, mode='invariants'), None, None),
    StandardBlock(
        ThresholdExpansioner(50),
        CovariantsPurifierBoth(regressor=AdaptiveRidge(), max_take=20),
        IndividualLambdaPCAsBoth(10),
        ThresholdExpansioner(50, mode='invariants'),
        InvariantsPurifier(regressor=AdaptiveRidge(), max_take=20),
        InvariantsPCA(20)),
])

with warnings.catch_warnings():
    # very small alpha values produce a lot of ill-conditioned matrices
    warnings.filterwarnings("ignore", category=LinAlgWarning)
    nice.fit(coefficients)

res = nice.transform(coefficients)
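If only scalar features are needed, the fitted sequence can also return invariants directly, assuming the same return_only_invariants keyword as in the first tutorial:

# dictionary mapping body order -> array of shape [n_environments, n_features]
invariants = nice.transform(coefficients, return_only_invariants=True)
for body_order, block in invariants.items():
    print("body order:", body_order, "shape:", block.shape)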
It is possible to access the best alpha parameters for all parities and lambda channels in the final model:
(convenient getters might be added in the next version of NICE)
for lambd in range(6):
    if (nice.blocks_[1].covariants_purifier_.even_purifier_.purifiers_[lambd]):
        print("parity: even; lambda: {}; best alpha: {}".format(
            lambd, nice.blocks_[1].covariants_purifier_.even_purifier_.
            purifiers_[lambd].regressor_.best_alpha_))
    if (nice.blocks_[1].covariants_purifier_.odd_purifier_.purifiers_[lambd]):
        print("parity: odd; lambda: {}; best alpha: {}".format(
            lambd, nice.blocks_[1].covariants_purifier_.odd_purifier_.
            purifiers_[lambd].regressor_.best_alpha_))
The same for InvariantsPurifier:
print("best alpha of invariants purifier: ",
nice.blocks_[1].invariants_purifier_.regressor_.best_alpha_)