# Examples

## Introduction

This section demonstrates SIMPLE-NN with examples. The example files are in SIMPLE-NN/examples/. In these examples, snapshots from a 500 K MD trajectory of amorphous SiO2 (60 atoms) are used as the training set.

Note

Because the paths to the reference files in str_list are relative, you need to move to the directory indicated in each section below before running the examples.

## Generate NNP

To generate an NNP using symmetry functions and a neural network, you need three types of input files (input.yaml, str_list, and params_XX), as described in the Tutorials section. All of the example files except params_Si and params_O are introduced below; details of params_Si and params_O can be found in the Symmetry function section. The input files introduced in this section can be found in SIMPLE-NN/examples/SiO2/generate_NNP.

# input.yaml
generate_features: true
preprocess: true
train_model: true
atom_types:
  - Si
  - O

symmetry_function:
  params:
    Si: params_Si
    O: params_O

neural_network:
  method: Adam
  nodes: 30-30
  batch_size: 10
  total_iteration: 50000
  learning_rate: 0.001

# str_list
../ab_initio_output/OUTCAR_comp ::10


With this input file, SIMPLE-NN calculates the feature vectors and their derivatives (generate_features), generates the training/validation dataset (preprocess), and optimizes the network (train_model). A sample VASP OUTCAR file (compressed to reduce the file size) is in SIMPLE-NN/examples/SiO2/ab_initio_output. Snapshots are sampled from the MD trajectory at intervals of 10 MD steps. In this example, each atom is described by 70 symmetry functions, made up of 8 radial symmetry functions per 2-body combination and 18 angular symmetry functions per 3-body combination. Thus, this model uses a 70-30-30-1 network for both Si and O. The network is optimized with the Adam optimizer, using a learning rate of 0.001 and a batch size of 10.
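The sampling interval and the input-layer size quoted above can be checked with a few lines of Python. This is only an illustrative sketch: it assumes that the ::10 index in str_list behaves like a Python slice, and the 100-step trajectory and the variable names are made up for the example.

```python
from itertools import combinations_with_replacement

# Hypothetical trajectory of 100 MD steps; "::10" is assumed to act like a Python slice.
md_steps = list(range(100))
sampled = md_steps[::10]  # every 10th snapshot

elements = ["Si", "O"]
n_radial = 8    # radial symmetry functions per 2-body combination
n_angular = 18  # angular symmetry functions per 3-body combination

# For a given central atom there is one 2-body combination per neighbor element
# and one 3-body combination per unordered pair of neighbor elements.
two_body = len(elements)
three_body = len(list(combinations_with_replacement(elements, 2)))

n_inputs = two_body * n_radial + three_body * n_angular
print(len(sampled), two_body, three_body, n_inputs)
```

With two elements there are 2 two-body and 3 three-body combinations per central atom, so the input layer has 2 × 8 + 3 × 18 = 70 nodes, matching the 70-30-30-1 architecture.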

The output files can be found in SIMPLE-NN/examples/SiO2/generate_NNP/outputs. In that folder, the generated dataset is stored in the data folder, and the execution log and energy/force RMSE are stored in LOG.

## Potential test

### Generate test dataset

Generating a test dataset works the same way as generating a training/validation dataset. In this example, we use the same VASP OUTCAR to generate the test dataset. The input files introduced in this section can be found in SIMPLE-NN/examples/SiO2/generate_test_data.

# input.yaml
generate_features: true
preprocess: true
train_model: false
atom_types:
  - Si
  - O

symmetry_function:
  params:
    Si: params_Si
    O: params_O
  valid_rate: 0.


In this case, train_model is set to false because training is not required to generate a test dataset. In addition, valid_rate is set to 0 so that no structures are set aside for validation. str_list is the same as in the Generate NNP section.

Note

To avoid overwriting the existing training/validation dataset, create the test dataset in a new folder.

### Error check

To check the error on the test dataset, use the settings below. To run in test mode, copy the train_list file generated in the Generate test dataset section to SIMPLE-NN/examples/SiO2/error_check and rename it to test_list. Then edit the paths to the data directory in test_list accordingly. In this example, ./data/training_data_0000_to_0006.tfrecord should be changed to ../generate_test_data/data/training_data_0000_to_0006.tfrecord. Also copy scale_factor and params_* to the current directory. These files contain information about the dataset, so they must always accompany it. The input files introduced in this section can be found in SIMPLE-NN/examples/SiO2/error_check.
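The path edit described above can also be scripted. The sketch below assumes that train_list contains one tfrecord path per line and that each path starts with ./data/. The helper name to_test_list and the location of train_list in the commented-out usage are assumptions for illustration, not part of SIMPLE-NN.

```python
def to_test_list(train_list_text,
                 old_prefix="./data/",
                 new_prefix="../generate_test_data/data/"):
    """Rewrite each path in train_list so that it points at the test data directory."""
    lines = []
    for line in train_list_text.splitlines():
        if line.startswith(old_prefix):
            line = new_prefix + line[len(old_prefix):]
        lines.append(line)
    return "\n".join(lines) + "\n"


example = "./data/training_data_0000_to_0006.tfrecord\n"
print(to_test_list(example), end="")

# To apply it to real files, run from SIMPLE-NN/examples/SiO2/error_check
# (the source path below is an assumption about where train_list was written):
# with open("../generate_test_data/train_list") as src, open("test_list", "w") as dst:
#     dst.write(to_test_list(src.read()))
```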

# input.yaml
generate_features: false
preprocess: false
train_model: true
atom_types:
  - Si
  - O

symmetry_function:
  params:
    Si: params_Si
    O: params_O

neural_network:
  method: Adam
  nodes: 30-30
  batch_size: 10
  train: false
  test: true
  continue: true


Note

To use the option continue: true, rename the files SAVER_iterationXXXX.* to SAVER.* and edit the checkpoints file to remove '_iterationXXXX' from its text. If you use the option continue: weights instead, rename potential_saved_iterationXXXX to potential_saved.
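The renaming rule in this note can be sketched with a small regular-expression helper. The helper name strip_iteration_tag, the iteration number 50000, the file extensions, and the checkpoints line shown are examples chosen for illustration; check them against the files actually present in your directory.

```python
import re

_TAG = re.compile(r"_iteration\d+")


def strip_iteration_tag(text):
    """Remove every '_iteration<number>' tag from a file name or from file contents."""
    return _TAG.sub("", text)


# File names (the iteration number and extensions are only examples).
for name in ["SAVER_iteration50000.index",
             "SAVER_iteration50000.meta",
             "potential_saved_iteration50000"]:
    print(name, "->", strip_iteration_tag(name))

# The same helper can be applied to the text of the checkpoints file.
print(strip_iteration_tag('model_checkpoint_path: "SAVER_iteration50000"'))
```

To rename real files, the new names returned by the helper can be passed to os.rename. Working on a copy of the original files is safer.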

After running SIMPLE-NN with the settings above, a new output file named test_result is generated. The file is in pickle format and can be opened with the Python code below:

from six.moves import cPickle as pickle

# Python 2 (open in binary mode so the pickle is read unchanged)
with open('test_result', 'rb') as fil:
    res = pickle.load(fil)

# Python 3
with open('test_result', 'rb') as fil:
    res = pickle.load(fil, encoding='latin1')


The file contains the DFT energies and forces together with the NNP energies and forces.
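Once the file is loaded, an error metric such as the RMSE can be computed from the reference and predicted values. The layout of res is not described here, so the sketch below does not assume any key names. It defines a generic helper and runs it on made-up numbers that stand in for the DFT and NNP energies.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between two equally long sequences of numbers."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up energies standing in for values taken from the loaded test_result.
dft_energy = [-1.00, -2.00, -3.00]
nnp_energy = [-1.10, -1.90, -3.00]
print(rmse(dft_energy, nnp_energy))
```

Inspecting the loaded object first (for example with print(type(res))) shows how the energies and forces are stored before they are passed to a helper like this.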

## Molecular dynamics

See the Tutorials section for details on writing the LAMMPS script.

## Principal component analysis

SIMPLE-NN provides principal component analysis (PCA) as a method for preprocessing the input descriptor vector. The components of an input descriptor vector, including Behler-type symmetry functions, are often highly correlated. In that case, decorrelating the descriptor vector with PCA before feeding it to a machine-learning model can give much faster convergence.

To use PCA, add the following lines to input.yaml for preprocessing as well as for training and testing. For detailed descriptions of the input parameters, see the parameter documentation.

neural_network:
  pca: true
  pca_whiten: true
  pca_min_whiten_level: 1.0e-8


A pickle file named pca is generated during preprocessing. As with the scale_factor file, you need to copy the pca file to the directory where you run SIMPLE-NN with the trained model.
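For reference, the decorrelation and whitening mentioned in this section correspond to the textbook PCA transform written below. This is the generic form, not a statement of how SIMPLE-NN implements it. The symbols $$N$$, $$D$$, $$\boldsymbol{\mu}$$, $$\mathbf{U}$$, and $$\lambda_i$$ are introduced only for this sketch.

```latex
\mathbf{C}
  = \frac{1}{N}\sum_{n=1}^{N}
    (\mathbf{G}_n - \boldsymbol{\mu})(\mathbf{G}_n - \boldsymbol{\mu})^{\mathsf{T}}
  = \mathbf{U}\,\mathrm{diag}(\lambda_1, \dots, \lambda_D)\,\mathbf{U}^{\mathsf{T}},
\qquad
\mathbf{G}'_n = \mathbf{U}^{\mathsf{T}}(\mathbf{G}_n - \boldsymbol{\mu}),
\qquad
\tilde{G}'_{n,i} = \frac{G'_{n,i}}{\sqrt{\lambda_i}}
```

Here $$\boldsymbol{\mu}$$ is the mean descriptor vector over the $$N$$ training points and $$D$$ is the descriptor dimension. The components of $$\mathbf{G}'_n$$ are uncorrelated, and the last step (whitening) rescales each one to unit variance. Dividing by a very small $$\lambda_i$$ amplifies noise, so implementations usually bound the divisor from below. pca_min_whiten_level presumably plays that role, but this reading is an assumption.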

## Parameter tuning

### GDF

The Gaussian density function (GDF) [1] is used to reduce the force errors of sparsely sampled atoms. To use GDF, you need to calculate $$\rho(\mathbf{G})$$ by adding the following lines to the symmetry_function section of input.yaml. SIMPLE-NN supports an automatic parameter generation scheme for $$\sigma$$ and $$c$$. Use the setting sigma: Auto to get robust values of $$\sigma$$ and $$c$$ (the values are stored in the LOG file). The input files introduced in this section can be found in SIMPLE-NN/examples/SiO2/parameter_tuning_GDF.

#symmetry_function:
  #continue: true # if individual pickle file is not deleted
  atomic_weights:
    type: gdf
    params:
      sigma: Auto
      # for manual setting
      #  Si: 0.02
      #  O: 0.02


$$\rho(\mathbf{G})$$ indicates the sampling density at each training point. After $$\rho(\mathbf{G})$$ is calculated, histograms of $$\rho(\mathbf{G})^{-1}$$ are also saved in files named GDFinv_hist_XX.pdf.

Note

If the histogram shows a peak in the high-$$\rho(\mathbf{G})^{-1}$$ region, increasing the Gaussian width ($$\sigma$$) is recommended until the peak disappears. Conversely, if the histogram shows multiple peaks in the low-$$\rho(\mathbf{G})^{-1}$$ region, reducing $$\sigma$$ is recommended until the peaks merge.

By default, the set of $$\rho(\mathbf{G})^{-1}$$ values is scaled to have an average of 1. The interval-averaged force error as a function of $$\rho(\mathbf{G})^{-1}$$ can be visualized with the following script.

from simple_nn.utils import graph as grp

grp.plot_error_vs_gdfinv(['Si','O'], 'test_result')


Here, test_result is the output file generated by the Error check step. The graph of the interval-averaged force errors with respect to $$\rho(\mathbf{G})^{-1}$$ is saved as ferror_vs_GDFinv_XX.pdf.
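To make the two operations in this subsection concrete, the sketch below scales a set of inverse densities to unit mean and then averages force errors over intervals of $$\rho(\mathbf{G})^{-1}$$. It only illustrates the idea. The helper names, the bin edges, and the numbers are made up, and the binning used by plot_error_vs_gdfinv may differ.

```python
def scale_to_unit_mean(values):
    """Divide all values by their mean so that the scaled values average to 1."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


def interval_averaged_error(gdf_inv, errors, edges):
    """Average the errors of points whose gdf_inv lies in [edges[i], edges[i+1])."""
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for g, e in zip(gdf_inv, errors):
        for i in range(len(edges) - 1):
            if edges[i] <= g < edges[i + 1]:
                sums[i] += e
                counts[i] += 1
                break
    # Empty intervals are reported as None instead of dividing by zero.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up inverse densities and force errors for four atoms.
gdf_inv = scale_to_unit_mean([1.0, 1.0, 2.0, 4.0])
force_error = [0.10, 0.20, 0.30, 0.60]
print(gdf_inv)
print(interval_averaged_error(gdf_inv, force_error, edges=[0.0, 1.0, 2.5]))
```

A force error that grows from the low to the high $$\rho(\mathbf{G})^{-1}$$ intervals indicates that sparsely sampled atoms are fitted less accurately, which is the situation GDF weighting is meant to address.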

If the default GDF is not sufficient to reduce the force errors of sparsely sampled training points, you can use a scale function to increase the effect of GDF. In the scale function, $$b$$ controls the decay rate at low $$\rho(\mathbf{G})^{-1}$$ and $$c$$ separates the densely sampled training points from the sparsely sampled ones. To use the scale function, add the following lines to the symmetry_function section of input.yaml.

#symmetry_function:
  weight_modifier:
    type: modified sigmoid
    params:
      Si:
        b: 0.02
        c: 3500.
      O:
        b: 0.02
        c: 10000.


In our experience, $$b=1.0$$ with an automatically selected $$c$$ gives reasonable results. To check the effect of the scale function, use the following script to visualize the force error distribution as a function of $$\rho(\mathbf{G})^{-1}$$. In the script below, test_result_noscale is the test result file from training without the scale function and test_result_wscale is the test result file from training with the scale function.

from simple_nn.utils import graph as grp

grp.plot_error_vs_gdfinv(['Si','O'], 'test_result_noscale', 'test_result_wscale')


## Uncertainty Estimation

A replica ensemble [2] is used to estimate the uncertainty with atomic resolution. Please read the paper cited above for details. We recommend using a separate directory for each step.

Note

Before starting the following steps, the *.pickle files must already be present in path/data/. If they are not, first run SIMPLE-NN with the options below.

#input.yaml
generate_features: true
preprocess: false
train_model: false

symmetry_function:
  remain_pickle: true  # default: false


### Step 1. Extract the atomic energy

Extract the atomic energies that will be used as the reference for the replicas. Make test_list as described in Potential test and prepare the potential_saved file.

#input.yaml
generate_features: false
preprocess: false
train_model: true

neural_network:
  NNP_to_pickle: true
  test: false
  train: false
  continue: true  # or weights


### Step 2. Write the data into tfrecord

Convert the *.pickle files into tfrecord files, which feed the input data during training.

#input.yaml
generate_features: false
preprocess: true
train_model: false

symmetry_function:
  add_NNP_ref: true
  continue: true


### Step 3. Train with atomic energy

Train the model with atomic energies only, which speeds up training (use_force and use_stress are false). Choose a suitable number of nodes and a suitable standard deviation for the initial weights. Repeat this step several times, changing the number of nodes each time.

#input.yaml
generate_features: false
preprocess: false
train_model: true

neural_network:
  NNP_to_pickle: false
  use_force: false
  use_stress: false
  nodes: (user's choice)
  test: false
  train: true
  continue: false
  E_loss: 3
  weight_initializer:
    params:
      stddev: (user's choice)

symmetry_function:
  add_NNP_ref: true
  continue: true
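Because Step 3 is repeated with different network sizes, it can be convenient to generate one input.yaml per replica from a template. The sketch below does this with plain string formatting. The node choices, the stddev value, the replica_N directory names, and the helper replica_input are hypothetical and should be replaced with your own choices.

```python
TEMPLATE = """\
generate_features: false
preprocess: false
train_model: true

neural_network:
  NNP_to_pickle: false
  use_force: false
  use_stress: false
  nodes: {nodes}
  test: false
  train: true
  continue: false
  E_loss: 3
  weight_initializer:
    params:
      stddev: {stddev}

symmetry_function:
  add_NNP_ref: true
  continue: true
"""


def replica_input(nodes, stddev):
    """Return the text of an input.yaml for one replica."""
    return TEMPLATE.format(nodes=nodes, stddev=stddev)


# Hypothetical (nodes, stddev) choices, one per replica.
choices = [("30-30", 0.3), ("60-60", 0.3), ("90-90", 0.3)]
inputs = {"replica_%d" % i: replica_input(n, s)
          for i, (n, s) in enumerate(choices, start=1)}
print(sorted(inputs))
```

Each value in inputs can then be written to input.yaml inside its own directory, in line with the recommendation to keep the steps in independent directories.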


### Step 4. Molecular dynamics

Note

Before this step, you have to compile your LAMMPS with pair_nn_replica.cpp and pair_nn_replica.h.

LAMMPS can calculate the atomic uncertainty as the standard deviation of the atomic energies. Because our NNP does not handle charged systems, the atomic uncertainty can be written to the atomic charge field. Prepare your data file in the charge format and modify your LAMMPS input as in the example below.

atom_style  charge
pair_style  nn/r (# of replica potentials)
pair_coeff  * * (reference potential) (element1) (element2) ... &
            (replica potential #1) &
            (replica potential #2) &
            ...
compute     (ID) (group-ID) property/atom q
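The quantity stored in the charge field can be illustrated in a few lines of Python. This is a sketch of the idea only. Whether pair_nn_replica uses the population or the sample standard deviation, and whether the reference potential is included, is not stated here, so the population form below is an assumption and the energies are made up.

```python
import statistics


def atomic_uncertainty(replica_energies):
    """Population standard deviation of one atom's energy over the replica potentials."""
    return statistics.pstdev(replica_energies)


# Made-up atomic energies (eV) of a single atom predicted by four replicas.
energies = [-5.02, -4.98, -5.01, -4.99]
print(atomic_uncertainty(energies))
```

Atoms for which the replicas agree closely get a small value, while atoms in environments that are poorly covered by the training set tend to show a larger spread.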