MSM-analysis for alanine-dipeptide¶
This notebook collects usage examples for MSM-analysis using members of the pyemma.msm.analysis package.
A given MSM, estimated from alanine-dipeptide simulation data at lagtime \(\tau=6ps\), is used as an example to carry out analysis.
The necessary inputs are:
- the transition matrix, ‘T.dat’
- the centers of the \((\phi, \psi)\) dihedral angle space regular grid discretization, ‘grid_centers20x20.dat’
- the largest set of connected microstates, ‘lcc.dat’
Auxiliary functions in ‘plotting.py’ are used to generate figures of the estimated quantities.
Use ipythons magic % commands to activate plotting within notebook cells¶
%matplotlib inline
Imports are ordered as¶
- standard library imports
- third party imports
- local application/library specific imports
import numpy as np
from pyemma.msm.io import read_matrix
from pyemma.msm.analysis import stationary_distribution, eigenvectors, eigenvalues, timescales, pcca
import plotting
Load necessary input data¶
Use pyemma.msm.io.read_matrix function to read dense arrays from ascii files. The returned objects will be dense arrays (numpy.ndarray).
T=read_matrix('T.dat')
This notebook collects usage examples for MSM-analysis using members of the pyemma.msm.analysis package.
Starting from
centers=read_matrix('grid_centers20x20.dat')
The optional dtype (data-type) keyword allows you to specify the type of the read data. The default value is dtype=float.
lcc=read_matrix('lcc.dat', dtype=int)
Use the integer values given by the largest connected set as indices to “slice” the array of grid-center points. The returned array contains only those centers corresponding to the mircrostates in the largest connected set.
centers=centers[lcc, :]
Compute the stationary distribution using the pyemma.msm.analysis.stationary_distribution method.
pi=stationary_distribution(T)
The (centers, pi) tuple is fed into an adapted plotting subroutine producing a contour plot from the scattered data. Since scatterd data can not directly be used to produce a contour plot over the whole \((\phi, \psi)\)-plane the given data is interpolated onto a regular grid before producing a contour plot. Some of the strange-looking low probability iso-lines may be artefacts of the interpolation. Interpolation on the level of free energies is probably a better idea.
plotting.stationary_distribution(centers, pi)
For \(T=300K\) we have \(\beta=0.4 \frac{mol}{kJ}\). The free-energy is defined as \(A_i=-\frac{1}{\beta} \log \pi_i\)
A=-1.0/0.4*np.log(pi)
Since we can only estimate free-energy differences we set the \(\min{A_i}=0\)
A=A-A.min()
For plotting we chose the equally spaced contour levels in the interval \([0, 30] \frac{kJ}{mol}\). For the interpolation onto a regular grid we chose cubic splines. Grid points that lie outside of the convex-hull of the given center points are assigned the maximum value of \(A\).
The plot shows the separation of the dihedral plane into a low (free-)energy region $ 0 $ and the region of high free-energy \(\phi >0\).
- The low lying region contains three metastable sets seperated by a barrier of approximately \(4 \frac{kJ}{mol}\) and approximately \(10 \frac{kJ}{mol}\).
- There are two metastable sets in the high energy region seperated by a barrier of approximately \(7 \frac{kJ}{mol}\).
- The barrier between the low energy and the high energy region is approximately \(23 \frac{kJ}{mol}\).
plotting.free_energy(centers, A, levels=np.linspace(0.0, 30.0, 10), method='cubic', fill_value=A.max())
Eigenvectors¶
We compute the right eigenvectors corresponding to the 4 largest eigenvalues.
R=eigenvectors(T, k=4)
The first eigenvector shows a sign change from the most stable region with \(\phi \leq 0\) to the \(\phi>0\) region. The slowest process corresponds to a transition between the two most stable states and the metastable regions with \(\phi>0\).
ev=R[:, 1].real
plotting.eigenvector(centers, ev, levels=np.linspace(ev.min(), ev.max(), 10))
The second eigenvector shows a sign change from \(\phi \leq 0\) to \(\phi>0\). The second slowest process is the transition between the low-probability region \(\phi>0\) and the high probability region \(\phi \leq 0\).
ev=R[:, 2].real
plotting.eigenvector(centers, ev, levels=np.linspace(ev.min(), ev.max(), 11), fmt='%.e')
The third eigenvector shows the transition process between the least probable meta-stable state and the rest of the accessible state space.
ev=R[:, 3].real
plotting.eigenvector(centers, ev, levels=np.linspace(ev.min(), ev.max(), 11), fmt='%.e')
Eigenvalues¶
Compute the 10 largest eigenvalues of the MSM
eigvals=eigenvalues(T)[0:11]
The first \(5\) eigenvalues are purely real. The remaining eigenvalues occur in complex-conjugate pairs. That is because \(T\) is a matrix with purely real entries.
eigvals
array([ 1.00000000+0.j , 0.94808553+0.j ,
0.94092025+0.j , 0.66447475+0.j ,
0.38530146+0.j , 0.34550046+0.00929879j,
0.34550046-0.00929879j, 0.24977533+0.25204877j,
0.24977533-0.25204877j, 0.23257796+0.19019451j,
0.23257796-0.19019451j])
There is a distinct gap in the spectrum betwenn the third and the fourth eigenvalue.
plotting.eigenvalues(eigvals)
Implied time scales¶
Implied time scales are computed via msm.analysis.timescales. The lagtime of the Markov model, \(\tau=6 ps\), can be specified via the optional keyword tau. The default value is tau=1.
ts=timescales(T, k=5, tau=6)
ts
array([ inf, 112.54805277, 98.52720209, 14.67859726,
6.29109374])
PCCA¶
membership=pcca(T, 5)
membership_crisp=np.where(membership>0.4)
PCCA gives accurate memberships for the high probability region. Assigning correct memberships for the low probability states, \(\phi>0\), is problematic.
plotting.pcca(centers, membership_crisp)
Summary¶
The pyemma.msm.analysis module can be used to analyse an estimated transition matrix. Starting from the transition matrix \(T\) It is possible to
- compute the stationary vector \(\pi\) to analyze the free-energy landscape given suitable (low-dimensional) coordinates
- compute the right eigenvectors to investigate slowest dynamical processes
- compute eigenvalues and time scales as quantitative information about system-dynamics
The pyemma.msm-API is designed to allow fast and flexible scripting of the whole estimation and analysis process. There is a multitude of functions for MSM analysis provided in the emma2.msm.analysis module. Further functions are
- checks for stochasticity, ergodicity, etc.
- commitor computation
- TPT
- mean-first-passage time (mfpt) computations
- fingerprint: expectation and autocorrelation
- decompositions in eigenvalues, left, and right eigenvectors
We are happy for your feedback and suggestions. Please feel free to contact our mailing list at emma@lists.fu-berlin.de