pyemma.coordinates.vamp¶
-
pyemma.coordinates.
vamp
(data=None, lag=10, dim=None, scaling=None, right=False, ncov_max=inf, stride=1, skip=0, chunksize=None)¶ Variational approach for Markov processes (VAMP) 1.
- Parameters
lag (int) – lag time
dim (float or int, default=None) –
Number of dimensions to keep:
- if dim is not set (None) all available ranks are kept:
n_components == min(n_samples, n_uncorrelated_features)
if dim is an integer >= 1, this number specifies the number of dimensions to keep.
if dim is a float with
0 < dim < 1
, select the number of dimensions such that the amount of kinetic variance that needs to be explained is greater than the percentage specified by dim.
scaling (None or string) –
Scaling to be applied to the VAMP order parameters upon transformation
None: no scaling will be applied, variance of the order parameters is 1
’kinetic map’ or ‘km’: order parameters are scaled by singular value. Only the left singular functions induce a kinetic map wrt the conventional forward propagator. The right singular functions induce a kinetic map wrt the backward propagator. right : boolean
Whether to compute the right singular functions. If right==True, get_output() will return the right singular functions. Otherwise, get_output() will return the left singular functions. Beware that only frames[tau:, :] of each trajectory returned by get_output() contain valid values of the right singular functions. Conversely, only frames[0:-tau, :] of each trajectory returned by get_output() contain valid values of the left singular functions. The remaining frames might possibly be interpreted as some extrapolation.
epsilon (float) – eigenvalue cutoff. Eigenvalues of \(C_{00}\) and \(C_{11}\) with norms <= epsilon will be cut off. The remaining number of eigenvalues together with the value of dim define the size of the output.
stride (int, optional, default = 1) – Use only every stride-th time step. By default, every time step is used.
skip (int, default=0) – skip the first initial n frames per trajectory.
ncov_max (int, default=infinity) – limit the memory usage of the algorithm from 3 to an amount that corresponds to ncov_max additional copies of each correlation matrix
- Returns
vamp – It contains the definitions of singular functions and singular values and can be used to project input data to the dominant VAMP components, predict expectations and time-lagged covariances and perform a Chapman-Kolmogorov test.
- Return type
a
VAMP
transformation object
Notes
VAMP is a method for dimensionality reduction of Markov processes.
The Koopman operator \(\mathcal{K}\) is an integral operator that describes conditional future expectation values. Let \(p(\mathbf{x},\,\mathbf{y})\) be the conditional probability density of visiting an infinitesimal phase space volume around point \(\mathbf{y}\) at time \(t+\tau\) given that the phase space point \(\mathbf{x}\) was visited at the earlier time \(t\). Then the action of the Koopman operator on a function \(f\) can be written as follows:
\[\mathcal{K}f=\int p(\mathbf{x},\,\mathbf{y})f(\mathbf{y})\,\mathrm{dy}=\mathbb{E}\left[f(\mathbf{x}_{t+\tau}\mid\mathbf{x}_{t}=\mathbf{x})\right]\]The Koopman operator is defined without any reference to an equilibrium distribution. Therefore it is well-defined in situations where the dynamics is irreversible or/and non-stationary such that no equilibrium distribution exists.
If we approximate \(f\) by a linear superposition of ansatz functions \(\boldsymbol{\chi}\) of the conformational degrees of freedom (features), the operator \(\mathcal{K}\) can be approximated by a (finite-dimensional) matrix \(\mathbf{K}\).
The approximation is computed as follows: From the time-dependent input features \(\boldsymbol{\chi}(t)\), we compute the mean \(\boldsymbol{\mu}_{0}\) (\(\boldsymbol{\mu}_{1}\)) from all data excluding the last (first) \(\tau\) steps of every trajectory as follows:
\[ \begin{align}\begin{aligned}\boldsymbol{\mu}_{0} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\boldsymbol{\chi}(t)\\\boldsymbol{\mu}_{1} :=\frac{1}{T-\tau}\sum_{t=\tau}^{T}\boldsymbol{\chi}(t)\end{aligned}\end{align} \]Next, we compute the instantaneous covariance matrices \(\mathbf{C}_{00}\) and \(\mathbf{C}_{11}\) and the time-lagged covariance matrix \(\mathbf{C}_{01}\) as follows:
\[ \begin{align}\begin{aligned}\mathbf{C}_{00} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\\\mathbf{C}_{11} :=\frac{1}{T-\tau}\sum_{t=\tau}^{T}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\\\mathbf{C}_{01} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\left[\boldsymbol{\chi}(t+\tau)-\boldsymbol{\mu}_{1}\right]\end{aligned}\end{align} \]The Koopman matrix is then computed as follows:
\[\mathbf{K}=\mathbf{C}_{00}^{-1}\mathbf{C}_{01}\]It can be shown 1 that the leading singular functions of the half-weighted Koopman matrix
\[\bar{\mathbf{K}}:=\mathbf{C}_{00}^{-\frac{1}{2}}\mathbf{C}_{01}\mathbf{C}_{11}^{-\frac{1}{2}}\]encode the best reduced dynamical model for the time series.
The singular functions can be computed by first performing the singular value decomposition
\[\bar{\mathbf{K}}=\mathbf{U}^{\prime}\mathbf{S}\mathbf{V}^{\prime}\]and then mapping the input conformation to the left singular functions \(\boldsymbol{\psi}\) and right singular functions \(\boldsymbol{\phi}\) as follows:
\[ \begin{align}\begin{aligned}\boldsymbol{\psi}(t):=\mathbf{U}^{\prime\top}\mathbf{C}_{00}^{-\frac{1}{2}}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\\\boldsymbol{\phi}(t):=\mathbf{V}^{\prime\top}\mathbf{C}_{11}^{-\frac{1}{2}}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\end{aligned}\end{align} \]References
- 1(1,2)
Wu, H. and Noe, F. 2017. Variational approach for learning Markov processes from time series data. arXiv:1707.04659v1
- 2
Noe, F. and Clementi, C. 2015. Kinetic distance and kinetic maps from molecular dynamics simulation. J. Chem. Theory. Comput. doi:10.1021/acs.jctc.5b00553
- 3
Chan, T. F., Golub G. H., LeVeque R. J. 1979. Updating formulae and pairwiese algorithms for computing sample variances. Technical Report STAN-CS-79-773, Department of Computer Science, Stanford University.