pyemma.coordinates.vamp¶

pyemma.coordinates.vamp(data=None, lag=10, dim=None, scaling=None, right=False, ncov_max=inf, stride=1, skip=0, chunksize=None)¶

Variational approach for Markov processes (VAMP) 1.

Parameters

lag (int) – lag time
dim (float or int, default=None) –
Number of dimensions to keep:
- if dim is not set (None) all available ranks are kept:
  n_components == min(n_samples, n_uncorrelated_features)
- if dim is an integer >= 1, this number specifies the number of dimensions to keep.
- if dim is a float with 0 < dim < 1, select the number of dimensions such that the amount of kinetic variance that needs to be explained is greater than the percentage specified by dim.
scaling (None or string) –
Scaling to be applied to the VAMP order parameters upon transformation
- None: no scaling will be applied, variance of the order parameters is 1
- ’kinetic map’ or ‘km’: order parameters are scaled by singular value. Only the left singular functions induce a kinetic map wrt the conventional forward propagator. The right singular functions induce a kinetic map wrt the backward propagator. right : boolean
Whether to compute the right singular functions. If right==True, get_output() will return the right singular functions. Otherwise, get_output() will return the left singular functions. Beware that only frames[tau:, :] of each trajectory returned by get_output() contain valid values of the right singular functions. Conversely, only frames[0:-tau, :] of each trajectory returned by get_output() contain valid values of the left singular functions. The remaining frames might possibly be interpreted as some extrapolation.
epsilon (float) – eigenvalue cutoff. Eigenvalues of $C_{00}$ and $C_{11}$ with norms <= epsilon will be cut off. The remaining number of eigenvalues together with the value of dim define the size of the output.
stride (int, optional, default = 1) – Use only every stride-th time step. By default, every time step is used.
skip (int, default=0) – skip the first initial n frames per trajectory.
ncov_max (int, default=infinity) – limit the memory usage of the algorithm from 3 to an amount that corresponds to ncov_max additional copies of each correlation matrix

Returns

vamp – It contains the definitions of singular functions and singular values and can be used to project input data to the dominant VAMP components, predict expectations and time-lagged covariances and perform a Chapman-Kolmogorov test.

Return type

a VAMP transformation object

Notes

VAMP is a method for dimensionality reduction of Markov processes.

The Koopman operator $\mathcal{K}$ is an integral operator that describes conditional future expectation values. Let $p(\mathbf{x},\,\mathbf{y})$ be the conditional probability density of visiting an infinitesimal phase space volume around point $\mathbf{y}$ at time $t+\tau$ given that the phase space point $\mathbf{x}$ was visited at the earlier time $t$ . Then the action of the Koopman operator on a function $f$ can be written as follows:

$\mathcal{K}f=\int p(\mathbf{x},\,\mathbf{y})f(\mathbf{y})\,\mathrm{dy}=\mathbb{E}\left[f(\mathbf{x}_{t+\tau}\mid\mathbf{x}_{t}=\mathbf{x})\right]$

The Koopman operator is defined without any reference to an equilibrium distribution. Therefore it is well-defined in situations where the dynamics is irreversible or/and non-stationary such that no equilibrium distribution exists.

If we approximate $f$ by a linear superposition of ansatz functions $\boldsymbol{\chi}$ of the conformational degrees of freedom (features), the operator $\mathcal{K}$ can be approximated by a (finite-dimensional) matrix $\mathbf{K}$ .

The approximation is computed as follows: From the time-dependent input features $\boldsymbol{\chi}(t)$ , we compute the mean $\boldsymbol{\mu}_{0}$ ( $\boldsymbol{\mu}_{1}$ ) from all data excluding the last (first) $\tau$ steps of every trajectory as follows:

$\begin{align}\begin{aligned}\boldsymbol{\mu}_{0} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\boldsymbol{\chi}(t)\\\boldsymbol{\mu}_{1} :=\frac{1}{T-\tau}\sum_{t=\tau}^{T}\boldsymbol{\chi}(t)\end{aligned}\end{align}$

Next, we compute the instantaneous covariance matrices $\mathbf{C}_{00}$ and $\mathbf{C}_{11}$ and the time-lagged covariance matrix $\mathbf{C}_{01}$ as follows:

$\begin{align}\begin{aligned}\mathbf{C}_{00} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\\\mathbf{C}_{11} :=\frac{1}{T-\tau}\sum_{t=\tau}^{T}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\\\mathbf{C}_{01} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\left[\boldsymbol{\chi}(t+\tau)-\boldsymbol{\mu}_{1}\right]\end{aligned}\end{align}$

The Koopman matrix is then computed as follows:

$\mathbf{K}=\mathbf{C}_{00}^{-1}\mathbf{C}_{01}$

It can be shown 1 that the leading singular functions of the half-weighted Koopman matrix

$\bar{\mathbf{K}}:=\mathbf{C}_{00}^{-\frac{1}{2}}\mathbf{C}_{01}\mathbf{C}_{11}^{-\frac{1}{2}}$

encode the best reduced dynamical model for the time series.

The singular functions can be computed by first performing the singular value decomposition

$\bar{\mathbf{K}}=\mathbf{U}^{\prime}\mathbf{S}\mathbf{V}^{\prime}$

and then mapping the input conformation to the left singular functions $\boldsymbol{\psi}$ and right singular functions $\boldsymbol{\phi}$ as follows:

$\begin{align}\begin{aligned}\boldsymbol{\psi}(t):=\mathbf{U}^{\prime\top}\mathbf{C}_{00}^{-\frac{1}{2}}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\\\boldsymbol{\phi}(t):=\mathbf{V}^{\prime\top}\mathbf{C}_{11}^{-\frac{1}{2}}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\end{aligned}\end{align}$

References

1(1,2): Wu, H. and Noe, F. 2017. Variational approach for learning Markov processes from time series data. arXiv:1707.04659v1
2: Noe, F. and Clementi, C. 2015. Kinetic distance and kinetic maps from molecular dynamics simulation. J. Chem. Theory. Comput. doi:10.1021/acs.jctc.5b00553
3: Chan, T. F., Golub G. H., LeVeque R. J. 1979. Updating formulae and pairwiese algorithms for computing sample variances. Technical Report STAN-CS-79-773, Department of Computer Science, Stanford University.