pyemma.coordinates.vamp

pyemma.coordinates.vamp(data=None, lag=10, dim=None, scaling=None, right=False, ncov_max=inf, stride=1, skip=0, chunksize=None)

Variational approach for Markov processes (VAMP) [1].

Parameters:
  • lag (int) – lag time
  • dim (float or int, default=None) –

    Number of dimensions to keep:

    • if dim is not set (None) all available ranks are kept:
      n_components == min(n_samples, n_uncorrelated_features)
    • if dim is an integer >= 1, this number specifies the number of dimensions to keep.
    • if dim is a float with 0 < dim < 1, select the number of dimensions such that the amount of kinetic variance that needs to be explained is greater than the percentage specified by dim.
  • scaling (None or string) –

    Scaling to be applied to the VAMP order parameters upon transformation

    • None: no scaling will be applied, variance of the order parameters is 1
    • ’kinetic map’ or ‘km’: order parameters are scaled by singular value. Only the left singular functions induce a kinetic map wrt the conventional forward propagator. The right singular functions induce a kinetic map wrt the backward propagator. right : boolean

    Whether to compute the right singular functions. If right==True, get_output() will return the right singular functions. Otherwise, get_output() will return the left singular functions. Beware that only frames[tau:, :] of each trajectory returned by get_output() contain valid values of the right singular functions. Conversely, only frames[0:-tau, :] of each trajectory returned by get_output() contain valid values of the left singular functions. The remaining frames might possibly be interpreted as some extrapolation.

  • epsilon (float) – eigenvalue cutoff. Eigenvalues of \(C_{00}\) and \(C_{11}\) with norms <= epsilon will be cut off. The remaining number of eigenvalues together with the value of dim define the size of the output.
  • stride (int, optional, default = 1) – Use only every stride-th time step. By default, every time step is used.
  • skip (int, default=0) – skip the first initial n frames per trajectory.
  • ncov_max (int, default=infinity) – limit the memory usage of the algorithm from [3] to an amount that corresponds to ncov_max additional copies of each correlation matrix
Returns:

vamp – It contains the definitions of singular functions and singular values and can be used to project input data to the dominant VAMP components, predict expectations and time-lagged covariances and perform a Chapman-Kolmogorov test.

Return type:

a VAMP transformation object

Notes

VAMP is a method for dimensionality reduction of Markov processes.

The Koopman operator \(\mathcal{K}\) is an integral operator that describes conditional future expectation values. Let \(p(\mathbf{x},\,\mathbf{y})\) be the conditional probability density of visiting an infinitesimal phase space volume around point \(\mathbf{y}\) at time \(t+\tau\) given that the phase space point \(\mathbf{x}\) was visited at the earlier time \(t\). Then the action of the Koopman operator on a function \(f\) can be written as follows:

\[\mathcal{K}f=\int p(\mathbf{x},\,\mathbf{y})f(\mathbf{y})\,\mathrm{dy}=\mathbb{E}\left[f(\mathbf{x}_{t+\tau}\mid\mathbf{x}_{t}=\mathbf{x})\right]\]

The Koopman operator is defined without any reference to an equilibrium distribution. Therefore it is well-defined in situations where the dynamics is irreversible or/and non-stationary such that no equilibrium distribution exists.

If we approximate \(f\) by a linear superposition of ansatz functions \(\boldsymbol{\chi}\) of the conformational degrees of freedom (features), the operator \(\mathcal{K}\) can be approximated by a (finite-dimensional) matrix \(\mathbf{K}\).

The approximation is computed as follows: From the time-dependent input features \(\boldsymbol{\chi}(t)\), we compute the mean \(\boldsymbol{\mu}_{0}\) (\(\boldsymbol{\mu}_{1}\)) from all data excluding the last (first) \(\tau\) steps of every trajectory as follows:

\[ \begin{align}\begin{aligned}\boldsymbol{\mu}_{0} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\boldsymbol{\chi}(t)\\\boldsymbol{\mu}_{1} :=\frac{1}{T-\tau}\sum_{t=\tau}^{T}\boldsymbol{\chi}(t)\end{aligned}\end{align} \]

Next, we compute the instantaneous covariance matrices \(\mathbf{C}_{00}\) and \(\mathbf{C}_{11}\) and the time-lagged covariance matrix \(\mathbf{C}_{01}\) as follows:

\[ \begin{align}\begin{aligned}\mathbf{C}_{00} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\\\mathbf{C}_{11} :=\frac{1}{T-\tau}\sum_{t=\tau}^{T}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\\\mathbf{C}_{01} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\left[\boldsymbol{\chi}(t+\tau)-\boldsymbol{\mu}_{1}\right]\end{aligned}\end{align} \]

The Koopman matrix is then computed as follows:

\[\mathbf{K}=\mathbf{C}_{00}^{-1}\mathbf{C}_{01}\]

It can be shown [1] that the leading singular functions of the half-weighted Koopman matrix

\[\bar{\mathbf{K}}:=\mathbf{C}_{00}^{-\frac{1}{2}}\mathbf{C}_{01}\mathbf{C}_{11}^{-\frac{1}{2}}\]

encode the best reduced dynamical model for the time series.

The singular functions can be computed by first performing the singular value decomposition

\[\bar{\mathbf{K}}=\mathbf{U}^{\prime}\mathbf{S}\mathbf{V}^{\prime}\]

and then mapping the input conformation to the left singular functions \(\boldsymbol{\psi}\) and right singular functions \(\boldsymbol{\phi}\) as follows:

\[ \begin{align}\begin{aligned}\boldsymbol{\psi}(t):=\mathbf{U}^{\prime\top}\mathbf{C}_{00}^{-\frac{1}{2}}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\\\boldsymbol{\phi}(t):=\mathbf{V}^{\prime\top}\mathbf{C}_{11}^{-\frac{1}{2}}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\end{aligned}\end{align} \]

References

[1](1, 2) Wu, H. and Noe, F. 2017. Variational approach for learning Markov processes from time series data. arXiv:1707.04659v1
[2]Noe, F. and Clementi, C. 2015. Kinetic distance and kinetic maps from molecular dynamics simulation. J. Chem. Theory. Comput. doi:10.1021/acs.jctc.5b00553
[3]Chan, T. F., Golub G. H., LeVeque R. J. 1979. Updating formulae and pairwiese algorithms for computing sample variances. Technical Report STAN-CS-79-773, Department of Computer Science, Stanford University.