pyemma.coordinates.vamp

pyemma.coordinates.vamp(data=None, lag=10, dim=None, scaling=None, right=False, ncov_max=inf, stride=1, skip=0, chunksize=None)

Variational approach for Markov processes (VAMP) 1.

Parameters
  • lag (int) – lag time

  • dim (float or int, default=None) –

    Number of dimensions to keep:

    • if dim is not set (None) all available ranks are kept:

      n_components == min(n_samples, n_uncorrelated_features)

    • if dim is an integer >= 1, this number specifies the number of dimensions to keep.

    • if dim is a float with 0 < dim < 1, select the number of dimensions such that the amount of kinetic variance that needs to be explained is greater than the percentage specified by dim.

  • scaling (None or string) –

    Scaling to be applied to the VAMP order parameters upon transformation

    • None: no scaling will be applied, variance of the order parameters is 1

    • ’kinetic map’ or ‘km’: order parameters are scaled by singular value. Only the left singular functions induce a kinetic map wrt the conventional forward propagator. The right singular functions induce a kinetic map wrt the backward propagator. right : boolean

    Whether to compute the right singular functions. If right==True, get_output() will return the right singular functions. Otherwise, get_output() will return the left singular functions. Beware that only frames[tau:, :] of each trajectory returned by get_output() contain valid values of the right singular functions. Conversely, only frames[0:-tau, :] of each trajectory returned by get_output() contain valid values of the left singular functions. The remaining frames might possibly be interpreted as some extrapolation.

  • epsilon (float) – eigenvalue cutoff. Eigenvalues of C00 and C11 with norms <= epsilon will be cut off. The remaining number of eigenvalues together with the value of dim define the size of the output.

  • stride (int, optional, default = 1) – Use only every stride-th time step. By default, every time step is used.

  • skip (int, default=0) – skip the first initial n frames per trajectory.

  • ncov_max (int, default=infinity) – limit the memory usage of the algorithm from 3 to an amount that corresponds to ncov_max additional copies of each correlation matrix

Returns

vamp – It contains the definitions of singular functions and singular values and can be used to project input data to the dominant VAMP components, predict expectations and time-lagged covariances and perform a Chapman-Kolmogorov test.

Return type

a VAMP transformation object

Notes

VAMP is a method for dimensionality reduction of Markov processes.

The Koopman operator K is an integral operator that describes conditional future expectation values. Let p(x,y) be the conditional probability density of visiting an infinitesimal phase space volume around point y at time t+τ given that the phase space point x was visited at the earlier time t. Then the action of the Koopman operator on a function f can be written as follows:

Kf=p(x,y)f(y)dy=E[f(xt+τxt=x)]

The Koopman operator is defined without any reference to an equilibrium distribution. Therefore it is well-defined in situations where the dynamics is irreversible or/and non-stationary such that no equilibrium distribution exists.

If we approximate f by a linear superposition of ansatz functions χ of the conformational degrees of freedom (features), the operator K can be approximated by a (finite-dimensional) matrix K.

The approximation is computed as follows: From the time-dependent input features χ(t), we compute the mean μ0 (μ1) from all data excluding the last (first) τ steps of every trajectory as follows:

μ0:=1TτTτt=0χ(t)μ1:=1TτTt=τχ(t)

Next, we compute the instantaneous covariance matrices C00 and C11 and the time-lagged covariance matrix C01 as follows:

C00:=1TτTτt=0[χ(t)μ0][χ(t)μ0]C11:=1TτTt=τ[χ(t)μ1][χ(t)μ1]C01:=1TτTτt=0[χ(t)μ0][χ(t+τ)μ1]

The Koopman matrix is then computed as follows:

K=C100C01

It can be shown 1 that the leading singular functions of the half-weighted Koopman matrix

ˉK:=C1200C01C1211

encode the best reduced dynamical model for the time series.

The singular functions can be computed by first performing the singular value decomposition

ˉK=USV

and then mapping the input conformation to the left singular functions ψ and right singular functions ϕ as follows:

ψ(t):=UC1200[χ(t)μ0]ϕ(t):=VC1211[χ(t)μ1]

References

1(1,2)

Wu, H. and Noe, F. 2017. Variational approach for learning Markov processes from time series data. arXiv:1707.04659v1

2

Noe, F. and Clementi, C. 2015. Kinetic distance and kinetic maps from molecular dynamics simulation. J. Chem. Theory. Comput. doi:10.1021/acs.jctc.5b00553

3

Chan, T. F., Golub G. H., LeVeque R. J. 1979. Updating formulae and pairwiese algorithms for computing sample variances. Technical Report STAN-CS-79-773, Department of Computer Science, Stanford University.