pyemma.coordinates.vamp
pyemma.coordinates.vamp(data=None, lag=10, dim=None, scaling=None, right=False, ncov_max=inf, stride=1, skip=0, chunksize=None)
Variational approach for Markov processes (VAMP) [1].
- Parameters
lag (int) – lag time
dim (float or int, default=None) –
Number of dimensions to keep:
- If dim is not set (None), all available ranks are kept: n_components == min(n_samples, n_uncorrelated_features).
- If dim is an integer >= 1, this number specifies the number of dimensions to keep.
- If dim is a float with 0 < dim < 1, the number of dimensions is selected such that the fraction of kinetic variance explained is greater than the value of dim.
scaling (None or string) –
Scaling to be applied to the VAMP order parameters upon transformation:
- None: no scaling is applied; the variance of the order parameters is 1.
- 'kinetic map' or 'km': order parameters are scaled by singular value. Only the left singular functions induce a kinetic map with respect to the conventional forward propagator. The right singular functions induce a kinetic map with respect to the backward propagator.
right (boolean) – Whether to compute the right singular functions. If right==True, get_output() returns the right singular functions. Otherwise, get_output() returns the left singular functions. Beware that only frames[tau:, :] of each trajectory returned by get_output() contain valid values of the right singular functions. Conversely, only frames[0:-tau, :] of each trajectory returned by get_output() contain valid values of the left singular functions. The remaining frames may be interpreted as an extrapolation.
epsilon (float) – eigenvalue cutoff. Eigenvalues of C00 and C11 with norms <= epsilon will be cut off. The remaining number of eigenvalues together with the value of dim define the size of the output.
stride (int, optional, default = 1) – Use only every stride-th time step. By default, every time step is used.
skip (int, default=0) – Number of initial frames to skip in each trajectory.
ncov_max (int, default=infinity) – Limit the memory usage of the algorithm from [3] to an amount that corresponds to ncov_max additional copies of each correlation matrix.
- Returns
vamp – It contains the definitions of singular functions and singular values and can be used to project input data to the dominant VAMP components, predict expectations and time-lagged covariances and perform a Chapman-Kolmogorov test.
- Return type
a VAMP transformation object
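As an illustration of the rule for a fractional dim described under Parameters, the following sketch picks the number of dimensions from a set of singular values. It assumes that the kinetic variance carried by each component is its squared singular value. The helper name select_dim is hypothetical and not part of PyEMMA, and clamping an integer dim to the available rank is likewise an assumption.

```python
import numpy as np


def select_dim(singular_values, dim=None):
    """Hypothetical helper: number of dimensions to keep for a given dim."""
    s = np.asarray(singular_values, dtype=float)
    if dim is None:
        # Keep all available ranks.
        return len(s)
    if isinstance(dim, float):
        if not 0.0 < dim < 1.0:
            raise ValueError("a float dim must satisfy 0 < dim < 1")
        # Assumption: kinetic variance of a component = squared singular value.
        cumvar = np.cumsum(s ** 2) / np.sum(s ** 2)
        # Smallest count whose cumulative fraction is strictly greater than dim.
        n_keep = int(np.searchsorted(cumvar, dim, side="right")) + 1
        return min(n_keep, len(s))
    if dim >= 1:
        # Integer dim: keep that many, but never more than are available.
        return min(int(dim), len(s))
    raise ValueError("dim must be None, an int >= 1, or a float in (0, 1)")


s = np.array([0.9, 0.5, 0.1])
n_half = select_dim(s, 0.5)    # fraction of kinetic variance above 50 %
n_most = select_dim(s, 0.95)   # fraction of kinetic variance above 95 %
n_all = select_dim(s)          # dim=None keeps every component
```

With these toy singular values the first component alone carries roughly three quarters of the summed squared singular values, so a small fractional dim keeps a single dimension while a value close to 1 keeps more.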
Notes
VAMP is a method for dimensionality reduction of Markov processes.
The Koopman operator K is an integral operator that describes conditional future expectation values. Let p(x,y) be the conditional probability density of visiting an infinitesimal phase space volume around point y at time t+τ given that the phase space point x was visited at the earlier time t. Then the action of the Koopman operator on a function f can be written as follows:
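$$\mathcal{K} f = \int p(\mathbf{x}, \mathbf{y}) \, f(\mathbf{y}) \, \mathrm{d}\mathbf{y} = \mathbb{E}\left[ f(\mathbf{x}_{t+\tau}) \mid \mathbf{x}_t = \mathbf{x} \right]$$
The Koopman operator is defined without any reference to an equilibrium distribution. It is therefore well-defined even when the dynamics are irreversible and/or non-stationary, such that no equilibrium distribution exists.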
If we approximate f by a linear superposition of ansatz functions χ of the conformational degrees of freedom (features), the operator $\mathcal{K}$ can be approximated by a (finite-dimensional) matrix $\mathbf{K}$.
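The approximation is computed as follows: from the time-dependent input features χ(t), we compute the mean μ0 (μ1) from all data excluding the last (first) τ steps of every trajectory:
$$\boldsymbol{\mu}_0 := \frac{1}{T-\tau} \sum_{t=0}^{T-\tau} \boldsymbol{\chi}(t)$$
$$\boldsymbol{\mu}_1 := \frac{1}{T-\tau} \sum_{t=\tau}^{T} \boldsymbol{\chi}(t)$$
Next, we compute the instantaneous covariance matrices C00 and C11 and the time-lagged covariance matrix C01 as follows:
$$\mathbf{C}_{00} := \frac{1}{T-\tau} \sum_{t=0}^{T-\tau} \left[\boldsymbol{\chi}(t) - \boldsymbol{\mu}_0\right] \left[\boldsymbol{\chi}(t) - \boldsymbol{\mu}_0\right]^{\top}$$
$$\mathbf{C}_{11} := \frac{1}{T-\tau} \sum_{t=\tau}^{T} \left[\boldsymbol{\chi}(t) - \boldsymbol{\mu}_1\right] \left[\boldsymbol{\chi}(t) - \boldsymbol{\mu}_1\right]^{\top}$$
$$\mathbf{C}_{01} := \frac{1}{T-\tau} \sum_{t=0}^{T-\tau} \left[\boldsymbol{\chi}(t) - \boldsymbol{\mu}_0\right] \left[\boldsymbol{\chi}(t+\tau) - \boldsymbol{\mu}_1\right]^{\top}$$
The Koopman matrix is then computed as follows: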
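$$\mathbf{K} = \mathbf{C}_{00}^{-1} \mathbf{C}_{01}$$
It can be shown [1] that the leading singular functions of the half-weighted Koopman matrix
$$\bar{\mathbf{K}} := \mathbf{C}_{00}^{-1/2} \mathbf{C}_{01} \mathbf{C}_{11}^{-1/2}$$
encode the best reduced dynamical model for the time series.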
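The estimation steps above can be sketched in NumPy for a single trajectory. This is a minimal illustration under stated assumptions, not PyEMMA's implementation: the function names inv_sqrt and koopman_matrices are hypothetical, features are stored as rows of a (T, n_features) array, the covariances are computed in one pass rather than with the chunked, memory-limited scheme that ncov_max controls, and the default cutoff of 1e-10 is an arbitrary choice.

```python
import numpy as np


def inv_sqrt(C, epsilon=1e-10):
    """Inverse square root of a symmetric matrix, dropping eigenvalues <= epsilon."""
    w, Q = np.linalg.eigh(C)
    keep = w > epsilon
    return (Q[:, keep] / np.sqrt(w[keep])) @ Q[:, keep].T


def koopman_matrices(chi, tau):
    """chi: array (T, n_features) from one trajectory; tau: lag in frames (>= 1)."""
    X, Y = chi[:-tau], chi[tau:]          # time-lagged pairs chi(t), chi(t + tau)
    mu0, mu1 = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu0, Y - mu1             # mean-free features
    n = len(X)
    C00 = Xc.T @ Xc / n                   # instantaneous covariance, early frames
    C11 = Yc.T @ Yc / n                   # instantaneous covariance, late frames
    C01 = Xc.T @ Yc / n                   # time-lagged covariance
    K = np.linalg.solve(C00, C01)         # Koopman matrix C00^{-1} C01
    K_bar = inv_sqrt(C00) @ C01 @ inv_sqrt(C11)   # half-weighted Koopman matrix
    return K, K_bar


# Toy data (an assumption for illustration): three independent AR(1) processes.
rng = np.random.default_rng(0)
a = np.array([0.9, 0.5, 0.1])
chi = np.zeros((5000, 3))
for t in range(1, len(chi)):
    chi[t] = a * chi[t - 1] + rng.standard_normal(3)

K, K_bar = koopman_matrices(chi, tau=1)
singular_values = np.linalg.svd(K_bar, compute_uv=False)
```

Because the three covariance matrices are estimated from the same paired frames, the singular values of the half-weighted matrix cannot exceed 1 up to rounding. For this toy data the leading one should come out near the largest autocorrelation coefficient, although the exact value depends on the random sample.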
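The singular functions can be computed by first performing the singular value decomposition
$$\bar{\mathbf{K}} = \mathbf{U}' \mathbf{S} \mathbf{V}'^{\top}$$
and then mapping the input conformation to the left singular functions ψ and right singular functions ϕ as follows:
$$\boldsymbol{\psi}(t) := \mathbf{U}'^{\top} \mathbf{C}_{00}^{-1/2} \left[\boldsymbol{\chi}(t) - \boldsymbol{\mu}_0\right]$$
$$\boldsymbol{\phi}(t) := \mathbf{V}'^{\top} \mathbf{C}_{11}^{-1/2} \left[\boldsymbol{\chi}(t) - \boldsymbol{\mu}_1\right]$$
References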
[1] Wu, H. and Noé, F. 2017. Variational approach for learning Markov processes from time series data. arXiv:1707.04659v1
[2] Noé, F. and Clementi, C. 2015. Kinetic distance and kinetic maps from molecular dynamics simulation. J. Chem. Theory Comput. doi:10.1021/acs.jctc.5b00553
[3] Chan, T. F., Golub, G. H., and LeVeque, R. J. 1979. Updating formulae and a pairwise algorithm for computing sample variances. Technical Report STAN-CS-79-773, Department of Computer Science, Stanford University.