pyemma.coordinates.transform.VAMP

class pyemma.coordinates.transform.VAMP(*args, **kwargs)

Variational approach for Markov processes (VAMP)

__init__(lag, dim=None, scaling=None, right=False, epsilon=1e-06, stride=1, skip=0, ncov_max=inf)

Variational approach for Markov processes (VAMP) [1].

Parameters

lag (int) – lag time
dim (float or int, default=None) –
Number of dimensions to keep:
- if dim is not set (None) all available ranks are kept:
  n_components == min(n_samples, n_uncorrelated_features)
- if dim is an integer >= 1, this number specifies the number of dimensions to keep.
- if dim is a float with 0 < dim < 1, keep the smallest number of dimensions for which the explained fraction of the kinetic variance is greater than dim.
scaling (None or string) –
Scaling to be applied to the VAMP order parameters upon transformation
- None: no scaling will be applied, variance of the order parameters is 1
- 'kinetic map' or 'km': order parameters are scaled by singular value. Only the left singular functions induce a kinetic map wrt the conventional forward propagator. The right singular functions induce a kinetic map wrt the backward propagator.
right (boolean) – Whether to compute the right singular functions. If right==True, get_output() returns the right singular functions; otherwise it returns the left singular functions. Note that only frames[tau:, :] of each trajectory returned by get_output() contain valid values of the right singular functions. Conversely, only frames[0:-tau, :] of each trajectory contain valid values of the left singular functions. The remaining frames may be interpreted as an extrapolation.
epsilon (float) – eigenvalue cutoff. Eigenvalues of $C_{00}$ and $C_{11}$ with norms <= epsilon are cut off. The number of remaining eigenvalues, together with the value of dim, defines the size of the output.
stride (int, optional, default = 1) – Use only every stride-th time step. By default, every time step is used.
skip (int, default=0) – number of initial frames to skip in each trajectory.
ncov_max (int, default=infinity) – limit the memory usage of the algorithm from [3] to an amount that corresponds to ncov_max additional copies of each correlation matrix
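The float-valued `dim` rule described above can be illustrated with a small NumPy sketch. It assumes that the kinetic variance is measured by the cumulative sum of the squared, normalized singular values (the quantity the `cumvar` attribute describes). `select_dim` is a hypothetical helper written for this illustration, not a PyEMMA function, and the library's handling of edge cases may differ.

```python
import numpy as np

def select_dim(singular_values, dim=None):
    """Number of components to keep for a given `dim` setting (illustrative only)."""
    s = np.asarray(singular_values, dtype=float)
    if dim is None:
        return s.size                                  # keep all available ranks
    if isinstance(dim, float):
        if not 0.0 < dim < 1.0:
            raise ValueError("a float dim must satisfy 0 < dim < 1")
        cumvar = np.cumsum(s ** 2) / np.sum(s ** 2)    # cumulative kinetic variance
        # first position where the cumulative fraction is greater than dim
        k = int(np.searchsorted(cumvar, dim, side="right")) + 1
        return min(k, s.size)
    return min(int(dim), s.size)                       # integer: capped at the rank

s = np.array([0.9, 0.5, 0.1])
n_half = select_dim(s, 0.5)     # fraction-based selection
n_all = select_dim(s, None)     # no truncation
```

Here the squared singular values are 0.81, 0.25 and 0.01, so the first component alone already explains more than half of the total.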

Notes

VAMP is a method for dimensionality reduction of Markov processes.

The Koopman operator $\mathcal{K}$ is an integral operator that describes conditional future expectation values. Let $p(\mathbf{x},\,\mathbf{y})$ be the conditional probability density of visiting an infinitesimal phase space volume around point $\mathbf{y}$ at time $t+\tau$ given that the phase space point $\mathbf{x}$ was visited at the earlier time $t$. Then the action of the Koopman operator on a function $f$ can be written as follows:

$(\mathcal{K}f)(\mathbf{x})=\int p(\mathbf{x},\,\mathbf{y})f(\mathbf{y})\,\mathrm{d}\mathbf{y}=\mathbb{E}\left[f(\mathbf{x}_{t+\tau})\mid\mathbf{x}_{t}=\mathbf{x}\right]$

The Koopman operator is defined without any reference to an equilibrium distribution. It is therefore well-defined even when the dynamics are irreversible and/or non-stationary, such that no equilibrium distribution exists.
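For intuition, the action of the Koopman operator can be sketched on a finite-state Markov chain, where the integral over $\mathbf{y}$ becomes a sum and $p(\mathbf{x},\,\mathbf{y})$ becomes a row-stochastic transition matrix. The three-state matrix `P` and the observable `f` below are made-up illustration values, not taken from PyEMMA.

```python
import numpy as np

# Assumed toy model: P[x, y] = probability of being in state y at t + tau
# given state x at time t (each row sums to one).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
f = np.array([1.0, 0.0, -1.0])        # observable: one value per state

# (K f)(x) = sum_y P[x, y] f(y) = E[f(x_{t+tau}) | x_t = x]
Kf = P @ f

# Monte Carlo cross-check of the conditional expectation for x = 0.
rng = np.random.default_rng(0)
next_states = rng.choice(3, size=200_000, p=P[0])
mc_estimate = f[next_states].mean()
```

The matrix-vector product and the sampled average estimate the same conditional expectation; no stationary distribution of `P` is needed for either.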

If we approximate $f$ by a linear superposition of ansatz functions $\boldsymbol{\chi}$ of the conformational degrees of freedom (features), the operator $\mathcal{K}$ can be approximated by a (finite-dimensional) matrix $\mathbf{K}$.

The approximation is computed in several steps. From the time-dependent input features $\boldsymbol{\chi}(t)$, we first compute the mean $\boldsymbol{\mu}_{0}$ ($\boldsymbol{\mu}_{1}$) from all data excluding the last (first) $\tau$ steps of every trajectory:

$\begin{align}\begin{aligned}\boldsymbol{\mu}_{0} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\boldsymbol{\chi}(t)\\\boldsymbol{\mu}_{1} :=\frac{1}{T-\tau}\sum_{t=\tau}^{T}\boldsymbol{\chi}(t)\end{aligned}\end{align}$

Next, we compute the instantaneous covariance matrices $\mathbf{C}_{00}$ and $\mathbf{C}_{11}$ and the time-lagged covariance matrix $\mathbf{C}_{01}$ as follows:

$\begin{align}\begin{aligned}\mathbf{C}_{00} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]^{\top}\\\mathbf{C}_{11} :=\frac{1}{T-\tau}\sum_{t=\tau}^{T}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]^{\top}\\\mathbf{C}_{01} :=\frac{1}{T-\tau}\sum_{t=0}^{T-\tau}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\left[\boldsymbol{\chi}(t+\tau)-\boldsymbol{\mu}_{1}\right]^{\top}\end{aligned}\end{align}$
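The means and covariance matrices can be sketched in NumPy for a single trajectory stored as an array of shape `(T, n_features)`. `lagged_covariances` is a hypothetical helper for illustration; it normalizes by the number of time-lagged pairs and does not reproduce PyEMMA's streaming, multi-trajectory estimator.

```python
import numpy as np

def lagged_covariances(chi, tau):
    """Means and (cross-)covariances of one trajectory, for a lag tau >= 1."""
    x = chi[:-tau]                 # frames t:       basis for mu_0 and C_00
    y = chi[tau:]                  # frames t + tau: basis for mu_1 and C_11
    mu0, mu1 = x.mean(axis=0), y.mean(axis=0)
    x0, y0 = x - mu0, y - mu1      # mean-free features
    n = x.shape[0]                 # number of time-lagged pairs
    c00 = x0.T @ x0 / n            # instantaneous covariance, early frames
    c11 = y0.T @ y0 / n            # instantaneous covariance, late frames
    c01 = x0.T @ y0 / n            # time-lagged covariance
    return mu0, mu1, c00, c11, c01

rng = np.random.default_rng(0)
chi = rng.normal(size=(100, 3))    # placeholder data: 100 frames, 3 features
mu0, mu1, c00, c11, c01 = lagged_covariances(chi, tau=5)
```

Each matrix is a sum of outer products over frames, which is why the result has shape `(n_features, n_features)`.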

The Koopman matrix is then computed as follows:

$\mathbf{K}=\mathbf{C}_{00}^{-1}\mathbf{C}_{01}$
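As a sanity check of this formula, consider a sketch with identity features on a synthetic linear process $\mathbf{x}_{t+1}=\mathbf{A}\mathbf{x}_{t}+\text{noise}$. Under that assumption the estimated Koopman matrix should approach $\mathbf{A}^{\top}$ for $\tau=1$. The matrix `A`, the trajectory length, and the noise model are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.9, 0.0],
              [0.1, 0.5]])              # assumed stable linear dynamics
T, tau = 50_000, 1
noise = rng.normal(size=(T, 2))
chi = np.zeros((T, 2))
for t in range(T - 1):
    chi[t + 1] = A @ chi[t] + noise[t]

# Mean-free early and late frames.
x = chi[:-tau] - chi[:-tau].mean(axis=0)
y = chi[tau:] - chi[tau:].mean(axis=0)
c00 = x.T @ x / len(x)
c01 = x.T @ y / len(x)

# K = C00^{-1} C01, computed by solving a linear system instead of inverting C00.
K = np.linalg.solve(c00, c01)
```

Solving the linear system is generally preferable to forming the explicit inverse; when $\mathbf{C}_{00}$ is close to singular, some form of truncation (such as the `epsilon` cutoff) is needed instead.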

It can be shown [1] that the leading singular functions of the half-weighted Koopman matrix

$\bar{\mathbf{K}}:=\mathbf{C}_{00}^{-\frac{1}{2}}\mathbf{C}_{01}\mathbf{C}_{11}^{-\frac{1}{2}}$

encode the best reduced dynamical model for the time series.

The singular functions can be computed by first performing the singular value decomposition

$\bar{\mathbf{K}}=\mathbf{U}^{\prime}\mathbf{S}\mathbf{V}^{\prime\top}$

and then mapping the input conformation to the left singular functions $\boldsymbol{\psi}$ and right singular functions $\boldsymbol{\phi}$ as follows:

$\begin{align}\begin{aligned}\boldsymbol{\psi}(t):=\mathbf{U}^{\prime\top}\mathbf{C}_{00}^{-\frac{1}{2}}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{0}\right]\\\boldsymbol{\phi}(t):=\mathbf{V}^{\prime\top}\mathbf{C}_{11}^{-\frac{1}{2}}\left[\boldsymbol{\chi}(t)-\boldsymbol{\mu}_{1}\right]\end{aligned}\end{align}$
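The whole construction, from covariances to singular functions, can be sketched in NumPy as follows. This is a minimal single-trajectory illustration under stated assumptions: `inv_sqrt` and `vamp_sketch` are hypothetical helpers, the eigenvalue truncation mimics the role of `epsilon`, and the test data come from an arbitrary linear process. It is not the PyEMMA implementation.

```python
import numpy as np

def inv_sqrt(c, epsilon=1e-6):
    """Symmetric inverse square root, discarding eigenvalues <= epsilon."""
    w, v = np.linalg.eigh(c)
    keep = w > epsilon
    return (v[:, keep] / np.sqrt(w[keep])) @ v[:, keep].T

def vamp_sketch(chi, tau, epsilon=1e-6):
    x, y = chi[:-tau], chi[tau:]
    x0, y0 = x - x.mean(axis=0), y - y.mean(axis=0)
    n = len(x0)
    c00, c11, c01 = x0.T @ x0 / n, y0.T @ y0 / n, x0.T @ y0 / n
    c00_is, c11_is = inv_sqrt(c00, epsilon), inv_sqrt(c11, epsilon)
    k_bar = c00_is @ c01 @ c11_is            # half-weighted Koopman matrix
    u, s, vt = np.linalg.svd(k_bar)          # k_bar = u @ diag(s) @ vt
    psi = x0 @ c00_is @ u                    # left singular functions, frames t
    phi = y0 @ c11_is @ vt.T                 # right singular functions, frames t + tau
    return s, psi, phi

# Arbitrary test data: a stable three-dimensional linear process.
rng = np.random.default_rng(2)
A = np.array([[0.95, 0.0, 0.0],
              [0.10, 0.6, 0.0],
              [0.00, 0.1, 0.2]])
chi = np.zeros((5000, 3))
noise = rng.normal(size=chi.shape)
for t in range(len(chi) - 1):
    chi[t + 1] = A @ chi[t] + noise[t]

s, psi, phi = vamp_sketch(chi, tau=1)
```

On the frames used for estimation, the left singular functions have unit covariance, and the time-lagged covariance between `psi` and `phi` is diagonal with the singular values on the diagonal. Rows of `psi` correspond to frames `t` and rows of `phi` to frames `t + tau`, which mirrors the valid-frame ranges noted for the `right` parameter.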

References

[1] Wu, H. and Noe, F. 2017. Variational approach for learning Markov processes from time series data. arXiv:1707.04659v1
[2] Noe, F. and Clementi, C. 2015. Kinetic distance and kinetic maps from molecular dynamics simulation. J. Chem. Theory Comput. doi:10.1021/acs.jctc.5b00553
[3] Chan, T. F., Golub, G. H., LeVeque, R. J. 1979. Updating formulae and pairwise algorithms for computing sample variances. Technical Report STAN-CS-79-773, Department of Computer Science, Stanford University.

Methods

`_Loggable__create_logger`()
`_SerializableMixIn__interpolate`(state, klass)
`__delattr__`(name, /)	Implement delattr(self, name).
`__dir__`()	Default dir() implementation.
`__eq__`(value, /)	Return self==value.
`__format__`(format_spec, /)	Default object formatter.
`__ge__`(value, /)	Return self>=value.
`__getattribute__`(name, /)	Return getattr(self, name).
`__getstate__`()
`__gt__`(value, /)	Return self>value.
`__hash__`()	Return hash(self).
`__init__`(lag[, dim, scaling, right, …])	Variational approach for Markov processes (VAMP) [1].
`__init_subclass__`(*args, **kwargs)	This method is called when a class is subclassed.
`__iter__`()
`__le__`(value, /)	Return self<=value.
`__lt__`(value, /)	Return self<value.
`__my_getstate__`()
`__my_setstate__`(state)
`__ne__`(value, /)	Return self!=value.
`__new__`(cls, *args, **kwargs)	Create and return a new object.
`__reduce__`()	Helper for pickle.
`__reduce_ex__`(protocol, /)	Helper for pickle.
`__repr__`()	Return repr(self).
`__setattr__`(name, value, /)	Implement setattr(self, name, value).
`__setstate__`(state)
`__sizeof__`()	Size of object in memory, in bytes.
`__str__`()	Return str(self).
`__subclasshook__`	Abstract classes can override this to customize issubclass().
`_check_estimated`()
`_chunk_finite`(data)
`_cleanup_logger`(logger_id, logger_name)
`_clear_in_memory`()
`_compute_default_cs`(dim, itemsize[, logger])
`_create_iterator`([skip, chunk, stride, …])	Should be implemented by non-abstract subclasses.
`_data_flow_chain`()	Get a list of all elements in the data flow graph.
`_estimate`(iterable, **kw)
`_get_classes_to_inspect`()	gets classes self derives from which 1.
`_get_interpolation_map`(cls)
`_get_param_names`()	Get parameter names for the estimator
`_get_private_field`(cls, name[, default])
`_get_serialize_fields`(cls)
`_get_state_of_serializeable_fields`(klass, state)	:return a dictionary {k:v} for k in self.serialize_fields and v=getattr(self, k)
`_get_traj_info`(filename)
`_get_version`(cls[, require])
`_get_version_for_class_from_state`(state, klass)	retrieves the version of the current klass from the state mapping from old locations to new ones.
`_init_covar`([partial])
`_logger_is_active`(level)	@param level: int log level (debug=10, info=20, warn=30, error=40, critical=50)
`_map_to_memory`([stride])	Maps results to memory.
`_set_random_access_strategies`()
`_set_state_from_serializeable_fields_and_state`(…)	set only fields from state, which are present in klass.__serialize_fields
`_source_from_memory`([data_producer])
`_transform_array`(X)	Projects the data onto the dominant singular functions.
`cktest`([n_observables, observables, …])	Do the Chapman-Kolmogorov test by computing predictions for higher lag times and by performing estimations at higher lag times.
`describe`()	Get a descriptive string representation of this class.
`dimension`()	real output dimension after low-rank approximation.
`estimate`(X, **kwargs)	Estimates the model given the data X
`expectation`(observables, statistics[, …])	Compute future expectation of observable or covariance using the approximated Koopman operator.
`fit`(X[, y])	Estimates parameters - for compatibility with sklearn.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_output`([dimensions, stride, skip, chunk])	Maps all input data of this transformer and returns it as an array or list of arrays
`get_params`([deep])	Get parameters for this estimator.
`iterator`([stride, lag, chunk, …])	creates an iterator to stream over the (transformed) data.
`load`(file_name[, model_name])	Loads a previously saved PyEMMA object from disk.
`n_chunks`(chunksize[, stride, skip])	how many chunks an iterator of this source will output, starting (eg.
`n_frames_total`([stride, skip])	Returns total number of frames.
`number_of_trajectories`([stride])	Returns the number of trajectories.
`output_type`()	By default transformers return single precision floats.
`partial_fit`(X)	incrementally update the covariances and mean.
`save`(file_name[, model_name, overwrite, …])	saves the current state of this object to given file and name.
`score`([test_data, score_method])	Compute the VAMP score for this model or the cross-validation score between self and a second model estimated from different data.
`set_params`(**params)	Set the parameters of this estimator.
`trajectory_length`(itraj[, stride, skip])	Returns the length of trajectory of the requested index.
`trajectory_lengths`([stride, skip])	Returns the length of each trajectory.
`transform`(X)	Maps the input data through the transformer to correspondingly shaped output data array/list.
`write_to_csv`([filename, extension, …])	write all data to csv with numpy.savetxt
`write_to_hdf5`(filename[, group, …])	writes all data of this Iterable to a given HDF5 file.
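Incremental estimation, as offered by `partial_fit`, relies on being able to combine the statistics of separate data chunks. The following sketch shows the pairwise update for means and centered second moments in the spirit of reference [3]. The helpers `chunk_moments` and `merge_moments` are hypothetical names for illustration, and the sketch makes no claim about how PyEMMA organizes its internal accumulators.

```python
import numpy as np

def chunk_moments(x):
    """Count, mean, and centered second-moment matrix of one chunk."""
    mean = x.mean(axis=0)
    xc = x - mean
    return len(x), mean, xc.T @ xc

def merge_moments(a, b):
    """Combine the moments of two chunks without revisiting their data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * (n_b / n)
    # correction term accounts for the shift between the two chunk means
    m2 = m2_a + m2_b + np.outer(delta, delta) * (n_a * n_b / n)
    return n, mean, m2

rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, size=(1000, 3))   # placeholder data

# Process the data in four chunks and fold the results together.
total = None
for chunk in np.array_split(data, 4):
    moments = chunk_moments(chunk)
    total = moments if total is None else merge_moments(total, moments)

n_total, mean_total, m2_total = total
cov_total = m2_total / n_total               # matches the full-data covariance
```

Only the running count, mean, and second-moment matrix need to be kept in memory, independent of the number of frames processed.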

Attributes

`_DataSource__serialize_fields`
`_Estimator__serialize_fields`
`_FALLBACK_CHUNKSIZE`
`_InMemoryMixin__serialize_fields`
`_InMemoryMixin__serialize_version`
`_Loggable__ids`
`_Loggable__refs`
`_SerializableMixIn__serialize_fields`
`_SerializableMixIn__serialize_modifications_map`
`_SerializableMixIn__serialize_version`
`_VAMP__serialize_fields`
`_VAMP__serialize_version`
`__abstractmethods__`
`__dict__`
`__doc__`
`__module__`
`__weakref__`	list of weak references to the object (if defined)
`_abc_impl`
`_estimated`
`_loglevel_CRITICAL`
`_loglevel_DEBUG`
`_loglevel_ERROR`
`_loglevel_INFO`
`_loglevel_WARN`
`_save_data_producer`
`_serialize_version`
`chunksize`	chunksize defines how much data is being processed at once.
`cumvar`	Cumulative sum of the squared and normalized singular values
`data_producer`	The data producer for this data source object (can be another data source object).
`default_chunksize`	How much data will be processed at once, in case no chunksize has been provided.
`dim`	Number of dimensions to keep
`epsilon`	Eigenvalue cutoff for $C_{00}$ and $C_{11}$.
`filenames`	list of file names the data is originally being read from.
`in_memory`	are results stored in memory?
`is_random_accessible`	Check if self._is_random_accessible is set to true and if all the random access strategies are implemented.
`is_reader`	Property telling if this data source is a reader or not.
`logger`	The logger for this class instance
`model`	The model estimated by this Estimator
`name`	The name of this instance
`ndim`
`ntraj`
`ra_itraj_cuboid`	Implementation of random access with slicing that can be up to 3-dimensional, where the first dimension corresponds to the trajectory index, the second dimension corresponds to the frames and the third dimension corresponds to the dimensions of the frames.
`ra_itraj_jagged`	Behaves like ra_itraj_cuboid just that the trajectories are not truncated and returned as a list.
`ra_itraj_linear`	Implementation of random access that takes arguments as the default random access (i.e., up to three dimensions with trajs, frames and dims, respectively), but which considers the frame indexing to be contiguous.
`ra_linear`	Implementation of random access that takes a (maximal) two-dimensional slice where the first component corresponds to the frames and the second component corresponds to the dimensions.
`scaling`	Scaling to be applied to the VAMP order parameters upon transformation
`show_progress`
`singular_values`	Singular values of the half-weighted Koopman matrix (usually denoted $\sigma$ )
`singular_vectors_left`	Transformation matrix that represents the linear map from feature space to the space of left singular functions.
`singular_vectors_right`	Transformation matrix that represents the linear map from feature space to the space of right singular functions.