pyemma.coordinates.covariance_lagged

pyemma.coordinates.covariance_lagged(data=None, c00=True, c0t=True, ctt=False, remove_constant_mean=None, remove_data_mean=False, reversible=False, bessel=True, lag=0, weights='empirical', stride=1, skip=0, chunksize=None, ncov_max=inf, column_selection=None, diag_only=False)

Compute lagged covariances between time series. If data is available as an array of size (TxN), where T is the number of time steps and N the number of dimensions, this function can compute lagged covariances like

\[\begin{split}C_{00} &= X^T X \\ C_{0t} &= X^T Y \\ C_{tt} &= Y^T Y,\end{split}\]

where X comprises the first T-lag time steps and Y the last T-lag time steps. More than one time series can be used, and the number of time steps may differ between them.
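The definitions above can be sketched in plain NumPy. This is only an illustration of the formulas, not PyEMMA's implementation: the helper name `lagged_moments` is hypothetical, and the sums are left unnormalized exactly as written in the equations (no mean removal, no Bessel correction, no weights).

```python
import numpy as np

def lagged_moments(trajs, lag):
    """Sum X^T X, X^T Y and Y^T Y over a list of (T_i, N) arrays."""
    n = trajs[0].shape[1]
    c00 = np.zeros((n, n))
    c0t = np.zeros((n, n))
    ctt = np.zeros((n, n))
    for traj in trajs:
        t = traj.shape[0]
        if t <= lag:
            continue  # too short to contribute any (X, Y) pair
        x = traj[:t - lag]  # first T - lag time steps
        y = traj[lag:]      # last T - lag time steps
        c00 += x.T @ x
        c0t += x.T @ y
        ctt += y.T @ y
    return c00, c0t, ctt

# Two trajectories of different length, three dimensions each
rng = np.random.default_rng(0)
trajs = [rng.standard_normal((100, 3)), rng.standard_normal((60, 3))]
c00, c0t, ctt = lagged_moments(trajs, lag=5)
```

Each trajectory contributes its own T_i - lag pairs, so pairs are never formed across trajectory boundaries.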

Parameters
  • data (ndarray (T, d) or list of ndarray (T_i, d) or a reader created by the source function) – array with the data, if available. When given, the covariances are immediately computed.

  • c00 (bool, optional, default=True) – compute instantaneous correlations over the first part of the data. If lag==0, use all of the data.

  • c0t (bool, optional, default=True) – compute lagged correlations. Does not work with lag==0.

  • ctt (bool, optional, default=False) – compute instantaneous correlations over the second part of the data. Does not work with lag==0.

  • remove_constant_mean (ndarray(N,), optional, default=None) – subtract a constant vector of mean values from the time series.

  • remove_data_mean (bool, optional, default=False) – subtract the sample mean from the time series (mean-free correlations).

  • reversible (bool, optional, default=False) – symmetrize correlations.

  • bessel (bool, optional, default=True) – apply Bessel’s correction to the correlations in order to obtain an unbiased estimator.

  • lag (int, optional, default=0) – lag time. A lag of 0 does not work with c0t=True or ctt=True.

  • weights (optional, default="empirical") –

    Re-weighting strategy to be used in order to compute equilibrium covariances from non-equilibrium data.
    • "empirical": no re-weighting

    • "koopman": use the re-weighting procedure from [1]

    • weights: an object that can compute re-weighting factors. It must possess a method weights(X) that accepts a trajectory X (np.ndarray(T, n)) and returns a vector of re-weighting factors (np.ndarray(T,)).

  • stride (int, optional, default = 1) – Use only every stride-th time step. By default, every time step is used.

  • skip (int, optional, default=0) – number of initial frames to skip in each trajectory.

  • chunksize (int, default=None) – Number of data frames to process at once. Choose a higher value to optimize thread usage and gain processing speed. If None is passed, the default value of the underlying reader/data source is used. Choose zero to disable chunking entirely.

  • ncov_max (int, default=infinity) – limit the memory usage of the algorithm from [2] to an amount that corresponds to ncov_max additional copies of each correlation matrix.

  • column_selection (ndarray(k, dtype=int) or None) – Indices of those columns that are to be computed. If None, all columns are computed.

  • diag_only (bool) – If True, the computation is restricted to the diagonal entries (autocorrelations) only.
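The weights option accepts any object with a weights(X) method that returns one factor per frame. A minimal sketch of such an object follows. The class `FrameWeights`, its exponential form, and the helper `weighted_c0t` are hypothetical and purely illustrative; in particular, weighting each (X, Y) pair by the factor of its first frame and normalizing by the sum of weights are assumptions of this sketch, not a statement about how PyEMMA applies the factors.

```python
import numpy as np

class FrameWeights:
    """Hypothetical re-weighting object exposing weights(X)."""

    def __init__(self, direction):
        self.direction = np.asarray(direction, dtype=float)

    def weights(self, X):
        # One positive factor per frame, returned with shape (T,)
        return np.exp(-X @ self.direction)

def weighted_c0t(X, lag, w_obj):
    """Weighted lagged moment X^T diag(w) Y, normalized by sum(w)."""
    t = X.shape[0]
    # Assumption: a pair (x_k, x_{k+lag}) carries the weight of frame k.
    w = w_obj.weights(X)[:t - lag]
    x, y = X[:t - lag], X[lag:]
    return (x * w[:, None]).T @ y / w.sum()

X = np.arange(12.0).reshape(6, 2)
uniform = FrameWeights([0.0, 0.0])  # exp(0) = 1 for every frame
c0t_uniform = weighted_c0t(X, 2, uniform)
```

With all factors equal to one, the weighted estimate reduces to the plain average of the lagged products, which is the behaviour described for "empirical".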

Returns

lc

Return type

a LaggedCovariance object.
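A call might look like the following sketch. The wrapper name `estimate_lagged_covariances` is hypothetical, and the attribute names `C00_` and `C0t_` on the returned object are an assumption not stated on this page; check them against the installed PyEMMA version.

```python
def estimate_lagged_covariances(trajs, lag):
    """Return (C00, C0t) for a list of (T_i, d) arrays."""
    # Imported lazily so this sketch can be loaded without PyEMMA installed.
    import pyemma.coordinates as coor

    # Passing data directly triggers the computation immediately.
    lc = coor.covariance_lagged(
        data=trajs,
        c00=True,
        c0t=True,
        remove_data_mean=True,
        lag=lag,
    )
    # Assumption: the estimates are exposed as C00_ and C0t_.
    return lc.C00_, lc.C0t_
```

For example, passing two arrays of shape (1000, 3) and (400, 3) with lag=10 would be expected to yield two 3 x 3 matrices.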

[1] Wu, H., Nueske, F., Paul, F., Klus, S., Koltai, P., and Noe, F. 2016. Bias reduced variational approximation of molecular kinetics from short off-equilibrium simulations. J. Chem. Phys. (submitted)

[2] Chan, T. F., Golub, G. H., and LeVeque, R. J. 1979. Updating formulae and pairwise algorithms for computing sample variances. Technical Report STAN-CS-79-773, Department of Computer Science, Stanford University.