pyemma.msm.its

pyemma.msm.its(dtrajs, lags=None, nits=None, reversible=True, connected=True, errors=None, nsamples=50, n_jobs=1, show_progress=True)

Implied timescales from Markov state models estimated at a series of lag times.

Parameters:
  • dtrajs (array-like or list of array-likes) – discrete trajectories
  • lags (array-like of integers, optional) – integer lag times at which the implied timescales will be calculated
  • nits (int, optional) – number of implied timescales to be computed. Fewer will be computed if the number of states is smaller. If None, the number of timescales will be determined automatically.
  • connected (boolean, optional) – If True, compute the connected set before transition matrix estimation at each lag separately
  • reversible (boolean, optional) – Estimate transition matrix reversibly (True) or nonreversibly (False)
  • errors (None | 'bayes', optional) –

    Specifies whether to compute statistical uncertainties (by default not), and which algorithm to use if so. Currently the only option is:

    • ‘bayes’ for Bayesian sampling of the posterior

    Attention: Computing errors can be very slow if the MSM has many states. Moreover there are still unsolved theoretical problems, and therefore the uncertainty interval and the maximum likelihood estimator can be inconsistent. Use this as a rough guess for statistical uncertainties.

  • nsamples (int, optional) – The number of approximately independent transition matrix samples generated for each lag time for uncertainty quantification. Only used if errors is not None.
  • n_jobs (int, optional) – how many subprocesses to start to estimate the models for each lag time.
Returns:

itsobj

Return type:

ImpliedTimescales object

Example

>>> from pyemma import msm
>>> dtraj = [0,1,1,2,2,2,1,2,2,2,1,0,0,1,1,1,2,2,1,1,2,1,1,0,0,0,1,1,2,2,1]   # mini-trajectory
>>> ts = msm.its(dtraj, [1,2,3,4,5], show_progress=False)
>>> print(ts.timescales)  
[[ 1.5...  0.2...]
 [ 3.1...  1.0...]
 [ 2.03...  1.02...]
 [ 4.63...  3.42...]
 [ 5.13...  2.59...]]
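
For orientation, each entry of the array is an implied timescale t_i(tau) = -tau / ln|lambda_i(tau)|, where lambda_i(tau) are the eigenvalues of the transition matrix estimated at lag tau. The following is a minimal NumPy sketch of that relation, not PyEMMA's implementation. It assumes a plain nonreversible, row-normalized estimate from sliding-window counts, whereas msm.its defaults to reversible=True, so its numbers need not agree with the output above. The helper name implied_timescales_sketch is hypothetical.

```python
import numpy as np

def implied_timescales_sketch(dtraj, lags, nits=2):
    """Hypothetical helper: t_i = -lag / ln|lambda_i| from a nonreversible estimate."""
    dtraj = np.asarray(dtraj)
    n = dtraj.max() + 1
    result = []
    for lag in lags:
        # Count transitions i -> j separated by `lag` steps (sliding window).
        C = np.zeros((n, n))
        np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1.0)
        # Row-normalize the counts to obtain a (nonreversible) transition matrix.
        T = C / C.sum(axis=1, keepdims=True)
        # Sort eigenvalue magnitudes in descending order and drop the stationary one.
        ev = np.sort(np.abs(np.linalg.eigvals(T)))[::-1][1:nits + 1]
        result.append(-lag / np.log(ev))
    return np.array(result)

dtraj = [0,1,1,2,2,2,1,2,2,2,1,0,0,1,1,1,2,2,1,1,2,1,1,0,0,0,1,1,2,2,1]
ts_sketch = implied_timescales_sketch(dtraj, [1, 2, 3, 4, 5])
print(ts_sketch.shape)  # (5, 2): one row per lag time, one column per timescale
```

The sketch assumes every state has at least one outgoing count at every lag. Real data usually requires restricting to the connected set first, which is what the connected parameter controls.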

See also

ImpliedTimescales()
The object returned by this function.
pyemma.plots.plot_implied_timescales()
Implied timescales plotting function. Just call it with the ImpliedTimescales object produced by this function as an argument.
class pyemma.msm.estimators.implied_timescales.ImpliedTimescales(estimator, lags=None, nits=None, n_jobs=1, show_progress=True)

Implied timescales for a series of lag times.

Methods

estimate(X, **params) Estimates the model given the data X
fit(X) Estimates parameters - for compatibility with sklearn.
get_params([deep]) Get parameters for this estimator.
get_sample_conf([conf, process]) Returns the confidence interval that contains a fraction conf of the sample data
get_sample_mean([process]) Returns the sample means of implied timescales.
get_sample_std([process]) Returns the standard error of implied timescales.
get_timescales([process]) Returns the implied timescale estimates
register_progress_callback(call_back[, stage]) Registers the progress reporter.
set_params(**params) Set the parameters of this estimator.

Attributes

estimators Returns the estimators for all lagtimes.
fraction_of_frames Returns the fraction of frames used to compute the count matrix at each lagtime.
lags Return the list of lag times for which timescales were computed.
lagtimes Return the list of lag times for which timescales were computed.
logger The logger for this class instance
model The model estimated by this Estimator
models Returns the models for all lagtimes.
name The name of this instance
number_of_timescales Return the number of timescales.
sample_mean Returns the sample means of implied timescales.
sample_std Returns the standard error of implied timescales.
samples_available Returns True if samples are available and thus sample means, standard errors and confidence intervals can be obtained
show_progress whether to show the progress of heavy calculations on this object.
timescales Returns the implied timescale estimates
estimate(X, **params)

Estimates the model given the data X

Parameters:
  • X (object) – A reference to the data from which the model will be estimated
  • params (dict) – New estimation parameter values. The parameters must have been announced in the __init__ method of this estimator. The present settings will overwrite the settings of parameters given in the __init__ method, i.e. the parameter values after this call will be those that have been used for this estimation. Use this option if only one or a few parameters change with respect to the __init__ settings for this run, and if you don’t need to remember the original settings of these changed parameters.
Returns:

estimator – The estimated estimator with the model being available.

Return type:

object

estimators

Returns the estimators for all lagtimes.

fit(X)

Estimates parameters - for compatibility with sklearn.

Parameters: X (object) – A reference to the data from which the model will be estimated
Returns: estimator – The estimator (self) with estimated model.
Return type: object
fraction_of_frames

Returns the fraction of frames used to compute the count matrix at each lagtime.

Notes

In a list of discrete trajectories with varying lengths, the estimation at longer lagtimes will mean discarding some trajectories for which not even one count can be computed. This function returns the fraction of frames that was actually used in computing the count matrix.

Be aware: this fraction refers to the full count matrix, and not that of the largest connected set. Hence, the output is not necessarily the active fraction. For that, use active_count_fraction of the pyemma.msm.MaximumLikelihoodMSM object, or of its HMM counterpart.
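
A minimal sketch of the idea in the notes above, under the assumption that a trajectory contributes frames only if it is longer than the lag time, so that at least one count can be computed from it. The helper name fraction_of_frames_sketch and the trajectory lengths are made up for illustration; this is not PyEMMA's code.

```python
def fraction_of_frames_sketch(traj_lengths, lag):
    """Hypothetical helper: share of frames in trajectories long enough for one count."""
    total = sum(traj_lengths)
    # A trajectory yields at least one transition count only if it is longer than the lag.
    used = sum(n for n in traj_lengths if n > lag)
    return used / total

lengths = [100, 50, 8]  # illustrative trajectory lengths in frames
for lag in (1, 10, 60):
    print(lag, fraction_of_frames_sketch(lengths, lag))
```

With these lengths, the shortest trajectory drops out at lag 10 and the second one at lag 60, so the fraction decreases as the lag grows.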

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any
get_sample_conf(conf=0.95, process=None)

Returns the confidence interval that contains a fraction conf of the sample data.

Parameters: conf (float, default = 0.95) – the confidence level. Use:

  • conf = 0.6827 for 1-sigma confidence interval
  • conf = 0.9545 for 2-sigma confidence interval
  • conf = 0.9973 for 3-sigma confidence interval
Returns: (L, R) – lower and upper timescales bounding the confidence interval
  • if process is None, will return two (l x k) arrays, where l is the number of lag times and k is the number of computed timescales.
  • if process is an integer, will return two (l)-arrays with the selected process time scale for every lag time
Return type: (float[], float[]) or (float[][], float[][])
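
The conf values listed for 1, 2 and 3 sigma are the probability mass of a normal distribution within that many standard deviations of the mean, erf(n / sqrt(2)). The short sketch below reproduces them with the standard library; the helper name sigma_to_conf is hypothetical and only illustrates where the numbers come from.

```python
import math

def sigma_to_conf(n_sigma):
    """Hypothetical helper: normal probability mass within n_sigma standard deviations."""
    return math.erf(n_sigma / math.sqrt(2.0))

for n in (1, 2, 3):
    print(n, round(sigma_to_conf(n), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```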
get_sample_mean(process=None)

Returns the sample means of implied timescales. Need to generate the samples first, e.g. by calling bootstrap

Parameters: process (int or None, default = None) – index in [0:n-1] referring to the process whose timescale will be returned. By default, process = None and all computed process timescales will be returned.
Returns:
  • if process is None, will return a (l x k) array, where l is the number of lag times and k is the number of computed timescales.
  • if process is an integer, will return a (l) array with the selected process time scale for every lag time
get_sample_std(process=None)

Returns the standard error of implied timescales. Need to generate the samples first, e.g. by calling bootstrap

Parameters: process (int or None, default = None) – index in [0:n-1] referring to the process whose timescale will be returned. By default, process = None and all computed process timescales will be returned.
Returns:
  • if process is None, will return a (l x k) array, where l is the number of lag times and k is the number of computed timescales.
  • if process is an integer, will return a (l) array with the selected process time scale for every lag time
get_timescales(process=None)

Returns the implied timescale estimates

Parameters: process (int or None, default = None) – index in [0:n-1] referring to the process whose timescale will be returned. By default, process = None and all computed process timescales will be returned.
Returns:
  • if process is None, will return a (l x k) array, where l is the number of lag times and k is the number of computed timescales.
  • if process is an integer, will return a (l) array with the selected process time scale for every lag time
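
The two return shapes can be pictured as plain array indexing. The sketch below uses a made-up (l x k) NumPy array rather than a real ImpliedTimescales object, and assumes that selecting a process corresponds to taking one column.

```python
import numpy as np

# Made-up (l x k) array: l = 3 lag times, k = 2 timescales per lag (illustrative values only).
ts_example = np.array([[10.0, 2.0],
                       [12.0, 2.5],
                       [12.5, 2.6]])

all_processes = ts_example        # analogous to process=None: shape (l, k)
first_process = ts_example[:, 0]  # analogous to process=0: shape (l,)
print(all_processes.shape, first_process.shape)  # (3, 2) (3,)
```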
lags

Return the list of lag times for which timescales were computed.

lagtimes

Return the list of lag times for which timescales were computed.

logger

The logger for this class instance

model

The model estimated by this Estimator

models

Returns the models for all lagtimes.

name

The name of this instance

number_of_timescales

Return the number of timescales.

register_progress_callback(call_back, stage=0)

Registers the progress reporter.

Parameters:
  • call_back (function) –

    This function will be called with the following arguments:

    1. stage (int)
    2. instance of pyemma.utils.progressbar.ProgressBar
    3. optional *args and named keywords (**kw), for future changes
  • stage (int, optional, default=0) – The stage you want the given call back function to be fired.
sample_mean

Returns the sample means of implied timescales. Need to generate the samples first, e.g. by calling bootstrap

Returns: timescales – mean timescales for all processes and lag times. l is the number of lag times and k is the number of computed timescales.
Return type: ndarray((l x k), dtype=float)
sample_std

Returns the standard error of implied timescales. Need to generate the samples first, e.g. by calling bootstrap

Returns: timescales – standard deviations of timescales for all processes and lag times. l is the number of lag times and k is the number of computed timescales.
Return type: ndarray((l x k), dtype=float)
samples_available

Returns True if samples are available and thus sample means, standard errors and confidence intervals can be obtained

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: self
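
To illustrate the <component>__<parameter> naming convention only, here is a small sketch that splits such a key at the first double underscore. The helper split_nested_param and the parameter names used are hypothetical and not part of PyEMMA.

```python
def split_nested_param(key):
    """Hypothetical helper: split '<component>__<parameter>' at the first double underscore."""
    component, sep, parameter = key.partition("__")
    if not sep:
        # No double underscore: the key addresses the estimator itself.
        return None, key
    return component, parameter

print(split_nested_param("lag"))             # (None, 'lag')
print(split_nested_param("estimator__lag"))  # ('estimator', 'lag')
```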

show_progress

whether to show the progress of heavy calculations on this object.

timescales

Returns the implied timescale estimates

Returns: timescales – timescales for all processes and lag times. l is the number of lag times and k is the number of computed timescales.
Return type: ndarray((l x k), dtype=float)

References

Implied timescales as a lagtime-selection and MSM-validation approach were suggested in [1]. Error estimation is done either using moving block bootstrapping [2] or a Bayesian analysis using Metropolis-Hastings Monte Carlo sampling of the posterior. Nonreversible Bayesian sampling is done by independently sampling Dirichlet distributions of the transition matrix rows. A Monte Carlo method for sampling reversible MSMs was introduced in [3]. Here we employ a much more efficient algorithm introduced in [4].
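
The nonreversible case described above can be sketched in a few lines of NumPy: each row of the transition matrix is drawn independently from a Dirichlet distribution parameterized by that row's counts. This is an illustrative sketch, not PyEMMA's sampler. The function name, the count matrix, and the uniform prior of 1.0 added to every count are assumptions made here, and the sketch does not cover the reversible algorithm of [4].

```python
import numpy as np

def sample_nonreversible_T(C, nsamples, prior=1.0, seed=0):
    """Hypothetical helper: draw transition matrices row by row from Dirichlet posteriors."""
    rng = np.random.default_rng(seed)
    C = np.asarray(C, dtype=float)
    samples = np.empty((nsamples,) + C.shape)
    for s in range(nsamples):
        for i, row in enumerate(C):
            # Assumed prior: add `prior` to every count so all Dirichlet parameters are positive.
            samples[s, i] = rng.dirichlet(row + prior)
    return samples

C = np.array([[3, 3, 0],
              [2, 6, 5],
              [0, 5, 6]])  # illustrative count matrix
Ts = sample_nonreversible_T(C, nsamples=50)
print(Ts.shape)                          # (50, 3, 3)
print(np.allclose(Ts.sum(axis=2), 1.0))  # True: every sampled row is a probability vector
```

Timescale uncertainties would then follow from computing the implied timescales of each sampled matrix and summarizing them across samples.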

[1] Swope, W. C., J. W. Pitera and F. Suits: Describing protein folding kinetics by molecular dynamics simulations: 1. Theory. J. Phys. Chem. B 108, 6571-6581 (2004)
[2] Kuensch, H. R.: The jackknife and the bootstrap for general stationary observations. Ann. Stat. 17, 1217-1241 (1989)
[3] Noe, F.: Probability Distributions of Molecular Observables computed from Markov Models. J. Chem. Phys. 128, 244103 (2008)
[4] Trendelkamp-Schroer, B., H. Wu, F. Paul and F. Noe: Estimation and uncertainty of reversible Markov models. http://arxiv.org/abs/1507.05990