pyemma.msm.estimate_markov_model

pyemma.msm.estimate_markov_model(dtrajs, lag, reversible=True, sparse=False, connectivity='largest', estimate=True, dt='1 step', **kwargs)

Estimates a Markov model from discrete trajectories.

Returns an EstimatedMSM object that contains the estimated transition matrix and provides methods for computing many quantities related to Markov models.

Parameters:
  • dtrajs (list containing ndarrays(dtype=int) or ndarray(n, dtype=int)) – Discrete trajectories, given as a list of integer ndarrays (of arbitrary length) or as a single ndarray when there is only one trajectory.
  • lag (int) – Lag time for the MSM estimation, in multiples of trajectory steps.
  • reversible (bool, optional, default = True) – If True, estimate a reversible MSM; otherwise estimate a non-reversible MSM.
  • sparse (bool, optional, default = False) – If True, compute the count matrix, the transition matrix and all derived quantities using sparse matrix algebra. In this case the corresponding functions return sparse matrices instead of numpy arrays. This setting is recommended for very large numbers of states (e.g. > 4000) because it is likely to be much more efficient.
  • connectivity (str, optional, default = 'largest') –

    Connectivity mode. Three methods are intended (currently only ‘largest’ is implemented):

    ‘largest’ : The active set is the largest reversibly connected set. All estimation will be done on this subset, and all quantities (transition matrix, stationary distribution, etc.) are defined only on this subset and are correspondingly smaller than on the full set of states.
    ‘all’ : The active set is the full set of states. Estimation will be conducted on each reversibly connected set separately. That means the transition matrix will decompose into disconnected submatrices, the stationary vector is only defined within subsets, etc. Currently not implemented.
    ‘none’ : The active set is the full set of states. Estimation will be conducted on the full set of states without ensuring connectivity. This only permits nonreversible estimation. Currently not implemented.
  • estimate (bool, optional, default=True) – If True, estimate the MSM when creating the MSM object.
  • dt (str, optional, default='1 step') –

    Description of the physical time corresponding to the lag. May be used by analysis algorithms such as plotting tools to pretty-print the axes. By default ‘1 step’, i.e. there is no physical time unit. Specify by a number, whitespace and unit. Permitted units are (* is an arbitrary string):

    ‘fs’, ‘femtosecond*’
    ‘ps’, ‘picosecond*’
    ‘ns’, ‘nanosecond*’
    ‘us’, ‘microsecond*’
    ‘ms’, ‘millisecond*’
    ‘s’, ‘second*’
  • **kwargs
  • maxiter (int, optional, default = 1000000) – Only used with reversible = True. Maximum number of iterations before the transition matrix estimation method exits.
  • maxerr (float, optional, default = 1e-8) – Only used with reversible = True. Convergence tolerance for the transition matrix estimation. This specifies the maximum change of the Euclidean norm of relative stationary probabilities ($x_i = \sum_k x_{ik}$). The relative stationary probability changes $e_i = (x_i^{(1)} - x_i^{(2)}) / (x_i^{(1)} + x_i^{(2)})$ are used in order to track changes in small probabilities. The Euclidean norm of the change vector, $|e_i|_2$, is compared to maxerr.
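
The convergence test described for maxerr can be sketched as follows. This is a minimal NumPy illustration of the stated formula, not PyEMMA's implementation. The function names relative_change_norm and has_converged are hypothetical, and x_old and x_new stand for the stationary-probability estimates x^(1) and x^(2) from two successive iterations.

```python
import numpy as np


def relative_change_norm(x_old, x_new):
    """Euclidean norm of e_i = (x_old_i - x_new_i) / (x_old_i + x_new_i).

    Hypothetical helper for illustration; assumes all entries are positive.
    """
    x_old = np.asarray(x_old, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    e = (x_old - x_new) / (x_old + x_new)
    return float(np.linalg.norm(e))


def has_converged(x_old, x_new, maxerr=1e-8):
    """Compare the norm of the relative change vector to maxerr."""
    return relative_change_norm(x_old, x_new) < maxerr


# Toy input: only the smallest entry changes between the two iterations.
x1 = np.array([0.5, 0.5, 1e-6])
x2 = np.array([0.5, 0.5, 2e-6])

err = relative_change_norm(x1, x2)
abs_change = float(np.linalg.norm(x1 - x2))
```

In this toy input the absolute change is about 1e-6, but the relative change norm is about 1/3, which illustrates why relative changes are used to track small probabilities.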
Returns:

  • An EstimatedMSM object containing a transition matrix and various other MSM-related quantities.
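
To make the roles of lag and the returned transition matrix concrete, here is a minimal NumPy sketch. It assumes sliding-window counting (every frame t is paired with frame t + lag) and uses simple row normalization, which corresponds to a non-reversible estimate. The iterative reversible estimator used when reversible=True is not shown. The helper names count_matrix and transition_matrix_nonreversible are hypothetical and are not part of PyEMMA.

```python
import numpy as np


def count_matrix(dtrajs, lag, n_states):
    """Count transitions i -> j that are `lag` trajectory steps apart.

    Hypothetical helper; assumes sliding-window counting.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    C = np.zeros((n_states, n_states), dtype=int)
    for dtraj in dtrajs:
        dtraj = np.asarray(dtraj, dtype=int)
        # Pair frame t with frame t + lag; np.add.at accumulates repeated pairs.
        np.add.at(C, (dtraj[:-lag], dtraj[lag:]), 1)
    return C


def transition_matrix_nonreversible(C):
    """Row-normalize the counts; assumes every row has at least one count."""
    C = np.asarray(C, dtype=float)
    return C / C.sum(axis=1, keepdims=True)


# Two short discrete trajectories over the states {0, 1}.
dtrajs = [np.array([0, 0, 0, 1, 1, 0]), np.array([1, 1, 1, 0])]

C = count_matrix(dtrajs, lag=1, n_states=2)
T = transition_matrix_nonreversible(C)
```

Each row of T sums to one. Increasing lag pairs frames that are further apart and therefore leaves fewer counted transitions per trajectory.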

Notes

You can postpone the estimation of the MSM by passing estimate=False and start the estimation later by manually calling the MSM.estimate() method.

See also

EstimatedMSM()
An MSM object that has been estimated from data