msmtools.estimation.transition_matrix

msmtools.estimation.transition_matrix(C, reversible=False, mu=None, method='auto', **kwargs)

Estimate the transition matrix from the given countmatrix.

Parameters:
  • C (numpy ndarray or scipy.sparse matrix) – Count matrix
  • reversible (bool (optional)) – If True restrict the ensemble of transition matrices to those having a detailed balance symmetry otherwise the likelihood optimization is carried out over the whole space of stochastic matrices.
  • mu (array_like) – The stationary distribution of the MLE transition matrix.
  • method (string (one of 'auto', 'dense' and 'sparse', optional, default='auto')) – Select which implementation to use for the estimation. ‘dense’ always selects the dense implementation, ‘sparse’ always selects the sparse one. ‘auto’ selectes the most efficient implementation according to the sparsity structure of the matrix: if the occupation of the C matrix is less then one third, select sparse. Else select dense. The type of the T matrix returned always matches the type of the C matrix, irrespective of the method that was used to compute it.
  • **kwargs (Optional algorithm-specific parameters. See below for special cases) –
  • Xinit ((M, M) ndarray) – Optional parameter with reversible = True. initial value for the matrix of absolute transition probabilities. Unless set otherwise, will use X = diag(pi) t, where T is a nonreversible transition matrix estimated from C, i.e. T_ij = c_ij / sum_k c_ik, and pi is its stationary distribution.
  • maxiter (1000000 : int) – Optional parameter with reversible = True. maximum number of iterations before the method exits
  • maxerr (1e-8 : float) – Optional parameter with reversible = True. convergence tolerance for transition matrix estimation. This specifies the maximum change of the Euclidean norm of relative stationary probabilities (\(x_i = \sum_k x_{ik}\)). The relative stationary probability changes \(e_i = (x_i^{(1)} - x_i^{(2)})/(x_i^{(1)} + x_i^{(2)})\) are used in order to track changes in small probabilities. The Euclidean norm of the change vector, \(|e_i|_2\), is compared to maxerr.
  • return_statdist (bool, default=False) – Optional parameter with reversible = True. If set to true, the stationary distribution is also returned
  • return_conv (bool, default=False) – Optional parameter with reversible = True. If set to true, the likelihood history and the pi_change history is returned.
  • warn_not_converged (bool, default=True) – Prints a warning if not converged.
Returns:

  • P ((M, M) ndarray or scipy.sparse matrix) – The MLE transition matrix. P has the same data type (dense or sparse) as the input matrix C.
  • The reversible estimator returns by default only P, but may also return
  • (P,pi) or (P,lhist,pi_changes) or (P,pi,lhist,pi_changes) depending on the return settings
  • P (ndarray (n,n)) – transition matrix. This is the only return for return_statdist = False, return_conv = False
  • (pi) (ndarray (n)) – stationary distribution. Only returned if return_statdist = True
  • (lhist) (ndarray (k)) – likelihood history. Has the length of the number of iterations needed. Only returned if return_conv = True
  • (pi_changes) (ndarray (k)) – history of likelihood history. Has the length of the number of iterations needed. Only returned if return_conv = True

Notes

The transition matrix is a maximum likelihood estimate (MLE) of the probability distribution of transition matrices with parameters given by the count matrix.

References

[1]Prinz, J H, H Wu, M Sarich, B Keller, M Senne, M Held, J D Chodera, C Schuette and F Noe. 2011. Markov models of molecular kinetics: Generation and validation. J Chem Phys 134: 174105
[2]Bowman, G R, K A Beauchamp, G Boxer and V S Pande. 2009. Progress and challenges in the automated construction of Markov state models for full protein systems. J. Chem. Phys. 131: 124101

Examples

>>> import numpy as np
>>> from msmtools.estimation import transition_matrix
>>> C = np.array([[10, 1, 1], [2, 0, 3], [0, 1, 4]])

Non-reversible estimate

>>> T_nrev = transition_matrix(C)
>>> T_nrev
array([[ 0.83333333,  0.08333333,  0.08333333],
       [ 0.4       ,  0.        ,  0.6       ],
       [ 0.        ,  0.2       ,  0.8       ]])

Reversible estimate

>>> T_rev = transition_matrix(C, reversible=True)
>>> T_rev
array([[ 0.83333333,  0.10385551,  0.06281115],
       [ 0.35074677,  0.        ,  0.64925323],
       [ 0.04925323,  0.15074677,  0.8       ]])

Reversible estimate with given stationary vector

>>> mu = np.array([0.7, 0.01, 0.29])
>>> T_mu = transition_matrix(C, reversible=True, mu=mu)
>>> T_mu
array([[ 0.94771371,  0.00612645,  0.04615984],
       [ 0.42885157,  0.        ,  0.57114843],
       [ 0.11142031,  0.01969477,  0.86888491]])