pyemma.msm.bayesian_markov_model

pyemma.msm.bayesian_markov_model(dtrajs, lag, reversible=True, statdist=None, sparse=False, connectivity='largest', count_mode='effective', nsamples=100, conf=0.95, dt_traj='1 step', show_progress=True, mincount_connectivity='1/n', core_set=None, milestoning_method='last_core')

Bayesian Markov model estimate using Gibbs sampling of the posterior

Returns a BayesianMSM that contains the estimated transition matrix and allows to compute a large number of quantities related to Markov models as well as their statistical uncertainties.

Parameters
  • dtrajs (list containing ndarrays(dtype=int) or ndarray(n, dtype=int)) – discrete trajectories, stored as integer ndarrays (arbitrary size) or a single ndarray for only one trajectory.

  • lag (int) – lagtime for the MSM estimation in multiples of trajectory steps

  • reversible (bool, optional, default = True) – If true compute reversible MSM, else non-reversible MSM

  • sparse (bool, optional, default = False) – If true compute count matrix, transition matrix and all derived quantities using sparse matrix algebra. In this case python sparse matrices will be returned by the corresponding functions instead of numpy arrays. This behavior is suggested for very large numbers of states (e.g. > 4000) because it is likely to be much more efficient.

  • statdist ((M,) ndarray, optional) – Stationary vector on the full state-space. Transition matrix will be estimated such that statdist is its equilibrium distribution.

  • count_mode (str, optional, default='sliding') –

    mode to obtain count matrices from discrete trajectories. Should be one of:

    • ’sliding’ : A trajectory of length T will have \(T-tau\) counts at time indexes

      \[(0 \rightarrow \tau), (1 \rightarrow \tau+1), ..., (T-\tau-1 \rightarrow T-1)\]
    • ’effective’ : Uses an estimate of the transition counts that are statistically uncorrelated. Recommended when used with a Bayesian MSM.

    • ’sample’ : A trajectory of length T will have \(T/tau\) counts at time indexes

      \[(0 \rightarrow \tau), (\tau \rightarrow 2 \tau), ..., (((T/tau)-1) \tau \rightarrow T)\]

  • connectivity (str, optional, default = None) –

    Defines if the resulting HMM will be defined on all hidden states or on a connected subset. Connectivity is defined by counting only transitions with at least mincount_connectivity counts. If a subset of states is used, all estimated quantities (transition matrix, stationary distribution, etc) are only defined on this subset and are correspondingly smaller than nstates. Following modes are available: * None or ‘all’ : The active set is the full set of states.

    Estimation is done on all weakly connected subsets separately. The resulting transition matrix may be disconnected.

    • ’largest’ : The active set is the largest reversibly connected set.

    • ’populous’The active set is the reversibly connected set with

      most counts.

  • nsample (int, optional, default=100) – number of transition matrix samples to compute and store

  • conf (float, optional, default=0.95) – size of confidence intervals

  • dt_traj (str, optional, default='1 step') –

    Description of the physical time corresponding to the trajectory time step. May be used by analysis algorithms such as plotting tools to pretty-print the axes. By default ‘1 step’, i.e. there is no physical time unit. Specify by a number, whitespace and unit. Permitted units are (* is an arbitrary string):

    ’fs’, ‘femtosecond*’
    ’ps’, ‘picosecond*’
    ’ns’, ‘nanosecond*’
    ’us’, ‘microsecond*’
    ’ms’, ‘millisecond*’
    ’s’, ‘second*’

  • show_progress (bool, default=True) – Show progressbars for calculation

  • mincount_connectivity (float or '1/n') – minimum number of counts to consider a connection between two states. Counts lower than that will count zero in the connectivity check and may thus separate the resulting transition matrix. The default evaluates to 1/nstates.

  • core_set (None (default) or array like, dtype=int) – Definition of core set for milestoning MSMs. If set to None, replaces state -1 (if found in discrete trajectories) and performs milestone counting. No effect for Voronoi-discretized trajectories (default). If a list or np.ndarray is supplied, discrete trajectories will be assigned accordingly.

  • milestoning_method (str) – Method to use for counting transitions in trajectories with unassigned frames. Currently available: | ‘last_core’, assigns unassigned frames to last visited core

Returns

  • An BayesianMSM object containing the Bayesian MSM estimator

  • and the model.

Example

Note that the following example is only qualitatively and not quantitatively reproducible because it involves random numbers.

We build a Bayesian Markov model for the following two trajectories at lag time 2:

>>> from pyemma import msm
>>> dtrajs = [[0,1,2,2,2,2,1,2,2,2,1,0,0,0,0,0,0,0], [0,0,0,0,1,1,2,2,2,2,2,2,2,1,0,0]]
>>> mm = msm.bayesian_markov_model(dtrajs, 2, show_progress=False)

The resulting Model is an MSM just like you get with estimate_markov_model Its transition matrix does also come from a maximum likelihood estimation, but it’s slightly different from the estimate_markov_mode result because bayesian_markov_model uses an effective count matrix with statistically uncorrelated counts:

>>> print(mm.transition_matrix)  
[[ 0.70000001  0.16463699  0.135363  ]
 [ 0.38169055  0.          0.61830945]
 [ 0.12023989  0.23690297  0.64285714]]

However bayesian_markov_model returns a SampledMSM object which is able to compute the probability distribution and statistical models of all methods that are offered by the MSM object. This works as follows. You can ask for the sample mean and specify the method you wanna evaluate as a string:

>>> print(mm.sample_mean('transition_matrix'))  
[[ 0.71108663  0.15947371  0.12943966]
 [ 0.41076105  0.          0.58923895]
 [ 0.13079372  0.23005443  0.63915185]]

Likewise, the standard deviation by element:

>>> print(mm.sample_std('transition_matrix'))  
[[ 0.13707029  0.09479627  0.09200214]
 [ 0.15247454  0.          0.15247454]
 [ 0.07701315  0.09385258  0.1119089 ]]

And this is the 95% (2 sigma) confidence interval. You can control the percentile using the conf argument in this function:

>>> L, R = mm.sample_conf('transition_matrix')
>>> print(L) 
>>> print(R)  
[[ 0.44083423  0.03926518  0.0242113 ]
 [ 0.14102544  0.          0.30729828]
 [ 0.02440188  0.07629456  0.43682481]]
[[ 0.93571706  0.37522581  0.40180041]
 [ 0.69307665  0.          0.8649215 ]
 [ 0.31029752  0.44035732  0.85994006]]

If you wanna compute expectations of functions that require arguments, just pass these arguments as well:

>>> print(mm.sample_std('mfpt', 0, 2)) 
12.9049811296

And if you want to histogram the distribution or compute more complex statistical moment such as the covariance between different quantities, just get the full sample of your quantity of interest and evaluate it at will:

>>> samples = mm.sample_f('mfpt', 0, 2)
>>> print(samples[:4]) 
[7.9763615793248155, 8.6540958274695701, 26.295326015231058, 17.909895469938899]

Internally, the SampledMSM object has 100 transition matrices (the number can be controlled by nsamples), that were computed by the transition matrix sampling method. All of the above sample functions iterate over these 100 transition matrices and evaluate the requested function with the given parameters on each of them.

class pyemma.msm.estimators.bayesian_msm.BayesianMSM(*args, **kwargs)

Bayesian Markov state model estimator

Methods

cktest(nsets[, memberships, mlags, conf, …])

Conducts a Chapman-Kolmogorow test.

coarse_grain(ncoarse[, method])

Returns a coarse-grained Markov model.

committor_backward(A, B)

Backward committor from set A to set B

committor_forward(A, B)

Forward committor (also known as p_fold or splitting probability) from set A to set B

correlation(a[, b, maxtime, k, ncv])

Time-correlation for equilibrium experiment.

eigenvalues([k])

Compute the transition matrix eigenvalues

eigenvectors_left([k])

Compute the left transition matrix eigenvectors

eigenvectors_right([k])

Compute the right transition matrix eigenvectors

estimate(dtrajs, **kw)

param dtrajs

discrete trajectories, stored as integer ndarrays (arbitrary size)

expectation(a)

Equilibrium expectation value of a given observable.

fingerprint_correlation(a[, b, k, ncv])

Dynamical fingerprint for equilibrium time-correlation experiment.

fingerprint_relaxation(p0, a[, k, ncv])

Dynamical fingerprint for perturbation/relaxation experiment.

fit(X[, y])

Estimates parameters - for compatibility with sklearn.

generate_traj(N[, start, stop, stride])

Generates a synthetic discrete trajectory of length N and simulation time stride * lag time * N

get_model_params([deep])

Get parameters for this model.

get_params([deep])

Get parameters for this estimator.

hmm(nhidden)

Estimates a hidden Markov state model as described in 1

load(file_name[, model_name])

Loads a previously saved PyEMMA object from disk.

mfpt(A, B)

Mean first passage times from set A to set B, in units of the input trajectory time step

pcca(m)

Runs PCCA++ 1 to compute a metastable decomposition of MSM states

propagate(p0, k)

Propagates the initial distribution p0 k times

relaxation(p0, a[, maxtime, k, ncv])

Simulates a perturbation-relaxation experiment.

sample_by_distributions(distributions, nsample)

Generates samples according to given probability distributions

sample_by_state(nsample[, subset, replace])

Generates samples of the connected states.

sample_conf(f, *args, **kwargs)

Sample confidence interval of numerical method f over all samples

sample_f(f, *args, **kwargs)

Evaluated method f for all samples

sample_mean(f, *args, **kwargs)

Sample mean of numerical method f over all samples

sample_std(f, *args, **kwargs)

Sample standard deviation of numerical method f over all samples

save(file_name[, model_name, overwrite, …])

saves the current state of this object to given file and name.

score(dtrajs[, score_method, score_k])

Scores the MSM using the dtrajs using the variational approach for Markov processes 1 [2]_

score_cv(dtrajs[, n, score_method, score_k])

Scores the MSM using the variational approach for Markov processes 1 [2]_ and crossvalidation [3]_ .

set_model_params([samples, conf, P, pi, …])

param samples

sampled MSMs

set_params(**params)

Set the parameters of this estimator.

simulate(N[, start, stop, dt])

Generates a realization of the Markov Model

timescales([k])

The relaxation timescales corresponding to the eigenvalues

trajectory_weights()

Uses the MSM to assign a probability weight to each trajectory frame.

update_model_params(**params)

Update given model parameter if they are set to specific values

Attributes

estimate(dtrajs, **kw)
Parameters
  • dtrajs (list containing ndarrays(dtype=int) or ndarray(n, dtype=int)) – discrete trajectories, stored as integer ndarrays (arbitrary size) or a single ndarray for only one trajectory.

  • kw (dict) – Other Parameters. See documentation of class.

Returns

msm – Estimated Hidden Markov state model

Return type

BayesianMSM

References

1(1,2,3,4)

Trendelkamp-Schroer, B, H. Wu, F. Paul and F. Noe: Estimation and uncertainty of reversible Markov models. http://arxiv.org/abs/1507.05990