pyemma.msm.bayesian_hidden_markov_model

pyemma.msm.bayesian_hidden_markov_model(dtrajs, nstates, lag, nsamples=100, reversible=True, stationary=False, connectivity=None, mincount_connectivity='1/n', separate=None, observe_nonempty=True, stride='effective', conf=0.95, dt_traj='1 step', store_hidden=False, show_progress=True)

Bayesian Hidden Markov model estimate using Gibbs sampling of the posterior

Returns a BayesianHMSM that contains the estimated hidden Markov model 1 and a Bayesian estimate 2 that contains samples around this estimate to estimate uncertainties.

Parameters
  • dtrajs (list containing ndarrays(dtype=int) or ndarray(n, dtype=int)) – discrete trajectories, stored as integer ndarrays (arbitrary size) or a single ndarray for only one trajectory.

  • lag (int) – lagtime for the MSM estimation in multiples of trajectory steps

  • nstates (int) – the number of metastable states in the resulting HMM

  • reversible (bool, optional, default = True) – If true compute reversible MSM, else non-reversible MSM

  • stationary (bool, optional, default=False) – If True, the initial distribution of hidden states is self-consistently computed as the stationary distribution of the transition matrix. If False, it will be estimated from the starting states. Only set this to true if you’re sure that the observation trajectories are initiated from a global equilibrium distribution.

  • connectivity (str, optional, default = None) –

    Defines if the resulting HMM will be defined on all hidden states or on a connected subset. Connectivity is defined by counting only transitions with at least mincount_connectivity counts. If a subset of states is used, all estimated quantities (transition matrix, stationary distribution, etc) are only defined on this subset and are correspondingly smaller than nstates. Following modes are available: * None or ‘all’ : The active set is the full set of states.

    Estimation is done on all weakly connected subsets separately. The resulting transition matrix may be disconnected.

    • ’largest’ : The active set is the largest reversibly connected set.

    • ’populous’The active set is the reversibly connected set with

      most counts.

  • mincount_connectivity (float or '1/n') – minimum number of counts to consider a connection between two states. Counts lower than that will count zero in the connectivity check and may thus separate the resulting transition matrix. The default evaluates to 1/nstates.

  • separate (None or iterable of int) – Force the given set of observed states to stay in a separate hidden state. The remaining nstates-1 states will be assigned by a metastable decomposition.

  • observe_nonempty (bool) – If True, will restricted the observed states to the states that have at least one observation in the lagged input trajectories.

  • nsamples (int, optional, default=100) – number of transition matrix samples to compute and store

  • stride (str or int, default='effective') –

    stride between two lagged trajectories extracted from the input trajectories. Given trajectory s[t], stride and lag will result in trajectories

    s[0], s[tau], s[2 tau], … s[stride], s[stride + tau], s[stride + 2 tau], …

    Setting stride = 1 will result in using all data (useful for maximum likelihood estimator), while a Bayesian estimator requires a longer stride in order to have statistically uncorrelated trajectories. Setting stride = None ‘effective’ uses the largest neglected timescale as an estimate for the correlation time and sets the stride accordingly.

  • conf (float, optional, default=0.95) – size of confidence intervals

  • dt_traj (str, optional, default='1 step') –

    Description of the physical time corresponding to the trajectory time step. May be used by analysis algorithms such as plotting tools to pretty-print the axes. By default ‘1 step’, i.e. there is no physical time unit. Specify by a number, whitespace and unit. Permitted units are (* is an arbitrary string):

    ’fs’, ‘femtosecond*’
    ’ps’, ‘picosecond*’
    ’ns’, ‘nanosecond*’
    ’us’, ‘microsecond*’
    ’ms’, ‘millisecond*’
    ’s’, ‘second*’

  • store_hidden (bool, optional, default=False) – store hidden trajectories in sampled HMMs

  • show_progress (bool, default=True) – Show progressbars for calculation?

Returns

  • An BayesianHMSM object containing a

  • transition matrix and various other HMM-related quantities and statistical

  • uncertainties.

Example

Note that the following example is only qualitative and not quantitatively reproducible because random numbers are involved

>>> from pyemma import msm
>>> dtrajs = [[0,1,2,2,2,2,1,2,2,2,1,0,0,0,0,0,0,0], [0,0,0,0,1,1,2,2,2,2,2,2,2,1,0,0]]  # two trajectories
>>> mm = msm.bayesian_hidden_markov_model(dtrajs, 2, 2, show_progress=False)

We compute the stationary distribution (here given by the maximum likelihood estimate), and the 1-sigma uncertainty interval. You can see that the uncertainties are quite large (we have seen only very few transitions between the metastable states:

>>> pi = mm.stationary_distribution
>>> piL,piR = mm.sample_conf('stationary_distribution')
>>> for i in range(2): print(pi[i],' -',piL[i],'+',piR[i])  
0.459176653019  - 0.268314552886 + 0.715326151685
0.540823346981  - 0.284761476984 + 0.731730375713

Let’s look at the lifetimes of metastable states. Now we have really huge uncertainties. In states where one state is more probable than the other, the mean first passage time from the more probable to the less probable state is much higher than the reverse:

>>> l = mm.lifetimes
>>> lL, lR = mm.sample_conf('lifetimes')
>>> for i in range(2): print(l[i],' -',lL[i],'+',lR[i])  
7.18543434854  - 6.03617757784 + 80.1298222741
8.65699332061  - 5.35089540896 + 30.1719505772

In contrast the relaxation timescale is less uncertain. This is because for a two-state system the relaxation timescale is dominated by the faster passage, which is less uncertain than the slower passage time:

>>> ts = mm.timescales()
>>> tsL,tsR = mm.sample_conf('timescales')
>>> print(ts[0],' -',tsL[0],'+',tsR[0])  
3.35310468086  - 2.24574587978 + 8.34383177258
class pyemma.msm.estimators.bayesian_hmsm.BayesianHMSM(*args, **kwargs)

Estimator for a Bayesian Hidden Markov state model

Methods

cktest([mlags, conf, err_est, n_jobs, …])

Conducts a Chapman-Kolmogorow test.

committor_backward(A, B)

Backward committor from set A to set B

committor_forward(A, B)

Forward committor (also known as p_fold or splitting probability) from set A to set B

correlation(a[, b, maxtime, k, ncv])

Time-correlation for equilibrium experiment.

eigenvalues([k])

Compute the transition matrix eigenvalues

eigenvectors_left([k])

Compute the left transition matrix eigenvectors

eigenvectors_right([k])

Compute the right transition matrix eigenvectors

estimate(X, **params)

Estimates the model given the data X

expectation(a)

Equilibrium expectation value of a given observable.

fingerprint_correlation(a[, b, k, ncv])

Dynamical fingerprint for equilibrium time-correlation experiment.

fingerprint_relaxation(p0, a[, k, ncv])

Dynamical fingerprint for perturbation/relaxation experiment.

fit(X[, y])

Estimates parameters - for compatibility with sklearn.

get_model_params([deep])

Get parameters for this model.

get_params([deep])

Get parameters for this estimator.

load(file_name[, model_name])

Loads a previously saved PyEMMA object from disk.

mfpt(A, B)

Mean first passage times from set A to set B, in units of the input trajectory time step

pcca(m)

Runs PCCA++ 1 to compute a metastable decomposition of MSM states

propagate(p0, k)

Propagates the initial distribution p0 k times

relaxation(p0, a[, maxtime, k, ncv])

Simulates a perturbation-relaxation experiment.

sample_by_observation_probabilities(nsample)

Generates samples according to the current observation probability distribution

sample_conf(f, *args, **kwargs)

Sample confidence interval of numerical method f over all samples

sample_f(f, *args, **kwargs)

Evaluated method f for all samples

sample_mean(f, *args, **kwargs)

Sample mean of numerical method f over all samples

sample_std(f, *args, **kwargs)

Sample standard deviation of numerical method f over all samples

save(file_name[, model_name, overwrite, …])

saves the current state of this object to given file and name.

set_model_params([samples, conf, P, pobs, …])

param samples

sampled MSMs

set_params(**params)

Set the parameters of this estimator.

simulate(N[, start, stop, dt])

Generates a realization of the Hidden Markov Model

submodel([states, obs, …])

Returns a HMM with restricted state space

submodel_disconnect([mincount_connectivity])

Disconnects sets of hidden states that are barely connected

submodel_largest([strong, mincount_connectivity])

Returns the largest connected sub-HMM (convenience function)

submodel_populous([strong, …])

Returns the most populous connected sub-HMM (convenience function)

timescales([k])

The relaxation timescales corresponding to the eigenvalues

trajectory_weights()

Uses the HMSM to assign a probability weight to each trajectory frame.

transition_matrix_obs([k])

Computes the transition matrix between observed states

update_model_params(**params)

Update given model parameter if they are set to specific values

Attributes

property init_hmsm
submodel(states=None, obs=None, mincount_connectivity='1/n', inplace=False)

Returns a HMM with restricted state space

Parameters
  • states (None, str or int-array) – Hidden states to restrict the model to. In addition to specifying the subset, possible options are: * None : all states - don’t restrict * ‘populous-strong’ : strongly connected subset with maximum counts * ‘populous-weak’ : weakly connected subset with maximum counts * ‘largest-strong’ : strongly connected subset with maximum size * ‘largest-weak’ : weakly connected subset with maximum size

  • obs (None, str or int-array) – Observed states to restrict the model to. In addition to specifying an array with the state labels to be observed, possible options are: * None : all states - don’t restrict * ‘nonempty’ : all states with at least one observation in the estimator

  • mincount_connectivity (float or '1/n') – minimum number of counts to consider a connection between two states. Counts lower than that will count zero in the connectivity check and may thus separate the resulting transition matrix. Default value: 1/nstates.

  • inplace (Bool) – if True, submodel is estimated in-place, overwriting the original estimator and possibly discarding information. Default value: False

Returns

hmm – The restricted HMM.

Return type

HMM

References

1(1,2)

F. Noe, H. Wu, J.-H. Prinz and N. Plattner: Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules. J. Chem. Phys. 139, 184114 (2013)

2

J. D. Chodera Et Al: Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty. arXiv:1108.1430 (2011)