pyemma.msm.bayesian_hidden_markov_model¶

pyemma.msm.bayesian_hidden_markov_model(dtrajs, nstates, lag, nsamples=100, reversible=True, stationary=False, connectivity=None, mincount_connectivity='1/n', separate=None, observe_nonempty=True, stride='effective', conf=0.95, dt_traj='1 step', store_hidden=False, show_progress=True)¶

Bayesian Hidden Markov model estimate using Gibbs sampling of the posterior

Returns a BayesianHMSM that contains the estimated hidden Markov model 1 and a Bayesian estimate 2 that contains samples around this estimate to estimate uncertainties.

Parameters

dtrajs (list containing ndarrays(dtype=int) or ndarray(n, dtype=int)) – discrete trajectories, stored as integer ndarrays (arbitrary size) or a single ndarray for only one trajectory.
lag (int) – lagtime for the MSM estimation in multiples of trajectory steps
nstates (int) – the number of metastable states in the resulting HMM
reversible (bool, optional, default = True) – If true compute reversible MSM, else non-reversible MSM
stationary (bool, optional, default=False) – If True, the initial distribution of hidden states is self-consistently computed as the stationary distribution of the transition matrix. If False, it will be estimated from the starting states. Only set this to true if you’re sure that the observation trajectories are initiated from a global equilibrium distribution.
connectivity (str, optional, default = None) –
Defines if the resulting HMM will be defined on all hidden states or on a connected subset. Connectivity is defined by counting only transitions with at least mincount_connectivity counts. If a subset of states is used, all estimated quantities (transition matrix, stationary distribution, etc) are only defined on this subset and are correspondingly smaller than nstates. Following modes are available: * None or ‘all’ : The active set is the full set of states.

Estimation is done on all weakly connected subsets separately. The resulting transition matrix may be disconnected.
- ’largest’ : The active set is the largest reversibly connected set.
- ’populous’The active set is the reversibly connected set with
  most counts.
mincount_connectivity (float or '1/n') – minimum number of counts to consider a connection between two states. Counts lower than that will count zero in the connectivity check and may thus separate the resulting transition matrix. The default evaluates to 1/nstates.
separate (None or iterable of int) – Force the given set of observed states to stay in a separate hidden state. The remaining nstates-1 states will be assigned by a metastable decomposition.
observe_nonempty (bool) – If True, will restricted the observed states to the states that have at least one observation in the lagged input trajectories.
nsamples (int, optional, default=100) – number of transition matrix samples to compute and store
stride (str or int, default='effective') –
stride between two lagged trajectories extracted from the input trajectories. Given trajectory s[t], stride and lag will result in trajectories

s[0], s[tau], s[2 tau], … s[stride], s[stride + tau], s[stride + 2 tau], …

Setting stride = 1 will result in using all data (useful for maximum likelihood estimator), while a Bayesian estimator requires a longer stride in order to have statistically uncorrelated trajectories. Setting stride = None ‘effective’ uses the largest neglected timescale as an estimate for the correlation time and sets the stride accordingly.
conf (float, optional, default=0.95) – size of confidence intervals
dt_traj (str, optional, default='1 step') –
Description of the physical time corresponding to the trajectory time step. May be used by analysis algorithms such as plotting tools to pretty-print the axes. By default ‘1 step’, i.e. there is no physical time unit. Specify by a number, whitespace and unit. Permitted units are (* is an arbitrary string):

’fs’, ‘femtosecond*’

’ps’, ‘picosecond*’

’ns’, ‘nanosecond*’

’us’, ‘microsecond*’

’ms’, ‘millisecond*’

’s’, ‘second*’
store_hidden (bool, optional, default=False) – store hidden trajectories in sampled HMMs
show_progress (bool, default=True) – Show progressbars for calculation?

Returns

An BayesianHMSM object containing a
transition matrix and various other HMM-related quantities and statistical
uncertainties.

Example

Note that the following example is only qualitative and not quantitatively reproducible because random numbers are involved

>>> from pyemma import msm
>>> dtrajs = [[0,1,2,2,2,2,1,2,2,2,1,0,0,0,0,0,0,0], [0,0,0,0,1,1,2,2,2,2,2,2,2,1,0,0]]  # two trajectories
>>> mm = msm.bayesian_hidden_markov_model(dtrajs, 2, 2, show_progress=False)

We compute the stationary distribution (here given by the maximum likelihood estimate), and the 1-sigma uncertainty interval. You can see that the uncertainties are quite large (we have seen only very few transitions between the metastable states:

>>> pi = mm.stationary_distribution
>>> piL,piR = mm.sample_conf('stationary_distribution')
>>> for i in range(2): print(pi[i],' -',piL[i],'+',piR[i])  
0.459176653019  - 0.268314552886 + 0.715326151685
0.540823346981  - 0.284761476984 + 0.731730375713

Let’s look at the lifetimes of metastable states. Now we have really huge uncertainties. In states where one state is more probable than the other, the mean first passage time from the more probable to the less probable state is much higher than the reverse:

>>> l = mm.lifetimes
>>> lL, lR = mm.sample_conf('lifetimes')
>>> for i in range(2): print(l[i],' -',lL[i],'+',lR[i])  
7.18543434854  - 6.03617757784 + 80.1298222741
8.65699332061  - 5.35089540896 + 30.1719505772

In contrast the relaxation timescale is less uncertain. This is because for a two-state system the relaxation timescale is dominated by the faster passage, which is less uncertain than the slower passage time:

>>> ts = mm.timescales()
>>> tsL,tsR = mm.sample_conf('timescales')
>>> print(ts[0],' -',tsL[0],'+',tsR[0])  
3.35310468086  - 2.24574587978 + 8.34383177258

class pyemma.msm.estimators.bayesian_hmsm.BayesianHMSM(*args, **kwargs)¶

Estimator for a Bayesian Hidden Markov state model

Methods

`cktest`([mlags, conf, err_est, n_jobs, …])	Conducts a Chapman-Kolmogorow test.
`committor_backward`(A, B)	Backward committor from set A to set B
`committor_forward`(A, B)	Forward committor (also known as p_fold or splitting probability) from set A to set B
`correlation`(a[, b, maxtime, k, ncv])	Time-correlation for equilibrium experiment.
`eigenvalues`([k])	Compute the transition matrix eigenvalues
`eigenvectors_left`([k])	Compute the left transition matrix eigenvectors
`eigenvectors_right`([k])	Compute the right transition matrix eigenvectors
`estimate`(X, **params)	Estimates the model given the data X
`expectation`(a)	Equilibrium expectation value of a given observable.
`fingerprint_correlation`(a[, b, k, ncv])	Dynamical fingerprint for equilibrium time-correlation experiment.
`fingerprint_relaxation`(p0, a[, k, ncv])	Dynamical fingerprint for perturbation/relaxation experiment.
`fit`(X[, y])	Estimates parameters - for compatibility with sklearn.
`get_model_params`([deep])	Get parameters for this model.
`get_params`([deep])	Get parameters for this estimator.
`load`(file_name[, model_name])	Loads a previously saved PyEMMA object from disk.
`mfpt`(A, B)	Mean first passage times from set A to set B, in units of the input trajectory time step
`pcca`(m)	Runs PCCA++ 1 to compute a metastable decomposition of MSM states
`propagate`(p0, k)	Propagates the initial distribution p0 k times
`relaxation`(p0, a[, maxtime, k, ncv])	Simulates a perturbation-relaxation experiment.
`sample_by_observation_probabilities`(nsample)	Generates samples according to the current observation probability distribution
`sample_conf`(f, args, *kwargs)	Sample confidence interval of numerical method f over all samples
`sample_f`(f, args, *kwargs)	Evaluated method f for all samples
`sample_mean`(f, args, *kwargs)	Sample mean of numerical method f over all samples
`sample_std`(f, args, *kwargs)	Sample standard deviation of numerical method f over all samples
`save`(file_name[, model_name, overwrite, …])	saves the current state of this object to given file and name.
`set_model_params`([samples, conf, P, pobs, …])	param samples sampled MSMs
`set_params`(**params)	Set the parameters of this estimator.
`simulate`(N[, start, stop, dt])	Generates a realization of the Hidden Markov Model
`submodel`([states, obs, …])	Returns a HMM with restricted state space
`submodel_disconnect`([mincount_connectivity])	Disconnects sets of hidden states that are barely connected
`submodel_largest`([strong, mincount_connectivity])	Returns the largest connected sub-HMM (convenience function)
`submodel_populous`([strong, …])	Returns the most populous connected sub-HMM (convenience function)
`timescales`([k])	The relaxation timescales corresponding to the eigenvalues
`trajectory_weights`()	Uses the HMSM to assign a probability weight to each trajectory frame.
`transition_matrix_obs`([k])	Computes the transition matrix between observed states
`update_model_params`(**params)	Update given model parameter if they are set to specific values

Attributes

property init_hmsm¶

submodel(states=None, obs=None, mincount_connectivity='1/n', inplace=False)¶

Returns a HMM with restricted state space

Parameters

states (None, str or int-array) – Hidden states to restrict the model to. In addition to specifying the subset, possible options are: * None : all states - don’t restrict * ‘populous-strong’ : strongly connected subset with maximum counts * ‘populous-weak’ : weakly connected subset with maximum counts * ‘largest-strong’ : strongly connected subset with maximum size * ‘largest-weak’ : weakly connected subset with maximum size
obs (None, str or int-array) – Observed states to restrict the model to. In addition to specifying an array with the state labels to be observed, possible options are: * None : all states - don’t restrict * ‘nonempty’ : all states with at least one observation in the estimator
mincount_connectivity (float or '1/n') – minimum number of counts to consider a connection between two states. Counts lower than that will count zero in the connectivity check and may thus separate the resulting transition matrix. Default value: 1/nstates.
inplace (Bool) – if True, submodel is estimated in-place, overwriting the original estimator and possibly discarding information. Default value: False

Returns

hmm – The restricted HMM.

Return type

HMM

References

1(1,2): F. Noe, H. Wu, J.-H. Prinz and N. Plattner: Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules. J. Chem. Phys. 139, 184114 (2013)
2: J. D. Chodera Et Al: Bayesian hidden Markov model analysis of single-molecule force spectroscopy: Characterizing kinetics under measurement uncertainty. arXiv:1108.1430 (2011)