pyemma.thermo.dtram

pyemma.thermo.dtram(ttrajs, dtrajs, bias, lag, unbiased_state=None, count_mode='sliding', connectivity='largest', maxiter=10000, maxerr=1e-15, save_convergence_info=0, dt_traj='1 step', init=None, init_maxiter=10000, init_maxerr=1e-08)

Discrete transition-based reweighting analysis method

Parameters:
  • ttrajs (numpy.ndarray(T) of int, or list of numpy.ndarray(T_i) of int) – A single discrete trajectory or a list of discrete trajectories. The integers are indexes in 0,...,num_therm_states-1 enumerating the thermodynamic states the trajectory is in at any time.
  • dtrajs (numpy.ndarray(T) of int, or list of numpy.ndarray(T_i) of int) – A single discrete trajectory or a list of discrete trajectories. The integers are indexes in 0,...,num_conf_states-1 enumerating the num_conf_states Markov states or the bins the trajectory is in at any time.
  • bias (numpy.ndarray(shape=(num_therm_states, num_conf_states)) object) – bias_energies_full[j, i] is the bias energy in units of kT for each discrete state i at thermodynamic state j.
  • lag (int or list of int, optional, default=1) – Integer lag time at which transitions are counted. Providing a list of lag times will trigger one estimation per lag time.
  • count_mode (str, optional, default='sliding') –

    Mode to obtain count matrices from discrete trajectories. Should be one of:

    • ‘sliding’
      : a trajectory of length T will have \(T-\tau\) counts at time indexes
      \[(0 \rightarrow \tau), (1 \rightarrow \tau+1), ..., (T-\tau-1 \rightarrow T-1)\]
    • ‘sample’
      : a trajectory of length T will have \(T/\tau\) counts at time indexes
      \[(0 \rightarrow \tau), (\tau \rightarrow 2 \tau), ..., ((T/\tau-1) \tau \rightarrow T)\]

    Currently only ‘sliding’ is supported.

  • connectivity (str, optional, default='largest') – Defines what should be considered a connected set in the joint space of conformations and thermodynamic ensembles. Currently only ‘largest’ is supported.
  • maxiter (int, optional, default=10000) – The maximum number of dTRAM iterations before the estimator exits unsuccessfully.
  • maxerr (float, optional, default=1e-15) – Convergence criterion based on the maximal free energy change in a self-consistent iteration step.
  • save_convergence_info (int, optional, default=0) – Every save_convergence_info iteration steps, store the actual increment and the actual loglikelihood; 0 means no storage.
  • dt_traj (str, optional, default='1 step') –

    Description of the physical time corresponding to the lag. May be used by analysis algorithms such as plotting tools to pretty-print the axes. By default ‘1 step’, i.e. there is no physical time unit. Specify by a number, whitespace and unit. Permitted units are (* is an arbitrary string):

    ‘fs’, ‘femtosecond*’
    ‘ps’, ‘picosecond*’
    ‘ns’, ‘nanosecond*’
    ‘us’, ‘microsecond*’
    ‘ms’, ‘millisecond*’
    ‘s’, ‘second*’
  • init (str, optional, default=None) –

    Use a specific initialization for self-consistent iteration:

    None: use a hard-coded guess for free energies and Lagrangian multipliers
    ‘wham’: perform a short WHAM estimate to initialize the free energies
  • init_maxiter (int, optional, default=10000) – The maximum number of self-consistent iterations during the initialization.
  • init_maxerr (float, optional, default=1.0E-8) – Convergence criterion for the initialization.
Returns:

dtram_estimators – A multi-ensemble Markov state model (for each given lag time) which consists of stationary and kinetic quantities at all temperatures/thermodynamic states.

Return type:

MEMM or list of MEMMs

Example

Umbrella sampling: Suppose we simulate in K umbrellas, centered at positions \(y_0,...,y_{K-1}\) with bias energies

\[b_k(x) = \frac{c_k}{2 \textrm{kT}} \cdot (x - y_k)^2\]

Suppose we have one simulation of length T in each umbrella, and they are ordered from 0 to K-1. We have discretized the x-coordinate into 100 bins. Then dtrajs and ttrajs should each be a list of \(K\) arrays. dtrajs would look for example like this:

[ (0, 0, 0, 0, 1, 1, 1, 0, 0, 0, ...),  (0, 1, 0, 1, 0, 1, 1, 0, 0, 1, ...), ... ]

where each array has length T, and is the sequence of bins (in the range 0 to 99) visited along the trajectory. ttrajs would look like this:

[ (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...),  (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...), ... ]

Because trajectory 1 stays in umbrella 1 (index 0), trajectory 2 stays in umbrella 2 (index 1), and so forth. bias is a \(K \times n\) matrix with all reduced bias energies evaluated at all centers:

\[\begin{split}\left(\begin{array}{cccc} b_0(y_0) & b_0(y_1) & ... & b_0(y_{n-1}) \\ b_1(y_0) & b_1(y_1) & ... & b_1(y_{n-1}) \\ ... \\ b_{K-1}(y_0) & b_{K-1}(y_1) & ... & b_{K-1}(y_{n-1}) \end{array}\right)\end{split}\]

Let us try the above example:

>>> from pyemma.thermo import dtram
>>> import numpy as np
>>> ttrajs = [np.array([0,0,0,0,0,0,0,0,0,0]), np.array([1,1,1,1,1,1,1,1,1,1])]
>>> dtrajs = [np.array([0,0,0,0,1,1,1,0,0,0]), np.array([0,1,0,1,0,1,1,0,0,1])]
>>> bias = np.array([[0.0, 0.0], [0.5, 1.0]])
>>> dtram_obj = dtram(ttrajs, dtrajs, bias, 1)
>>> dtram_obj.log_likelihood() 
-9.805...
>>> dtram_obj.count_matrices 
array([[[5, 1],
        [1, 2]],
       [[1, 4],
        [3, 1]]], dtype=int32)
>>> dtram_obj.stationary_distribution 
array([ 0.38...,  0.61...])

References

[1]Wu, H. et al 2014 Statistically optimal analysis of state-discretized trajectory data from multiple thermodynamic states J. Chem. Phys. 141, 214106