pyemma.coordinates.discretizer

pyemma.coordinates.discretizer(reader, transform=None, cluster=None, run=True, stride=1, chunksize=1000)

Specialized pipeline: From trajectories to clustering.

Constructs a pipeline that consists of three stages:

an input stage (mandatory)

a transformer stage (optional)

a clustering stage (mandatory)

This function is identical to calling pipeline() with the three stages; it is only meant as guidance for what is probably the most common use case of a pipeline.
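The flow through the three stages can be sketched in plain Python. This is a toy illustration, not PyEMMA's implementation: the names `toy_discretizer`, `scale` and `threshold` are hypothetical, each stage is modeled as a plain function, and a missing transformer stage is assumed to behave like the identity.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[float]


def toy_discretizer(
    trajs: Sequence[Sequence[Frame]],               # input stage (mandatory)
    transform: Optional[Callable[[Frame], Frame]],  # transformer stage (optional, may be None)
    assign: Callable[[Frame], int],                 # clustering stage (mandatory)
) -> List[List[int]]:
    """Map every frame of every trajectory to a discrete state index."""
    def identity(frame: Frame) -> Frame:
        return frame

    if transform is None:
        # Without a transformer stage, frames go straight to the clustering stage.
        transform = identity
    return [[assign(transform(frame)) for frame in traj] for traj in trajs]


def scale(frame: Frame) -> Frame:
    """Hypothetical transformer: double every coordinate."""
    return [2.0 * x for x in frame]


def threshold(frame: Frame) -> int:
    """Hypothetical clustering: state 0 if the first coordinate is below 1, else state 1."""
    return 0 if frame[0] < 1.0 else 1


trajs = [[[0.1, 0.0], [0.6, 0.2]], [[0.9, 0.5]]]  # two toy trajectories
print(toy_discretizer(trajs, scale, threshold))  # [[0, 1], [1]]
print(toy_discretizer(trajs, None, threshold))   # [[0, 0], [0]]
```

The result is one list of state indices per input trajectory, which mirrors the shape of the discrete trajectories the real pipeline produces.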

Parameters:

reader (instance of `pyemma.coordinates.data.reader.ChunkedReader`) – The reader instance provides access to the data. If you are working with MD data, you most likely want to use a FeatureReader.

transform (instance of `pyemma.coordinates.Transformer`, optional) – An optional transformation such as PCA or TICA.

cluster (instance of `pyemma.coordinates.AbstractClustering`, optional) – A clustering transformer, i.e. a cluster algorithm that assigns the transformed data to discrete states.

stride (int, optional, default = 1) – If set to 1, all input data are used throughout the pipeline to parametrize its stages. This can make the parametrization step very slow for large data sets. Since molecular dynamics data are usually correlated at short timescales, it is often sufficient to parametrize the pipeline at a longer stride. See also the stride option in the output functions of the pipeline.

chunksize (int, optional, default = 1000) – The number of data points processed as one batch in each step.
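To see how stride and chunksize interact, the hypothetical helper below (strided_chunks, not part of PyEMMA) lists the frame indices that would be read during parametrization. It assumes that frames are first subsampled with the stride and the selected frames are then grouped into batches of at most chunksize frames; PyEMMA's internal iteration may differ in detail.

```python
from typing import Iterator, List


def strided_chunks(n_frames: int, stride: int = 1, chunksize: int = 1000) -> Iterator[List[int]]:
    """Yield batches of frame indices: every `stride`-th frame, at most `chunksize` per batch."""
    if stride < 1 or chunksize < 1:
        raise ValueError("stride and chunksize must be positive")
    selected = range(0, n_frames, stride)  # frames used for parametrization
    for start in range(0, len(selected), chunksize):
        yield list(selected[start:start + chunksize])


# 25 frames, every 5th frame used, two frames per batch.
print(list(strided_chunks(25, stride=5, chunksize=2)))
# [[0, 5], [10, 15], [20]]
```

With stride=5 only a fifth of the frames is touched, while chunksize only bounds how many of those frames are held in memory at once.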
Returns:	pipe – A pipeline object that is able to streamline data analysis of large amounts of input data with limited memory in streaming mode.
Return type:	a `Pipeline` object

Examples

Construct a discretizer pipeline that processes all data with a PCA transformation and clusters the principal components with regular space clustering:

>>> from pyemma.coordinates import source, pca, cluster_regspace, discretizer
>>> from pyemma.datasets import get_bpti_test_data
>>> from pyemma.util.contexts import settings
>>> reader = source(get_bpti_test_data()['trajs'], top=get_bpti_test_data()['top'])
>>> transform = pca(dim=2)
>>> cluster = cluster_regspace(dmin=0.1)
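The dmin argument of cluster_regspace sets the minimum distance between cluster centers. A minimal pure-Python sketch of the regular space idea, assuming Euclidean distance and a single greedy pass over the data: a point becomes a new center only if it lies farther than dmin from every existing center, and each point is then assigned to its nearest center. The helpers regspace_centers and assign are hypothetical and ignore any further options of PyEMMA's implementation.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def regspace_centers(points: Sequence[Point], dmin: float) -> List[Point]:
    """Greedy single pass: keep a point as a center if it is farther than dmin from all centers."""
    centers: List[Point] = []
    for p in points:
        if all(math.dist(p, c) > dmin for c in centers):
            centers.append(p)
    return centers


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Voronoi assignment: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i])) for p in points]


data = [(0.0, 0.0), (0.05, 0.0), (0.3, 0.0), (0.32, 0.05), (1.0, 1.0)]
centers = regspace_centers(data, dmin=0.1)
print(centers)                # [(0.0, 0.0), (0.3, 0.0), (1.0, 1.0)]
print(assign(data, centers))  # [0, 0, 1, 1, 2]
```

A smaller dmin yields more centers and therefore finer discrete states.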

Create the discretizer, access the discrete trajectories and save them to files:

>>> with settings(show_progress_bars=False):
...     disc = discretizer(reader, transform, cluster)
...     disc.dtrajs 
[array([...

This stores one discrete trajectory file per input trajectory, named after the corresponding input file:

>>> from pyemma.util.files import TemporaryDirectory
>>> import os
>>> with TemporaryDirectory('dtrajs') as tmpdir:
...     disc.save_dtrajs(output_dir=tmpdir)
...     sorted(os.listdir(tmpdir))
['bpti_001-033.dtraj', 'bpti_034-066.dtraj', 'bpti_067-100.dtraj']
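The file names in the listing above follow the input trajectories. A rough sketch of such a naming scheme, assuming the output name is the input file's base name with its extension replaced by .dtraj and one integer state written per line. The function save_dtrajs_sketch and the .xtc input paths are hypothetical; consult the save_dtrajs documentation for the actual options and file format.

```python
import os
import tempfile
from typing import List, Sequence


def save_dtrajs_sketch(dtrajs: Sequence[Sequence[int]], input_files: Sequence[str],
                       output_dir: str, extension: str = ".dtraj") -> List[str]:
    """Write each discrete trajectory to <output_dir>/<input base name><extension>."""
    if len(dtrajs) != len(input_files):
        raise ValueError("need exactly one input file name per discrete trajectory")
    written = []
    for dtraj, source_file in zip(dtrajs, input_files):
        base = os.path.splitext(os.path.basename(source_file))[0]
        path = os.path.join(output_dir, base + extension)
        with open(path, "w") as fh:
            fh.writelines(f"{state}\n" for state in dtraj)  # one state index per line
        written.append(path)
    return written


inputs = ["data/bpti_001-033.xtc", "data/bpti_034-066.xtc", "data/bpti_067-100.xtc"]
with tempfile.TemporaryDirectory() as tmpdir:
    save_dtrajs_sketch([[0, 1], [1, 1], [2, 0]], inputs, tmpdir)
    print(sorted(os.listdir(tmpdir)))
# ['bpti_001-033.dtraj', 'bpti_034-066.dtraj', 'bpti_067-100.dtraj']
```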

class pyemma.coordinates.pipelines.Pipeline(chain, chunksize=100, param_stride=1)

Data processing pipeline.

Methods

`add_element`(e)	Appends a pipeline stage.
`parametrize`()	Reads all data and discretizes it into discrete trajectories.
`set_element`(index, e)	Replaces a pipeline stage.

Attributes

chunksize

add_element(e)

Appends a pipeline stage.

Appends the given element to the end of the current chain.

chunksize

parametrize()

Reads all data and discretizes it into discrete trajectories.
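Conceptually, parametrizing a chain means fitting each stage in order on the output of the stage before it. A minimal sketch under that assumption, with hypothetical Center and Threshold stages exposing fit and transform methods (PyEMMA's stages have their own estimation API, which is not reproduced here). Unlike the real pipeline, this sketch keeps all data in memory instead of streaming it in chunks.

```python
from typing import Any, List, Sequence


class Center:
    """Toy transformer stage: subtracts the mean learned during fit."""

    def fit(self, xs: Sequence[float]) -> "Center":
        self.mean = sum(xs) / len(xs)
        return self

    def transform(self, xs: Sequence[float]) -> List[float]:
        return [x - self.mean for x in xs]


class Threshold:
    """Toy clustering stage: state 1 above zero, state 0 otherwise."""

    def fit(self, xs: Sequence[float]) -> "Threshold":
        return self  # nothing to learn

    def transform(self, xs: Sequence[float]) -> List[int]:
        return [1 if x > 0 else 0 for x in xs]


def parametrize(stages: Sequence[Any], data: Sequence[float]) -> List[Any]:
    """Fit stages in chain order, feeding each one the previous stage's output."""
    out = list(data)
    for stage in stages:
        out = stage.fit(out).transform(out)
    return out  # output of the last stage: the discrete trajectory


print(parametrize([Center(), Threshold()], [1.0, 2.0, 6.0]))  # [0, 0, 1]
```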

set_element(index, e)

Replaces a pipeline stage.

Replaces the element at position index in the chain and returns the replaced element.
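The documented semantics of add_element and set_element amount to list bookkeeping on the chain. The hypothetical ToyPipeline below mimics only that bookkeeping, with plain strings standing in for stages; it deliberately ignores anything the real Pipeline class may do beyond appending and replacing elements.

```python
from typing import Any, List


class ToyPipeline:
    """Hypothetical stand-in that mimics only the chain bookkeeping of Pipeline."""

    def __init__(self, chain: List[Any]):
        self._chain = list(chain)

    @property
    def chain(self) -> List[Any]:
        return list(self._chain)  # copy, so callers cannot mutate the chain directly

    def add_element(self, e: Any) -> None:
        """Append a stage to the end of the chain."""
        self._chain.append(e)

    def set_element(self, index: int, e: Any) -> Any:
        """Replace the stage at `index` and return the stage it replaced."""
        old = self._chain[index]
        self._chain[index] = e
        return old


pipe = ToyPipeline(["reader", "pca"])
pipe.add_element("regspace")
replaced = pipe.set_element(1, "tica")
print(replaced, pipe.chain)  # pca ['reader', 'tica', 'regspace']
```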