pyemma.coordinates.assign_to_centers

pyemma.coordinates.assign_to_centers(data=None, centers=None, stride=1, return_dtrajs=True, metric='euclidean')

Assigns data to the nearest cluster centers.

Creates a Voronoi partition with the given cluster centers. If given trajectories as data, this function will by default discretize the trajectories and return discrete trajectories of corresponding lengths. Otherwise, an assignment object will be returned that can be used to assign data later or can serve as a pipeline stage.
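The Voronoi assignment described above can be illustrated with plain NumPy. This is a minimal sketch of the idea for the Euclidean metric only, not PyEMMA's implementation; the helper name `nearest_center_indices` and the toy arrays are assumptions made for illustration.

```python
import numpy as np


def nearest_center_indices(frames, centers):
    """Return, for each frame, the index of its closest center (Euclidean).

    Hypothetical helper for illustration; not part of PyEMMA.
    """
    frames = np.asarray(frames, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise differences via broadcasting: shape (n_frames, n_centers, n_dims).
    diff = frames[:, np.newaxis, :] - centers[np.newaxis, :, :]
    # Squared distances are enough, because argmin is unaffected by the square root.
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.argmin(sq_dist, axis=1)


# Two centers and three frames in two dimensions (toy data).
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
frames = np.array([[0.5, -0.2], [9.0, 11.0], [1.0, 1.0]])
labels = nearest_center_indices(frames, centers)
print(labels)  # [0 1 0]
```

Each frame belongs to the Voronoi cell of the center it is closest to, so the integer labels form the discrete trajectory.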

Parameters:
  • data (ndarray or list of arrays or reader created by source function) – data to be assigned
  • centers (path to file or ndarray or a reader created by source function) – cluster centers to use in assignment of data
  • stride (int, optional, default = 1) – If set to 1, all input data will be used for estimation. Note that this can make the calculation very slow for large data sets. Since molecular dynamics data is usually correlated at short timescales, it is often sufficient to estimate transformations at a longer stride. Note that the stride option in the get_output() function of the returned object is independent, so you can parametrize at a long stride and still map all frames through the transformer.
  • return_dtrajs (bool, optional, default = True) – If True, return the discretized trajectories obtained from assigning the coordinates in the data input. This only has an effect if data is given. When data is not given or return_dtrajs is False, the AssignCenters object is returned instead.
  • metric (str, optional, default = ‘euclidean’) – metric to use during the assignment (‘euclidean’, ‘minRMSD’)
Returns:

assignment – assigned data

Return type:

list of integer arrays or an AssignCenters object

Examples

Load the data to be assigned from ‘my_data.csv’ and assign it to the cluster centers stored in ‘my_centers.csv’:

>>> import numpy as np
>>> from pyemma.coordinates import assign_to_centers
>>> data = np.loadtxt('my_data.csv')
>>> cluster_centers = np.loadtxt('my_centers.csv')
>>> dtrajs = assign_to_centers(data, cluster_centers)
>>> print(dtrajs)
[array([0, 0, 1, ... ])]
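
To illustrate how a list of trajectories maps to a list of integer arrays, and what a stride larger than 1 does to their lengths, here is a NumPy-only sketch. It assumes that a stride of s simply keeps frames 0, s, 2s, ... of each trajectory. The helper `assign_strided` and the toy data are hypothetical and do not reproduce PyEMMA's code.

```python
import numpy as np


def assign_strided(trajs, centers, stride=1):
    """Assign every `stride`-th frame of each trajectory to its nearest center.

    Hypothetical helper for illustration; Euclidean metric only.
    """
    centers = np.asarray(centers, dtype=float)
    dtrajs = []
    for traj in trajs:
        # Assumption: striding keeps frames 0, stride, 2*stride, ...
        frames = np.asarray(traj, dtype=float)[::stride]
        diff = frames[:, np.newaxis, :] - centers[np.newaxis, :, :]
        dtrajs.append(np.argmin(np.sum(diff ** 2, axis=2), axis=1))
    return dtrajs


# Two one-dimensional trajectories of different lengths (toy data).
centers = np.array([[0.0], [5.0]])
trajs = [
    np.array([[0.1], [4.0], [6.0], [0.3], [5.5]]),
    np.array([[4.9], [0.2], [0.0]]),
]

dtrajs_all = assign_strided(trajs, centers)
dtrajs_strided = assign_strided(trajs, centers, stride=2)

print([d.tolist() for d in dtrajs_all])      # [[0, 1, 1, 0, 1], [1, 0, 0]]
print([d.tolist() for d in dtrajs_strided])  # [[0, 1, 1], [1, 0]]
```

With stride=1 each discrete trajectory has the same length as its input trajectory. In this sketch, a larger stride shortens each one to the number of frames that were kept.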