pyemma.coordinates.assign_to_centers

pyemma.coordinates.assign_to_centers(data=None, centers=None, stride=1, return_dtrajs=True, metric='euclidean', n_jobs=None, chunksize=None, skip=0, **kwargs)

Assigns data to the nearest cluster centers

Creates a Voronoi partition with the given cluster centers. If trajectories are given as data, this function by default discretizes them and returns discrete trajectories of corresponding lengths. Otherwise, an assignment object is returned that can be used to assign data later or can serve as a pipeline stage.
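Conceptually, assigning frames to a Voronoi partition means mapping every frame to the index of its closest center. The following is a minimal NumPy sketch of that idea for the 'euclidean' metric only. `assign_frames` is a hypothetical helper, not part of PyEMMA, and it does not reproduce PyEMMA's threaded, chunked implementation or the 'minRMSD' metric.

```python
import numpy as np


def assign_frames(X, centers):
    """Return, for each row of X, the index of the nearest center (Euclidean).

    Hypothetical helper for illustration; not PyEMMA's implementation.
    X has shape (n_frames, n_dims), centers has shape (n_centers, n_dims).
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Squared distances via broadcasting: shape (n_frames, n_centers).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # argmin over the centers gives the Voronoi cell label of each frame.
    return np.argmin(d2, axis=1).astype(np.int32)


centers = np.array([[0.0, 0.0], [1.0, 1.0]])
X = np.array([[0.1, 0.2], [0.9, 0.8], [0.4, 0.4], [0.6, 0.7]])
print(assign_frames(X, centers))  # [0 1 0 1]
```

Squared distances are sufficient here because the square root does not change which center is closest.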

Parameters
  • data (ndarray or list of arrays or reader created by source function) – data to be assigned

  • centers (path to file or ndarray or a reader created by source function) – cluster centers to use in assignment of data

  • stride (int, optional, default = 1) – assign only every n-th frame to the centers. Usually you want to assign all the data and use a stride only while calculating the centers.

  • return_dtrajs (bool, optional, default = True) – If True, the discretized trajectories obtained by assigning the coordinates in data are returned. This only has an effect if data is given. If data is not given or return_dtrajs is False, an AssignCenters object is returned instead.

  • metric (str) – metric used to compute distances to the centers during assignment (‘euclidean’, ‘minRMSD’)

  • n_jobs (int or None, default None) – Number of threads to use during assignment of the data. If None, all available CPUs will be used.

  • chunksize (int, default=None) – Number of data frames to process at once. Choose a higher value to optimize thread usage and gain processing speed. If None is passed, the default value of the underlying reader/data source is used. Choose zero to disable chunking entirely.
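To illustrate how stride and chunksize affect the output, here is a small NumPy sketch assuming the Euclidean metric. `assign_chunked` is a hypothetical helper, not a PyEMMA function; in it, a chunksize of 0 or None processes all frames at once (unlike PyEMMA, where None means the reader's default). Chunking only bounds how many frames are handled per step, so the labels match a single-pass assignment, while a stride of n yields one label per n-th frame.

```python
import numpy as np


def assign_chunked(X, centers, stride=1, chunksize=0):
    """Assign every `stride`-th frame of X to its nearest center, chunk by chunk.

    Hypothetical helper for illustration. A chunksize of 0 (or None) processes
    all selected frames in a single chunk.
    """
    X = np.asarray(X, dtype=float)[::stride]  # keep only every stride-th frame
    centers = np.asarray(centers, dtype=float)
    step = chunksize if chunksize else max(len(X), 1)
    labels = []
    for start in range(0, len(X), step):
        chunk = X[start:start + step]
        # Squared Euclidean distances of this chunk to all centers.
        d2 = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels.append(np.argmin(d2, axis=1))
    if not labels:
        return np.empty(0, dtype=np.int32)
    return np.concatenate(labels).astype(np.int32)


rng = np.random.default_rng(42)
data = rng.random((100, 3))
idx = rng.choice(len(data), size=10, replace=False)  # 10 distinct frames as centers
centers = data[idx]

full = assign_chunked(data, centers)                  # one label per frame
chunked = assign_chunked(data, centers, chunksize=7)  # same labels, smaller batches
strided = assign_chunked(data, centers, stride=5)     # every 5th frame only
print(len(full), len(strided), np.array_equal(full, chunked))  # 100 20 True
```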

Returns

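assignment – the discrete trajectories (one integer array per input trajectory), or an AssignCenters object if no data is given or return_dtrajs is False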

Return type

list of integer arrays or an AssignCenters object

Examples

Load data to assign to clusters from ‘my_data.csv’ by using the cluster centers from file ‘my_centers.csv’

>>> import numpy as np
>>> from pyemma.coordinates import assign_to_centers

Generate some random data and choose 10 random centers:

>>> data = np.random.random((100, 3))
>>> cluster_centers = data[np.random.choice(len(data), size=10, replace=False)]
>>> dtrajs = assign_to_centers(data, cluster_centers)
>>> print(dtrajs) 
[array([...

class pyemma.coordinates.clustering.assign.AssignCenters(*args, **kwargs)

Assigns data to given (pre-calculated) cluster centers. If you already have cluster centers from somewhere, you can use this class to assign your data to them.

Parameters
  • clustercenters (path to a CSV file, path to a .npy file, or ndarray) – cluster centers to use in assignment of data

  • metric (str) – metric used to compute distances to the centers during assignment (‘euclidean’, ‘minRMSD’)

  • stride (int) – assign only every n-th frame to the centers.

  • n_jobs (int or None, default None) – Number of threads to use during assignment of the data. If None, all available CPUs will be used.

  • skip (int, default=0) – skip the first n frames of each trajectory.
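A sketch of which frames end up being assigned when both skip and stride are set, assuming (as the parameter descriptions suggest) that the first skip frames of each trajectory are dropped and then every stride-th remaining frame is kept. `selected_frames` is a hypothetical helper for illustration only.

```python
def selected_frames(n_frames, skip=0, stride=1):
    """Indices of the frames assigned from one trajectory (illustrative only)."""
    # Drop the first `skip` frames, then keep every `stride`-th frame.
    return list(range(skip, n_frames, stride))


trajectory_lengths = [10, 7]  # two hypothetical trajectories
for length in trajectory_lengths:
    print(selected_frames(length, skip=2, stride=3))
# [2, 5, 8]
# [2, 5]
```

Because skip applies per trajectory, each trajectory starts counting from its own frame 0.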

Examples

Assuming you have stored your centers in a CSV file:

>>> from pyemma.coordinates.clustering import AssignCenters
>>> from pyemma.coordinates import pipeline
>>> reader = ...
>>> assign = AssignCenters('my_centers.dat')
>>> disc = pipeline([reader, assign], run=False)
>>> disc.parametrize()
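The example above assumes a centers file already exists. One way to produce such a file from an array of centers is numpy.savetxt (one center per row); the sketch below also writes a .npy file with numpy.save and reads both back to check the round trip. That PyEMMA parses exactly this text layout is an assumption here, and the file names are placeholders written to a temporary directory.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
centers = rng.random((10, 3))  # 10 hypothetical centers in 3 dimensions

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "my_centers.dat")
    npy_path = os.path.join(tmp, "my_centers.npy")
    # Text file: one center per row, one coordinate per column.
    np.savetxt(txt_path, centers)
    np.save(npy_path, centers)
    # ndmin=2 keeps a (n_centers, n_dims) shape even for a single center.
    from_txt = np.loadtxt(txt_path, ndmin=2)
    from_npy = np.load(npy_path)

print(from_txt.shape, np.allclose(from_txt, centers), np.array_equal(from_npy, centers))
# (10, 3) True True
```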

Methods

assign([X, stride])

Assigns the given trajectory or list of trajectories to cluster centers by using the discretization defined by this clustering method (usually a Voronoi tessellation).

describe()

Get a descriptive string representation of this class.

dimension()

Output dimension of the clustering algorithm (always 1).

estimate(X, **kwargs)

Estimates the model given the data X

fit(X[, y])

Estimates parameters - for compatibility with sklearn.

fit_predict(X[, y])

Performs clustering on X and returns cluster labels.

fit_transform(X[, y])

Fit to data, then transform it.

get_model_params([deep])

Get parameters for this model.

get_output([dimensions, stride, skip, chunk])

Maps all input data of this transformer and returns it as an array or list of arrays

get_params([deep])

Get parameters for this estimator.

iterator([stride, lag, chunk, …])

Creates an iterator to stream over the (transformed) data.

load(file_name[, model_name])

Loads a previously saved PyEMMA object from disk.

n_chunks(chunksize[, stride, skip])

Returns how many chunks an iterator of this source will output.

n_frames_total([stride, skip])

Returns total number of frames.

number_of_trajectories([stride])

Returns the number of trajectories.

output_type()

By default transformers return single precision floats.

sample_indexes_by_cluster(clusters, nsample)

Samples trajectory/time indexes according to the given sequence of states.

save(file_name[, model_name, overwrite, …])

Saves the current state of this object to the given file and name.

save_dtrajs([trajfiles, prefix, output_dir, …])

Saves the calculated discrete trajectories.

set_model_params(clustercenters)

set_params(**params)

Set the parameters of this estimator.

trajectory_length(itraj[, stride, skip])

Returns the length of trajectory of the requested index.

trajectory_lengths([stride, skip])

Returns the length of each trajectory.

transform(X)

Maps the input data through the transformer to correspondingly shaped output data array/list.

update_model_params(**params)

Updates the given model parameters if they are set to specific values.

write_to_csv([filename, extension, …])

Writes all data to CSV with numpy.savetxt.

write_to_hdf5(filename[, group, …])

Writes all data of this Iterable to the given HDF5 file.
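sample_indexes_by_cluster draws trajectory/time index pairs for the requested clusters. Below is a NumPy sketch of that idea working directly on a list of discrete trajectories. `sample_indexes` is a hypothetical helper, and its output layout (one array of (trajectory index, frame index) rows per requested cluster, sampled with replacement) is an assumption rather than PyEMMA's documented behavior.

```python
import numpy as np


def sample_indexes(dtrajs, clusters, nsample, seed=None):
    """For each cluster in `clusters`, draw `nsample` (itraj, frame) pairs.

    Hypothetical helper for illustration. Sampling is with replacement, so a
    cluster with few frames can still yield `nsample` pairs.
    """
    rng = np.random.default_rng(seed)
    # Collect the (trajectory index, frame index) pair of every frame.
    pairs = np.concatenate([
        np.column_stack((np.full(len(d), i), np.arange(len(d))))
        for i, d in enumerate(dtrajs)
    ])
    labels = np.concatenate(dtrajs)
    result = []
    for c in clusters:
        members = pairs[labels == c]
        if len(members) == 0:
            raise ValueError(f"cluster {c} has no assigned frames")
        picks = rng.integers(0, len(members), size=nsample)
        result.append(members[picks])
    return result


dtrajs = [np.array([0, 1, 1, 2]), np.array([2, 2, 0])]
samples = sample_indexes(dtrajs, clusters=[0, 2], nsample=3, seed=1)
for c, s in zip([0, 2], samples):
    # Every sampled pair points at a frame assigned to cluster c.
    assert all(dtrajs[i][t] == c for i, t in s)
    print(c, s.shape)
# 0 (3, 2)
# 2 (3, 2)
```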

Attributes

property data_producer

The data producer for this data source object (can be another data source object). Returns this data source’s data producer.
