pyemma.coordinates.clustering.AssignCenters

class pyemma.coordinates.clustering.AssignCenters(*args, **kwargs)

Assigns data to given (pre-calculated) cluster centers. If you already have cluster centers from another source, use this class to assign your data to them.

Parameters

clustercenters (path to file (csv) or npyfile or ndarray) – cluster centers to use in assignment of data
metric (str) – metric to use during clustering (‘euclidean’, ‘minRMSD’)
stride (int) – stride; only every stride-th frame of the input is used
n_jobs (int or None, default None) – Number of threads to use during assignment of the data. If None, all available CPUs will be used.
skip (int, default=0) – skip the first n frames of each trajectory.
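
The effect of these parameters can be illustrated with a minimal NumPy sketch of nearest-center assignment under the 'euclidean' metric. This is not PyEMMA's implementation: the helper name `nearest_center_indices` is hypothetical, and the sketch assumes that `skip` is applied before `stride`.

```python
import numpy as np

def nearest_center_indices(traj, centers, stride=1, skip=0):
    """Return, for each selected frame, the index of its closest center.

    Hypothetical helper for illustration only; assumes skip is applied
    first and stride afterwards.
    """
    frames = np.asarray(traj, dtype=float)[skip::stride]
    centers = np.asarray(centers, dtype=float)
    # Pairwise Euclidean distances, shape (n_frames, n_centers).
    dists = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
    # The closest center defines the discrete state of each frame.
    return dists.argmin(axis=1)

centers = np.array([[0.0, 0.0], [10.0, 10.0]])
traj = np.array([[0.1, -0.2], [9.5, 10.2], [0.4, 0.3], [8.0, 9.0], [1.0, 1.0]])

dtraj = nearest_center_indices(traj, centers)                     # all frames
dtraj_strided = nearest_center_indices(traj, centers, stride=2, skip=1)
```

With these toy values, `dtraj` holds one integer per frame, while `dtraj_strided` only covers frames 1 and 3 of the input.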

Examples

Assuming you have stored your centers in a CSV file:

>>> from pyemma.coordinates.clustering import AssignCenters
>>> from pyemma.coordinates import pipeline
>>> reader = ... 
>>> assign = AssignCenters('my_centers.dat') 
>>> disc = pipeline(reader, cluster=assign) 
>>> disc.parametrize() 

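A centers file like the one used above could be prepared with NumPy. The sketch below is an assumption-laden illustration: it writes a small array of made-up centers as whitespace-delimited text and reads it back as a 2-D array. Whether AssignCenters expects exactly this text layout is not stated on this page; since the `clustercenters` parameter also accepts an ndarray, the loaded array could be passed directly instead of the path.

```python
import os
import tempfile

import numpy as np

# Made-up centers for illustration: 3 centers in 2 dimensions.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, -5.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "my_centers.dat")
    # One center per row, columns separated by whitespace.
    np.savetxt(path, centers)
    # ndmin=2 keeps the result 2-D even if the file holds a single center.
    loaded = np.loadtxt(path, ndmin=2)

# With PyEMMA installed, the array could then be used as:
# assign = AssignCenters(loaded)
```
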
__init__(clustercenters, metric='euclidean', stride=1, n_jobs=None, skip=0): Initialize self. See help(type(self)) for accurate signature.

Methods

`_Loggable__create_logger`()
`_SerializableMixIn__interpolate`(state, klass)
`__delattr__`(name, /)	Implement delattr(self, name).
`__dir__`()	Default dir() implementation.
`__eq__`(value, /)	Return self==value.
`__format__`(format_spec, /)	Default object formatter.
`__ge__`(value, /)	Return self>=value.
`__getattribute__`(name, /)	Return getattr(self, name).
`__getstate__`()
`__gt__`(value, /)	Return self>value.
`__hash__`()	Return hash(self).
`__init__`(clustercenters[, metric, stride, …])	Initialize self.
`__init_subclass__`(*args, **kwargs)	This method is called when a class is subclassed.
`__iter__`()
`__le__`(value, /)	Return self<=value.
`__lt__`(value, /)	Return self<value.
`__my_getstate__`()
`__my_setstate__`(state)
`__ne__`(value, /)	Return self!=value.
`__new__`(cls, *args, **kwargs)	Create and return a new object.
`__reduce__`()	Helper for pickle.
`__reduce_ex__`(protocol, /)	Helper for pickle.
`__repr__`()	Return repr(self).
`__setattr__`(name, value, /)	Implement setattr(self, name, value).
`__setstate__`(state)
`__sizeof__`()	Size of object in memory, in bytes.
`__str__`()	Return str(self).
`__subclasshook__`	Abstract classes can override this to customize issubclass().
`_check_estimated`()
`_chunk_finite`(data)
`_cleanup_logger`(logger_id, logger_name)
`_clear_in_memory`()
`_compute_default_cs`(dim, itemsize[, logger])
`_create_iterator`([skip, chunk, stride, …])	Should be implemented by non-abstract subclasses.
`_data_flow_chain`()	Get a list of all elements in the data flow graph.
`_estimate`(iterable, **kw)
`_get_classes_to_inspect`()	gets classes self derives from which 1.
`_get_interpolation_map`(cls)
`_get_model_param_names`()	Get parameter names for the model
`_get_param_names`()	Get parameter names for the estimator
`_get_private_field`(cls, name[, default])
`_get_serialize_fields`(cls)
`_get_state_of_serializeable_fields`(klass, state)	Returns a dictionary {k: v} for k in self.serialize_fields and v = getattr(self, k)
`_get_traj_info`(filename)
`_get_version`(cls[, require])
`_get_version_for_class_from_state`(state, klass)	retrieves the version of the current klass from the state mapping from old locations to new ones.
`_logger_is_active`(level)	@param level: int log level (debug=10, info=20, warn=30, error=40, critical=50)
`_map_to_memory`([stride])	Maps results to memory.
`_set_random_access_strategies`()
`_set_state_from_serializeable_fields_and_state`(…)	set only fields from state, which are present in klass.__serialize_fields
`_source_from_memory`([data_producer])
`_transform_array`(X)	get closest index of point in `clustercenters` to x.
`assign`([X, stride])	Assigns the given trajectory or list of trajectories to cluster centers by using the discretization defined by this clustering method (usually a Voronoi tessellation).
`describe`()	Get a descriptive string representation of this class.
`dimension`()	output dimension of clustering algorithm (always 1).
`estimate`(X, **kwargs)	Estimates the model given the data X
`fit`(X[, y])	Estimates parameters - for compatibility with sklearn.
`fit_predict`(X[, y])	Performs clustering on X and returns cluster labels.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_model_params`([deep])	Get parameters for this model.
`get_output`([dimensions, stride, skip, chunk])	Maps all input data of this transformer and returns it as an array or list of arrays
`get_params`([deep])	Get parameters for this estimator.
`iterator`([stride, lag, chunk, …])	creates an iterator to stream over the (transformed) data.
`load`(file_name[, model_name])	Loads a previously saved PyEMMA object from disk.
`n_chunks`(chunksize[, stride, skip])	how many chunks an iterator of this source will output, starting (eg.
`n_frames_total`([stride, skip])	Returns total number of frames.
`number_of_trajectories`([stride])	Returns the number of trajectories.
`output_type`()	By default transformers return single precision floats.
`sample_indexes_by_cluster`(clusters, nsample)	Samples trajectory/time indexes according to the given sequence of states.
`save`(file_name[, model_name, overwrite, …])	saves the current state of this object to given file and name.
`save_dtrajs`([trajfiles, prefix, output_dir, …])	saves calculated discrete trajectories.
`set_model_params`(clustercenters)
`set_params`(**params)	Set the parameters of this estimator.
`trajectory_length`(itraj[, stride, skip])	Returns the length of trajectory of the requested index.
`trajectory_lengths`([stride, skip])	Returns the length of each trajectory.
`transform`(X)	Maps the input data through the transformer to correspondingly shaped output data array/list.
`update_model_params`(**params)	Update given model parameters if they are set to specific values
`write_to_csv`([filename, extension, …])	write all data to csv with numpy.savetxt
`write_to_hdf5`(filename[, group, …])	writes all data of this Iterable to a given HDF5 file.

Attributes

`_AbstractClustering__serialize_fields`
`_AbstractClustering__serialize_version`
`_AssignCenters__serialize_version`
`_DataSource__serialize_fields`
`_Estimator__serialize_fields`
`_FALLBACK_CHUNKSIZE`
`_InMemoryMixin__serialize_fields`
`_InMemoryMixin__serialize_version`
`_Loggable__ids`
`_Loggable__refs`
`_SerializableMixIn__serialize_fields`
`_SerializableMixIn__serialize_modifications_map`
`_SerializableMixIn__serialize_version`
`__abstractmethods__`
`__dict__`
`__doc__`
`__module__`
`__weakref__`	list of weak references to the object (if defined)
`_abc_impl`
`_estimated`
`_estimator_type`
`_loglevel_CRITICAL`
`_loglevel_DEBUG`
`_loglevel_ERROR`
`_loglevel_INFO`
`_loglevel_WARN`
`_save_data_producer`
`_serialize_version`
`chunksize`	chunksize defines how much data is being processed at once.
`cluster_centers_`	Array containing the coordinates of the calculated cluster centers.
`clustercenters`	Array containing the coordinates of the calculated cluster centers.
`data_producer`	The data producer for this data source object (can be another data source object).
`default_chunksize`	How much data will be processed at once, in case no chunksize has been provided.
`dtrajs`	Discrete trajectories (assigned data to cluster centers).
`filenames`	list of file names the data is originally being read from.
`in_memory`	are results stored in memory?
`index_clusters`	Returns trajectory/time indexes for all the clusters
`is_random_accessible`	Check if self._is_random_accessible is set to true and if all the random access strategies are implemented.
`is_reader`	Property telling if this data source is a reader or not.
`logger`	The logger for this class instance
`model`	The model estimated by this Estimator
`n_jobs`	Returns number of jobs/threads to use during assignment of data.
`name`	The name of this instance
`ndim`
`ntraj`
`overwrite_dtrajs`	Should existing dtraj files be overwritten.
`ra_itraj_cuboid`	Implementation of random access with slicing that can be up to 3-dimensional, where the first dimension corresponds to the trajectory index, the second dimension corresponds to the frames and the third dimension corresponds to the dimensions of the frames.
`ra_itraj_jagged`	Behaves like ra_itraj_cuboid just that the trajectories are not truncated and returned as a list.
`ra_itraj_linear`	Implementation of random access that takes arguments as the default random access (i.e., up to three dimensions with trajs, frames and dims, respectively), but which considers the frame indexing to be contiguous.
`ra_linear`	Implementation of random access that takes a (maximal) two-dimensional slice where the first component corresponds to the frames and the second component corresponds to the dimensions.
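
To make the relation between `dtrajs` and `index_clusters` more concrete, here is a small sketch that builds per-cluster (trajectory, frame) index arrays from a list of discrete trajectories. The function `build_index_clusters` is hypothetical, and the output layout (one integer array of shape (n, 2) per cluster) is an assumption made for illustration rather than a documented guarantee of this class.

```python
import numpy as np

def build_index_clusters(dtrajs, n_clusters):
    """Collect (trajectory index, frame index) pairs for every cluster.

    Hypothetical helper; the (n, 2) array-per-cluster layout is assumed.
    """
    pairs = [[] for _ in range(n_clusters)]
    for itraj, dtraj in enumerate(dtrajs):
        for t, state in enumerate(dtraj):
            pairs[state].append((itraj, t))
    # reshape(-1, 2) keeps empty clusters as arrays of shape (0, 2).
    return [np.array(p, dtype=int).reshape(-1, 2) for p in pairs]

# Two toy discrete trajectories over three clusters.
dtrajs = [np.array([0, 1, 1, 0]), np.array([2, 2, 0])]
indexes = build_index_clusters(dtrajs, n_clusters=3)
```

Here `indexes[0]` lists every frame assigned to cluster 0 across both trajectories, which is the kind of lookup needed when picking representative frames for a state.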