pyemma.coordinates.clustering.RegularSpaceClustering

class pyemma.coordinates.clustering.RegularSpaceClustering(*args, **kwargs)
Regular space clustering.

__init__(dmin, max_centers=1000, metric='euclidean', stride=1, n_jobs=None, skip=0)
Clusters data objects such that cluster centers are at least dmin apart from each other according to the given metric. Data objects are assigned to cluster centers by Voronoi partitioning.
Regular space clustering [Prinz_2011] is closely related to Hartigan’s leader algorithm [Hartigan_1975]. It makes two passes through the data. In the first pass, the first data point is added to the list of centers; every subsequent data point that is farther than dmin from all existing centers becomes a center as well. In the second pass, a Voronoi discretization with the computed centers is used to partition the data.
Parameters
dmin (float) – minimum distance between cluster centers.
max_centers (int) – if this number of centers is reached during the center search, the algorithm aborts.
metric (str) – metric to use during clustering (‘euclidean’, ‘minRMSD’).
n_jobs (int or None, default None) – number of threads to use during assignment of the data. If None, all available CPUs are used.
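The two passes described above can be sketched in plain NumPy. This is a minimal illustration using the Euclidean metric, not pyEMMA’s optimized implementation; the function names regspace_centers and voronoi_assign are hypothetical:

```python
import numpy as np

def regspace_centers(data, dmin, max_centers=1000):
    """First pass: collect centers that are pairwise at least dmin apart."""
    centers = [data[0]]  # the first data point always becomes a center
    for x in data[1:]:
        dists = np.linalg.norm(np.asarray(centers) - x, axis=1)
        if np.all(dists >= dmin):  # farther than dmin from every center
            centers.append(x)
            if len(centers) >= max_centers:
                raise RuntimeError("max_centers reached; increase dmin")
    return np.asarray(centers)

def voronoi_assign(data, centers):
    """Second pass: assign each frame to its nearest center (Voronoi)."""
    # pairwise distances, shape (n_frames, n_centers)
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(d, axis=1)
```

Note that the number of resulting centers is controlled only indirectly, via dmin: the smaller dmin, the more centers are created.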
References
[Prinz_2011] Prinz J-H, Wu H, Sarich M, Keller B, Senne M, Held M, Chodera JD, Schütte C and Noé F. 2011. Markov models of molecular kinetics: generation and validation. J. Chem. Phys. 134, 174105.
[Hartigan_1975] Hartigan J. 1975. Clustering Algorithms. New York: Wiley.
Methods
_Loggable__create_logger()
_SerializableMixIn__interpolate(state, klass)
__delattr__(name, /) – Implement delattr(self, name).
__dir__() – Default dir() implementation.
__eq__(value, /) – Return self==value.
__format__(format_spec, /) – Default object formatter.
__ge__(value, /) – Return self>=value.
__getattribute__(name, /) – Return getattr(self, name).
__getstate__()
__gt__(value, /) – Return self>value.
__hash__() – Return hash(self).
__init__(dmin[, max_centers, metric, …]) – Clusters data objects such that cluster centers are at least dmin apart from each other according to the given metric.
__init_subclass__(*args, **kwargs) – This method is called when a class is subclassed.
__iter__()
__le__(value, /) – Return self<=value.
__lt__(value, /) – Return self<value.
__my_getstate__()
__my_setstate__(state)
__ne__(value, /) – Return self!=value.
__new__(cls, *args, **kwargs) – Create and return a new object.
__reduce__() – Helper for pickle.
__reduce_ex__(protocol, /) – Helper for pickle.
__repr__() – Return repr(self).
__setattr__(name, value, /) – Implement setattr(self, name, value).
__setstate__(state)
__sizeof__() – Size of object in memory, in bytes.
__str__() – Return str(self).
__subclasshook__ – Abstract classes can override this to customize issubclass().
_check_estimated()
_chunk_finite(data)
_cleanup_logger(logger_id, logger_name)
_clear_in_memory()
_compute_default_cs(dim, itemsize[, logger])
_create_iterator([skip, chunk, stride, …]) – Should be implemented by non-abstract subclasses.
_data_flow_chain() – Get a list of all elements in the data flow graph.
_estimate(iterable, **kwargs)
_get_classes_to_inspect() – Gets classes self derives from which 1.
_get_interpolation_map(cls)
_get_model_param_names() – Get parameter names for the model.
_get_param_names() – Get parameter names for the estimator.
_get_private_field(cls, name[, default])
_get_serialize_fields(cls)
_get_state_of_serializeable_fields(klass, state) – Return a dictionary {k: v} for k in self.serialize_fields with v = getattr(self, k).
_get_traj_info(filename)
_get_version(cls[, require])
_get_version_for_class_from_state(state, klass) – Retrieves the version of the given klass from the state, mapping old locations to new ones.
_logger_is_active(level) – level: int log level (debug=10, info=20, warn=30, error=40, critical=50).
_map_to_memory([stride]) – Maps results to memory.
_set_random_access_strategies()
_set_state_from_serializeable_fields_and_state(…) – Set only those fields from state which are present in klass.__serialize_fields.
_source_from_memory([data_producer])
_transform_array(X) – Get the index of the closest point in clustercenters for each x.
assign([X, stride]) – Assigns the given trajectory or list of trajectories to cluster centers, using the discretization defined by this clustering method (usually a Voronoi tessellation).
describe() – Get a descriptive string representation of this class.
dimension() – Output dimension of the clustering algorithm (always 1).
estimate(X, **kwargs) – Estimates the model given the data X.
fit(X[, y]) – Estimates parameters; for compatibility with sklearn.
fit_predict(X[, y]) – Performs clustering on X and returns cluster labels.
fit_transform(X[, y]) – Fit to data, then transform it.
get_model_params([deep]) – Get parameters for this model.
get_output([dimensions, stride, skip, chunk]) – Maps all input data of this transformer and returns it as an array or list of arrays.
get_params([deep]) – Get parameters for this estimator.
iterator([stride, lag, chunk, …]) – Creates an iterator to stream over the (transformed) data.
load(file_name[, model_name]) – Loads a previously saved PyEMMA object from disk.
n_chunks(chunksize[, stride, skip]) – How many chunks an iterator of this source will output.
n_frames_total([stride, skip]) – Returns the total number of frames.
number_of_trajectories([stride]) – Returns the number of trajectories.
output_type() – By default transformers return single-precision floats.
sample_indexes_by_cluster(clusters, nsample) – Samples trajectory/time indexes according to the given sequence of states.
save(file_name[, model_name, overwrite, …]) – Saves the current state of this object to the given file and name.
save_dtrajs([trajfiles, prefix, output_dir, …]) – Saves calculated discrete trajectories.
set_model_params(clustercenters)
set_params(**params) – Set the parameters of this estimator.
trajectory_length(itraj[, stride, skip]) – Returns the length of the trajectory with the requested index.
trajectory_lengths([stride, skip]) – Returns the length of each trajectory.
transform(X) – Maps the input data through the transformer to a correspondingly shaped output data array/list.
update_model_params(**params) – Update the given model parameters if they are set to specific values.
write_to_csv([filename, extension, …]) – Write all data to CSV with numpy.savetxt.
write_to_hdf5(filename[, group, …]) – Writes all data of this Iterable to a given HDF5 file.
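To illustrate what sample_indexes_by_cluster computes, here is a hedged pure-NumPy analogue. It assumes a single discrete trajectory for simplicity; the actual pyEMMA method returns (trajectory, frame) index pairs across all input trajectories:

```python
import numpy as np

def sample_indexes_by_cluster(dtraj, clusters, nsample, seed=0):
    """For each requested cluster, draw nsample frame indexes (with
    replacement) from the frames assigned to that cluster."""
    rng = np.random.default_rng(seed)
    dtraj = np.asarray(dtraj)
    samples = []
    for c in clusters:
        frames = np.where(dtraj == c)[0]  # all frames assigned to cluster c
        samples.append(rng.choice(frames, size=nsample, replace=True))
    return samples
```

Sampling with replacement mirrors the common use case of extracting representative frames per cluster even when a cluster holds fewer than nsample frames.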
Attributes
_AbstractClustering__serialize_fields
_AbstractClustering__serialize_version
_DataSource__serialize_fields
_Estimator__serialize_fields
_FALLBACK_CHUNKSIZE
_InMemoryMixin__serialize_fields
_InMemoryMixin__serialize_version
_Loggable__ids
_Loggable__refs
_RegularSpaceClustering__serialize_version
_SerializableMixIn__serialize_fields
_SerializableMixIn__serialize_modifications_map
_SerializableMixIn__serialize_version
__abstractmethods__
__dict__
__doc__
__module__
__weakref__ – List of weak references to the object (if defined).
_abc_impl
_estimated
_estimator_type
_loglevel_CRITICAL
_loglevel_DEBUG
_loglevel_ERROR
_loglevel_INFO
_loglevel_WARN
_save_data_producer
_serialize_version
chunksize – Defines how much data is processed at once.
cluster_centers_ – Array containing the coordinates of the calculated cluster centers.
clustercenters – Array containing the coordinates of the calculated cluster centers.
data_producer – The data producer for this data source object (can be another data source object).
default_chunksize – How much data is processed at once when no chunksize has been provided.
dmin – Minimum distance between cluster centers.
dtrajs – Discrete trajectories (data assigned to cluster centers).
filenames – List of file names the data is originally read from.
in_memory – Are results stored in memory?
index_clusters – Returns trajectory/time indexes for all clusters.
is_random_accessible – Checks whether self._is_random_accessible is set to True and all random access strategies are implemented.
is_reader – Property telling whether this data source is a reader.
logger – The logger for this class instance.
max_centers – Cutoff for the number of cluster centers.
model – The model estimated by this Estimator.
n_clusters
n_jobs – Returns the number of jobs/threads to use during assignment of data.
name – The name of this instance.
ndim
ntraj
overwrite_dtrajs – Whether existing dtraj files should be overwritten.
ra_itraj_cuboid – Random access with slicing that can be up to three-dimensional: the first dimension corresponds to the trajectory index, the second to the frames, and the third to the dimensions of the frames.
ra_itraj_jagged – Behaves like ra_itraj_cuboid, except that the trajectories are not truncated but returned as a list.
ra_itraj_linear – Random access that takes the same arguments as the default random access (i.e., up to three dimensions with trajs, frames and dims, respectively), but treats the frame indexing as contiguous.
ra_linear – Random access that takes a (maximally) two-dimensional slice, where the first component corresponds to the frames and the second to the dimensions.