pyemma.coordinates.cluster_mini_batch_kmeans

pyemma.coordinates.cluster_mini_batch_kmeans(data=None, k=100, max_iter=10, batch_size=0.2, metric='euclidean', init_strategy='kmeans++', n_jobs=None, chunksize=None, skip=0, clustercenters=None, **kwargs)

k-means clustering with mini-batch strategy

Mini-batch k-means is an approximation to k-means that updates the cluster centers using only a randomly selected subset of the data points in each iteration. It is usually much faster than full k-means but will likely deliver a less optimal result.
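The update described above can be sketched in plain Python. This is only an illustrative sketch of the general mini-batch idea, not PyEMMA's implementation: the function names are hypothetical, initialization is plain random sampling rather than the default 'kmeans++', and treating `batch_size` as a fraction of the data is an assumption suggested by the default value of 0.2.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(points, k, max_iter=10, batch_size=0.2, seed=0):
    """Hypothetical helper: mini-batch k-means on a list of points.

    Assumption: batch_size is the fraction of points drawn per iteration.
    """
    rng = random.Random(seed)
    # Random initialization for brevity (the documented default is 'kmeans++').
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # number of points each center has absorbed so far
    n_batch = max(1, int(batch_size * len(points)))

    for _ in range(max_iter):
        # Draw a random subset of the data for this iteration.
        batch = rng.sample(points, n_batch)
        # Assign the batch while the centers are held fixed.
        labels = [nearest_center(p, centers) for p in batch]
        # Move each assigned center towards its points with a per-center
        # learning rate that shrinks as the center sees more data.
        for p, j in zip(batch, labels):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers
```

Because only `n_batch` points are touched per iteration, the cost of one iteration scales with the batch rather than with the full data set, which is where the speed-up over full k-means comes from. The price is that the centers are noisy estimates and depend on which points happen to be sampled.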

Returns: kmeans_mini – Object for mini-batch k-means clustering. It holds discrete trajectories and cluster center information.
Return type: a MiniBatchKmeansClustering object

See also

kmeans : for full k-means clustering

class pyemma.coordinates.clustering.kmeans.MiniBatchKmeansClustering(*args, **kwargs)

Mini-batch k-means clustering

Methods

`assign`([X, stride])	Assigns the given trajectory or list of trajectories to cluster centers by using the discretization defined by this clustering method (usually a Voronoi tessellation).
`describe`()	Get a descriptive string representation of this class.
`dimension`()	output dimension of clustering algorithm (always 1).
`estimate`(X, **kwargs)	Estimates the model given the data X
`fit`(X[, y])	Estimates parameters - for compatibility with sklearn.
`fit_predict`(X[, y])	Performs clustering on X and returns cluster labels.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_model_params`([deep])	Get parameters for this model.
`get_output`([dimensions, stride, skip, chunk])	Maps all input data of this transformer and returns it as an array or list of arrays
`get_params`([deep])	Get parameters for this estimator.
`iterator`([stride, lag, chunk, …])	creates an iterator to stream over the (transformed) data.
`load`(file_name[, model_name])	Loads a previously saved PyEMMA object from disk.
`n_chunks`(chunksize[, stride, skip])	how many chunks an iterator of this source will output, starting (eg.
`n_frames_total`([stride, skip])	Returns total number of frames.
`number_of_trajectories`([stride])	Returns the number of trajectories.
`output_type`()	By default transformers return single precision floats.
`sample_indexes_by_cluster`(clusters, nsample)	Samples trajectory/time indexes according to the given sequence of states.
`save`(file_name[, model_name, overwrite, …])	saves the current state of this object to given file and name.
`save_dtrajs`([trajfiles, prefix, output_dir, …])	saves calculated discrete trajectories.
`set_model_params`(clustercenters)
`set_params`(**params)	Set the parameters of this estimator.
`trajectory_length`(itraj[, stride, skip])	Returns the length of trajectory of the requested index.
`trajectory_lengths`([stride, skip])	Returns the length of each trajectory.
`transform`(X)	Maps the input data through the transformer to correspondingly shaped output data array/list.
`update_model_params`(**params)	Update the given model parameters if they are set to specific values
`write_to_csv`([filename, extension, …])	write all data to csv with numpy.savetxt
`write_to_hdf5`(filename[, group, …])	writes all data of this Iterable to a given HDF5 file.
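The Voronoi discretization mentioned for `assign` in the table above maps every frame to the index of its nearest cluster center. The sketch below illustrates that idea with a hypothetical stand-alone function in plain Python, assuming the Euclidean metric; it does not reproduce the library's own code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_centers(trajectory, centers):
    """Hypothetical helper: discretize a trajectory by nearest center.

    Returns one integer per frame, the index of the closest center,
    i.e. the Voronoi cell that the frame falls into.
    """
    return [
        min(range(len(centers)),
            key=lambda j: squared_distance(frame, centers[j]))
        for frame in trajectory
    ]


centers = [(0.0, 0.0), (10.0, 10.0)]
trajectory = [(1.0, 1.0), (9.0, 8.0), (0.0, 2.0), (11.0, 12.0)]
dtraj = assign_to_centers(trajectory, centers)
print(dtraj)  # [0, 1, 0, 1]
```

The result is a one-dimensional sequence of state indices regardless of the input dimension, which is consistent with `dimension` reporting an output dimension of 1.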

Attributes

References

1: http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf