pyemma.coordinates.cluster_mini_batch_kmeans

pyemma.coordinates.cluster_mini_batch_kmeans(data=None, k=100, max_iter=10, batch_size=0.2, metric='euclidean', init_strategy='kmeans++', n_jobs=None, chunksize=None, skip=0, clustercenters=None, **kwargs)

k-means clustering with mini-batch strategy

Mini-batch k-means is an approximation to k-means that updates the cluster centers using only a randomly selected subset of the data points in each iteration. It is usually much faster than full k-means, but will likely deliver a less optimal result.
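As a rough illustration of the idea, the following NumPy sketch draws a random subset of frames in every iteration and moves each center toward the points assigned to it, using a per-center learning rate. It is not PyEMMA's implementation. The function name `mini_batch_kmeans_sketch`, the random initialization (instead of kmeans++), and the reading of `batch_size` as a fraction of all frames are assumptions made for this example.

```python
import numpy as np


def mini_batch_kmeans_sketch(data, k, max_iter=10, batch_size=0.2, seed=0):
    """Illustrative mini-batch k-means; a sketch, not PyEMMA's code."""
    rng = np.random.default_rng(seed)
    n_frames = data.shape[0]
    # Assumption: batch_size is the fraction of frames used per iteration.
    n_batch = max(1, int(batch_size * n_frames))
    # Assumption: start from k distinct random data points (not kmeans++).
    centers = data[rng.choice(n_frames, size=k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(max_iter):
        # Pick a random mini-batch of frames for this iteration.
        batch = data[rng.choice(n_frames, size=n_batch, replace=False)]
        # Squared Euclidean distances, shape (n_batch, k).
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center toward its assigned points; the step size
        # shrinks as a center accumulates more assignments.
        for x, j in zip(batch, labels):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = (1.0 - eta) * centers[j] + eta * x
    return centers


# Usage on synthetic 2D data with two well-separated groups.
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0.0, 0.5, size=(200, 2)),
                  rng.normal(5.0, 0.5, size=(200, 2))])
centers = mini_batch_kmeans_sketch(data, k=2)
```

Because each iteration only touches a fraction of the data, the cost per iteration drops accordingly, at the price of noisier center updates.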

Returns

kmeans_mini – Object for mini-batch kmeans clustering. It holds discrete trajectories and cluster center information.

Return type

a MiniBatchKmeansClustering object

See also

kmeans : for full k-means clustering

class pyemma.coordinates.clustering.kmeans.MiniBatchKmeansClustering(*args, **kwargs)

Mini-batch k-means clustering

Methods

assign([X, stride])

Assigns the given trajectory or list of trajectories to cluster centers by using the discretization defined by this clustering method (usually a Voronoi tessellation).

describe()

Get a descriptive string representation of this class.

dimension()

output dimension of clustering algorithm (always 1).

estimate(X, **kwargs)

Estimates the model given the data X

fit(X[, y])

Estimates parameters - for compatibility with sklearn.

fit_predict(X[, y])

Performs clustering on X and returns cluster labels.

fit_transform(X[, y])

Fit to data, then transform it.

get_model_params([deep])

Get parameters for this model.

get_output([dimensions, stride, skip, chunk])

Maps all input data of this transformer and returns it as an array or list of arrays

get_params([deep])

Get parameters for this estimator.

iterator([stride, lag, chunk, …])

creates an iterator to stream over the (transformed) data.

load(file_name[, model_name])

Loads a previously saved PyEMMA object from disk.

n_chunks(chunksize[, stride, skip])

how many chunks an iterator of this source will output, starting (eg.

n_frames_total([stride, skip])

Returns total number of frames.

number_of_trajectories([stride])

Returns the number of trajectories.

output_type()

By default transformers return single precision floats.

sample_indexes_by_cluster(clusters, nsample)

Samples trajectory/time indexes according to the given sequence of states.

save(file_name[, model_name, overwrite, …])

saves the current state of this object to given file and name.

save_dtrajs([trajfiles, prefix, output_dir, …])

saves calculated discrete trajectories.

set_model_params(clustercenters)

set_params(**params)

Set the parameters of this estimator.

trajectory_length(itraj[, stride, skip])

Returns the length of trajectory of the requested index.

trajectory_lengths([stride, skip])

Returns the length of each trajectory.

transform(X)

Maps the input data through the transformer to correspondingly shaped output data array/list.

update_model_params(**params)

Update the given model parameters if they are set to specific values

write_to_csv([filename, extension, …])

write all data to csv with numpy.savetxt

write_to_hdf5(filename[, group, …])

writes all data of this Iterable to a given HDF5 file.
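To make the Voronoi discretization mentioned under `assign` concrete, here is a minimal NumPy sketch that maps every frame to the index of its nearest cluster center under the Euclidean metric. The helper name `assign_to_centers` and the example arrays are hypothetical; this is not how PyEMMA implements `assign` internally.

```python
import numpy as np


def assign_to_centers(traj, centers):
    """Return, for each frame, the index of the nearest center."""
    # Squared Euclidean distances, shape (n_frames, n_centers).
    d2 = ((traj[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Hypothetical cluster centers and a short 2D trajectory.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
traj = np.array([[0.5, -0.2], [9.0, 11.0], [4.0, 4.0]])
dtraj = assign_to_centers(traj, centers)
print(dtraj)  # [0 1 0]
```

The resulting integer array plays the role of a discrete trajectory: one cluster index per frame, which is also why the output dimension of a clustering is always 1.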

Attributes

References

[1] http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf