pyemma.coordinates.cluster_regspace

pyemma.coordinates.cluster_regspace(data=None, dmin=-1, max_centers=1000, stride=1, metric='euclidean')

Regular space clustering

If data is given, it performs regular space clustering [1] and returns a RegularSpaceClustering object that can be used to extract the discretized data sequences or to assign other data points to the same partition. If data is not given, an empty RegularSpaceClustering object is created that still needs to be parametrized, e.g. in a pipeline().

Regular space clustering is very similar to Hartigan’s leader algorithm [2]. It makes two passes through the data. In the first pass, the first data point is added to the list of centers, and every subsequent data point whose distance to every existing center is greater than dmin also becomes a center. In the second pass, the data are partitioned by a Voronoi discretization with the computed centers, i.e. each point is assigned to its nearest center.
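The two passes can be sketched in plain NumPy. This is a minimal illustration, assuming a single in-memory (T, d) array and the Euclidean metric; the function names `regspace_centers` and `voronoi_assign` are hypothetical, and this is not PyEMMA's own implementation.

```python
import numpy as np

def regspace_centers(X, dmin):
    """First pass (leader-style sketch): X is a (T, d) array; returns a (k, d) array of centers."""
    centers = [X[0]]  # the first data point always becomes a center
    for x in X[1:]:
        # Euclidean distance from x to every center found so far
        dist = np.linalg.norm(np.asarray(centers) - x, axis=1)
        if np.all(dist > dmin):  # farther than dmin from all existing centers
            centers.append(x)
    return np.asarray(centers)

def voronoi_assign(X, centers):
    """Second pass: index of the nearest center for every row of X."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dist, axis=1)

# Toy 1-D data
X = np.array([[0.0], [0.1], [1.0], [1.05], [2.5]])
centers = regspace_centers(X, dmin=0.5)  # [[0.0], [1.0], [2.5]]
dtraj = voronoi_assign(X, centers)       # [0, 0, 1, 1, 2]
```

Because the first pass compares each point only against the centers chosen so far, the set of centers depends on the order of the data, and a non-center point may end up closer to a center chosen after it. This is why the partition is computed in a separate Voronoi pass.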

Parameters:
  • data (ndarray (T, d) or list of ndarray (T_i, d) or a reader created by source function) – input data, if available in memory
  • dmin (float) – the minimal distance between cluster centers
  • max_centers (int, optional, default=1000) – If max_centers is reached, the algorithm stops searching for further centers, so parts of the state space may not be properly discretized. This generates a warning. If that happens, increase dmin so that the number of centers stays below max_centers.
  • stride (int, optional, default=1) – If set to 1, all input data is used for estimation, which can make the calculation very slow for large data sets. Since molecular dynamics data is usually correlated at short timescales, it is often sufficient to estimate transformations at a longer stride. The stride option of the get_output() function of the returned object is independent, so you can parametrize at a long stride and still map all frames through the transformer.
  • metric (str) – metric to use during clustering (‘euclidean’, ‘minRMSD’)
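To illustrate how stride and max_centers interact, the sketch below chooses centers from every stride-th frame of a list of trajectories and then assigns all frames. The helper names `regspace_fit` and `assign_all` are hypothetical, the metric is Euclidean only, and stopping with a warning at the first new center that would exceed max_centers is an assumed cutoff rule, not a description of PyEMMA internals.

```python
import warnings
import numpy as np

def regspace_fit(trajs, dmin, max_centers=1000, stride=1):
    """Sketch: pick centers from every `stride`-th frame of each (T_i, d) array in `trajs`."""
    centers = []
    for X in trajs:
        for x in X[::stride]:
            if centers:
                dist = np.linalg.norm(np.asarray(centers) - x, axis=1)
                if not np.all(dist > dmin):
                    continue  # x is within dmin of an existing center
            # Assumption: stop, with a warning, at the first new center that
            # would exceed max_centers; remaining frames are never inspected.
            if len(centers) >= max_centers:
                warnings.warn("max_centers reached; consider increasing dmin")
                return np.asarray(centers)
            centers.append(x)
    return np.asarray(centers)

def assign_all(trajs, centers):
    """Map every frame (no stride) to its nearest center: one dtraj per trajectory."""
    return [np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
            for X in trajs]

rng = np.random.default_rng(0)
trajs = [rng.normal(size=(1000, 2)), rng.normal(size=(500, 2))]
centers = regspace_fit(trajs, dmin=0.5, stride=10)  # centers estimated from 150 frames
dtrajs = assign_all(trajs, centers)                  # all 1500 frames discretized
```

Since fitting and assignment are separate steps, the stride only limits how many frames are inspected when choosing centers; every frame is still discretized, mirroring the independence of the get_output() stride.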
Returns:
  regSpace – Object for regular space clustering. It holds the discrete trajectories and the cluster center information.
Return type:
  RegularSpaceClustering object

References

[1] Prinz J-H, Wu H, Sarich M, Keller B, Senne M, Held M, Chodera JD, Schuette Ch and Noe F. 2011. Markov models of molecular kinetics: Generation and validation. J. Chem. Phys. 134, 174105.
[2] Hartigan J. 1975. Clustering algorithms. New York: Wiley.