ott.tools.k_means.k_means

ott.tools.k_means.k_means(geom, k, weights=None, init='k-means++', n_init=10, n_local_trials=None, tol=0.0001, min_iterations=0, max_iterations=300, store_inner_errors=False, rng=None)

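K-means clustering using Lloyd’s algorithm [Lloyd, 1982], which alternates between assigning each point to its nearest centroid and recomputing each centroid as the (weighted) mean of the points assigned to it.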
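Lloyd's algorithm alternates an assignment step and a centroid update. The plain-Python sketch below shows a single iteration with optional point weights (cf. the `weights` parameter). It is only an illustration and not the library's JAX implementation. The helper names `sq_dist` and `lloyd_step` are hypothetical. Keeping the previous centroid when a cluster receives no points is a simplifying assumption made here.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def lloyd_step(
    points: Sequence[Point],
    centroids: Sequence[Point],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[Point], List[int], float]:
    """One weighted Lloyd iteration (hypothetical helper).

    Returns (new_centroids, assignment, inertia); the inertia is measured
    against the centroids passed in, i.e. before the update.
    """
    k, dim = len(centroids), len(points[0])
    w = [1.0] * len(points) if weights is None else list(weights)
    # Assignment step: index of the nearest centroid for every point.
    assignment = [
        min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
    ]
    # Weighted inertia: sum_i w_i * ||x_i - c_{a(i)}||^2.
    inertia = sum(
        wi * sq_dist(p, centroids[a]) for p, wi, a in zip(points, w, assignment)
    )
    # Update step: each centroid becomes the weighted mean of its points.
    new_centroids: List[Point] = []
    for j in range(k):
        members = [(p, wi) for p, wi, a in zip(points, w, assignment) if a == j]
        mass = sum(wi for _, wi in members)
        if mass == 0.0:
            # Empty cluster: keep the previous centroid (simplifying assumption).
            new_centroids.append(tuple(centroids[j]))
        else:
            new_centroids.append(
                tuple(sum(wi * p[d] for p, wi in members) / mass for d in range(dim))
            )
    return new_centroids, assignment, inertia
```

Consider the points (0, 0), (0, 1), (10, 0) and (10, 1) with initial centroids (0, 0) and (10, 0). One step yields centroids (0, 0.5) and (10, 0.5) and an inertia of 2.0. Giving the second point weight 3 moves the first centroid to (0, 0.75).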

Parameters:
  • geom (Union[Array, PointCloud]) – Point cloud of shape [n, ndim] to cluster. If an array is passed, the squared Euclidean cost (SqEuclidean) is assumed.

  • k (int) – The number of clusters.

  • weights (Optional[Array]) – Weights of the input points, of shape [n]. They are taken into account when computing the centroids and the inertia. If None, uniform weights are used.

  • init (Union[Literal['k-means++', 'random'], Callable[[PointCloud, int, Array], Array]]) –

    Initialization method. Can be one of the following:

    • 'k-means++' - select initial centroids that are \(\mathcal{O}(\log k)\)-optimal in expectation [Arthur and Vassilvitskii, 2007].

    • 'random' - randomly select k points from geom.

    • callable - a function that takes the point cloud, the number of clusters and a random key, and returns the initial centroids as an array of shape [k, ndim].

  • n_init (int) – Number of times k-means will run with different initial seeds.

  • n_local_trials (Optional[int]) – Number of local trials when init='k-means++'. If None, \(2 + \lfloor \log k \rfloor\) is used.

  • tol (float) – Relative tolerance with respect to the Frobenius norm of the centroids’ shift between two consecutive iterations.

  • min_iterations (int) – Minimum number of iterations.

  • max_iterations (int) – Maximum number of iterations.

  • store_inner_errors (bool) – Whether to store the errors (inertia) at each iteration.

  • rng (Optional[Array]) – Random key for seeding the initializations.
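The 'k-means++' initialization with `n_local_trials` can be illustrated with a greedy variant of the seeding of Arthur and Vassilvitskii. At each step, several candidates are drawn with probability proportional to their squared distance to the closest centroid chosen so far. The candidate that most reduces the total squared distance is kept. This plain-Python sketch rests on several assumptions and is not ott's implementation:

* `kmeans_plus_plus` and its `seed` argument are hypothetical.
* The logarithm in the default number of trials is assumed to be natural.
* Point weights are ignored during sampling.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans_plus_plus(
    points: Sequence[Point],
    k: int,
    n_local_trials: Optional[int] = None,
    seed: int = 0,
) -> List[Point]:
    """Greedy k-means++ seeding (hypothetical helper, weights ignored)."""
    rng = random.Random(seed)
    n = len(points)
    if n_local_trials is None:
        # Default stated in the docs; a natural logarithm is assumed here.
        n_local_trials = 2 + math.floor(math.log(k))
    # First centroid: an input point drawn uniformly at random.
    centers = [points[rng.randrange(n)]]
    # closest[i]: squared distance from point i to its nearest chosen centroid.
    closest = [sq_dist(p, centers[0]) for p in points]
    for _ in range(1, k):
        if sum(closest) == 0.0:
            # Every point coincides with a chosen centroid: sample uniformly.
            candidates = [rng.randrange(n) for _ in range(n_local_trials)]
        else:
            # D^2 sampling: probability proportional to the squared distance.
            candidates = rng.choices(range(n), weights=closest, k=n_local_trials)
        # Greedy step: keep the candidate that minimizes the total potential.
        best_pot, best_idx, best_closest = math.inf, candidates[0], closest
        for c in candidates:
            trial = [min(d, sq_dist(p, points[c])) for d, p in zip(closest, points)]
            pot = sum(trial)
            if pot < best_pot:
                best_pot, best_idx, best_closest = pot, c, trial
        centers.append(points[best_idx])
        closest = best_closest
    return centers
```

Points that are already centroids get zero sampling probability. On three distinct 1-D points with k=3, the sketch therefore returns all three points whatever the seed.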

Return type:

KMeansOutput

Returns:

The result of the k-means clustering.
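The plain-Python loop below sketches how `n_init`, `tol`, `min_iterations`, `max_iterations` and `store_inner_errors` interact. Everything in it is hypothetical:

* `k_means_sketch` and `SketchOutput` are illustrative names, and the fields of `SketchOutput` do not mirror `KMeansOutput`.
* For brevity it uses 'random' initialization and uniform weights.
* The stopping rule stops once the centroid shift is at most `tol` times the Frobenius norm of the previous centroids. This is one reading of the `tol` description.
* Keeping the restart with the lowest inertia is an assumption borrowed from common k-means implementations.

```python
import math
import random
from typing import List, NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class SketchOutput(NamedTuple):
    """Hypothetical result container; its fields do not mirror KMeansOutput."""
    centroids: List[Point]
    assignment: List[int]
    inertia: float
    iterations: int
    converged: bool
    inner_errors: Optional[List[float]]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def frobenius(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Frobenius norm of the difference between two [k, ndim] centroid sets."""
    return math.sqrt(sum(sq_dist(x, y) for x, y in zip(a, b)))


def assign(points, centroids) -> Tuple[List[int], float]:
    """Nearest-centroid labels and the (unweighted) inertia."""
    labels = [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]
    return labels, sum(sq_dist(p, centroids[a]) for p, a in zip(points, labels))


def update(points, labels, centroids) -> List[Point]:
    """Cluster means; an empty cluster keeps its previous centroid."""
    out = []
    for j, c in enumerate(centroids):
        members = [p for p, a in zip(points, labels) if a == j]
        out.append(
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
        )
    return out


def k_means_sketch(points, k, n_init=10, tol=1e-4, min_iterations=0,
                   max_iterations=300, store_inner_errors=False, seed=0):
    rng = random.Random(seed)
    origin = [tuple(0.0 for _ in points[0])] * k
    best = None
    for _ in range(n_init):
        # 'random' initialization: k distinct input points.
        centroids = [points[i] for i in rng.sample(range(len(points)), k)]
        errors = [] if store_inner_errors else None
        converged, it = False, 0
        while it < max_iterations:
            labels, inertia = assign(points, centroids)
            if errors is not None:
                errors.append(inertia)  # inertia before this iteration's update
            new = update(points, labels, centroids)
            # Assumed stopping rule: shift <= tol * ||previous centroids||_F.
            shift, scale = frobenius(new, centroids), frobenius(centroids, origin)
            centroids, it = new, it + 1
            if it >= min_iterations and shift <= tol * scale:
                converged = True
                break
        labels, inertia = assign(points, centroids)
        run = SketchOutput(centroids, labels, inertia, it, converged, errors)
        # Assumption: keep the restart with the lowest final inertia.
        if best is None or run.inertia < best.inertia:
            best = run
    return best
```

Take the 1-D points 0, 1, 10 and 11 with k=2. Every random initialization converges to centroids 0.5 and 10.5 with an inertia of 1.0, so the restarts agree. On less well-separated data, restarts can end in different local minima, which is what `n_init` guards against.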