# ott.tools.k_means.k_means

ott.tools.k_means.k_means(geom, k, weights=None, init='k-means++', n_init=10, n_local_trials=None, tol=0.0001, min_iterations=0, max_iterations=300, store_inner_errors=False, rng=None)

K-means clustering using Lloyd’s algorithm.
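
As a rough illustration of what Lloyd's algorithm does, the NumPy sketch below alternates an assignment step (each point goes to its nearest centroid under the squared Euclidean cost) and an update step (each centroid becomes the weighted mean of its assigned points), which does not increase the weighted inertia $\sum_i w_i \min_j \|x_i - c_j\|_2^2$. It is not OTT's JAX implementation, and several details are assumptions: the names `lloyd` and `k_means_restarts` are hypothetical; the stopping rule borrows scikit-learn's convention (squared centroid shift compared with `tol` times the mean per-feature variance of the data); empty clusters simply keep their previous centroid; and keeping the restart with the lowest inertia across `n_init` runs is assumed rather than stated in the parameter list. Initialization uses the 'random' option for brevity.

```python
import numpy as np


def lloyd(x, centroids, weights=None, tol=1e-4, min_iterations=0, max_iterations=300):
    """Weighted Lloyd iterations from given initial centroids (illustrative sketch)."""
    n = x.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    # Assumed stopping scale (scikit-learn convention): tol * mean per-feature variance.
    tol_scaled = tol * np.mean(np.var(x, axis=0))
    centroids = np.array(centroids, dtype=float)
    for it in range(1, max_iterations + 1):
        # Assignment step: squared Euclidean cost from every point to every centroid.
        cost = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assignment = cost.argmin(axis=1)
        # Update step: each centroid moves to the weighted mean of its points.
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            mask = assignment == j
            if w[mask].sum() > 0:  # an empty cluster keeps its old centroid (assumption)
                new_centroids[j] = np.average(x[mask], axis=0, weights=w[mask])
        # Squared Frobenius norm of the centroid shift between two iterations.
        shift = ((new_centroids - centroids) ** 2).sum()
        centroids = new_centroids
        if it >= min_iterations and shift <= tol_scaled:
            break
    # Final assignment and weighted inertia for the returned centroids.
    cost = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    assignment = cost.argmin(axis=1)
    inertia = float((w * cost[np.arange(n), assignment]).sum())
    return centroids, assignment, inertia


def k_means_restarts(x, k, n_init=10, weights=None, seed=0, **kwargs):
    """Run `lloyd` n_init times from 'random' inits and keep the lowest inertia."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # 'random' initialization: k distinct points drawn uniformly from x.
        init = x[rng.choice(len(x), size=k, replace=False)]
        result = lloyd(x, init, weights=weights, **kwargs)
        if best is None or result[2] < best[2]:
            best = result
    return best
```

For example, on the four points `[[0, 0], [0, 1], [10, 10], [10, 11]]`, `k_means_restarts(x, k=2)` returns the two pair midpoints `[0, 0.5]` and `[10, 10.5]` as centroids, with an inertia of 1.0.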

Parameters:
• geom (Union[Array, PointCloud]) – Point cloud of shape [n, ndim] to cluster. If passed as an array, the squared Euclidean cost (SqEuclidean) is assumed.

• k (int) – The number of clusters.

• weights (Optional[Array]) – Weights of the input points, of shape [n]. They are used when computing the centroids and the inertia. If None, uniform weights are used.

• init (Union[Literal['k-means++', 'random'], Callable[[PointCloud, int, Array], Array]]) –

Initialization method. Can be one of the following:

• 'k-means++' - select initial centroids that are $\mathcal{O}(\log k)$-optimal in expectation.

• 'random' - randomly select k points from geom as the initial centroids.

• callable - a function that takes the point cloud, the number of clusters, and a random key, and returns the initial centroids as an array of shape [k, ndim].

• n_init (int) – Number of times k-means will run with different initial seeds.

• n_local_trials (Optional[int]) – Number of local trials when init='k-means++'. If None, $2 + \lfloor \log k \rfloor$ is used.

• tol (float) – Relative tolerance with respect to the Frobenius norm of the centroids’ shift between two consecutive iterations.

• min_iterations (int) – Minimum number of iterations.

• max_iterations (int) – Maximum number of iterations.

• store_inner_errors (bool) – Whether to store the errors (inertia) at each iteration.

• rng (Optional[Array]) – Random key used to seed the initializations.
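
The 'k-means++' option and the `n_local_trials` parameter can be illustrated with the greedy seeding variant that scikit-learn implements: after a first centroid drawn in proportion to the point weights, each new centroid is chosen among `n_local_trials` candidates drawn with probability proportional to $w_i D(x_i)^2$ (the weighted squared distance to the nearest centroid chosen so far), keeping the candidate that most reduces the weighted inertia. This is a NumPy sketch under the assumption that OTT follows the same scheme, not OTT's code; the function name `kmeans_plus_plus` is hypothetical, and reading $\log$ in the default $2 + \lfloor \log k \rfloor$ as the natural logarithm is also an assumption. The 'random' option corresponds to replacing this whole procedure with drawing k distinct points uniformly.

```python
import math

import numpy as np


def kmeans_plus_plus(x, k, rng, weights=None, n_local_trials=None):
    """Greedy k-means++ seeding (sketch); returns k rows of x as initial centroids."""
    n = x.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    if n_local_trials is None:
        # Default from the parameter list: 2 + floor(log k), natural log assumed.
        n_local_trials = 2 + int(math.log(k))
    # First centroid: a point drawn with probability proportional to its weight.
    chosen = [rng.choice(n, p=w / w.sum())]
    # Squared distance from each point to its closest chosen centroid so far.
    closest = ((x - x[chosen[0]]) ** 2).sum(axis=-1)
    for _ in range(1, k):
        # Draw candidates with probability proportional to w * D(x)^2.
        probs = w * closest
        candidates = rng.choice(n, size=n_local_trials, p=probs / probs.sum())
        best_cand, best_closest, best_pot = None, None, np.inf
        for c in candidates:
            # Closest-centroid distances if candidate c were added.
            new_closest = np.minimum(closest, ((x - x[c]) ** 2).sum(axis=-1))
            pot = (w * new_closest).sum()  # weighted potential (inertia)
            if pot < best_pot:
                best_cand, best_closest, best_pot = c, new_closest, pot
        chosen.append(best_cand)
        closest = best_closest
    return x[np.array(chosen)]
```

The returned array has the shape a custom `init` callable must produce, [k, ndim]; to be passed to `k_means`, such a function would still have to be rewritten for the `(PointCloud, int, Array)` signature given for `init`.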

Return type:

KMeansOutput

Returns:

The result of the k-means clustering.