Symbolic Computation Group

David R. Cheriton School of Computer Science
University of Waterloo, Waterloo, Ontario, Canada

Monday, March 3, 2025, at the University of Waterloo
Connections between k-coloring and k-means
Enver Aman, New Jersey

Abstract:

Given a finite set P of points in Euclidean space R^d and a parameter k > 0, the goal of the k-means problem is to identify k "center points" in R^d so that the sum of the square distance between each point in P and its closest center point is as small as possible. The k-means problem is part of a general class of clustering problems, which are widely studied in data science and machine learning.

It is well-known that k-means is NP-hard via reductions from structured 3-SAT instances [Dasgupta-Freund 2018]. Furthermore, a recent algorithm [Björkland et al. 2022] solves k-means in time O(2^n). A natural question to ask is whether k-means has an exact algorithm in faster than 2^n time. (Say, k-means in 1.99^n time for each k?)

We first present a simpler proof of NP-hardness of the k-means problem by a reduction from graph k-coloring. Similarly, we prove NP-hardness of 2-means by a reduction to a structured max-cut problem. Then, we present a faster-than-2^n algorithm for 2-means by reduction to a weighted triangle problem (inspired by a max-cut reduction to weighted triangle by [Williams 2005]). The runtime of this algorithm, with the current matrix multiplication bounds, runs in 1.72^n time.

This is joint work with Karthik C. S. and Sharath Punna.

Symbolic Computation Group

David R. Cheriton School of Computer Science University of Waterloo, Waterloo, Ontario, Canada

David R. Cheriton School of Computer Science
University of Waterloo, Waterloo, Ontario, Canada