<ul><li>This thesis introduces optimized algorithms for k-means++ initialization and Lloyd's algorithm, leveraging sorted data, prefix sums, and binary search for improved computational performance.</li><li>The main contributions are: (1) an optimized k-cluster algorithm achieving O(l * k^2 * log n) complexity for greedy k-means++ initialization and O(i * k * log n) for Lloyd's algorithm, and (2) a binary search-based two-cluster algorithm, achieving O(log n) runtime with deterministic convergence to a Lloyd's algorithm local minimum.</li><li>Benchmarks demonstrate over 4500x speedup compared to scikit-learn for large datasets while maintaining clustering quality measured by within-cluster sum of squares (WCSS).</li><li>The algorithms also achieve a 300x speedup in an LLM quantization task, highlighting their utility in emerging applications.</li></ul>

Log-Time K-Means Clustering for 1D Data: Novel Approaches with Proof and Implementation

Discover more