Web-scale k-means clustering

Authors:
D. Sculley
Affiliations:
Google, Inc., Pittsburgh, PA, USA
Venue:
Proceedings of the 19th international conference on World wide web
Year:
2010

Abstract

We present two modifications to the popular k-means clustering algorithm to address the extreme requirements for latency, scalability, and sparsity encountered in user-facing web applications. First, we propose the use of mini-batch optimization for k-means clustering. This reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent. Second, we achieve sparsity with projected gradient descent, and give a fast ε-accurate projection onto the L1-ball. Source code is freely available: http://code.google.com/p/sofia-ml
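
The abstract names the two ingredients without spelling them out, so the following is only a minimal sketch of how they might fit together, not the paper's reference implementation (that lives in sofia-ml). It assumes per-center learning rates of 1/count for the mini-batch update and a bisection search on a soft-threshold for the approximate L1-ball projection. The function names (nearest, project_l1, minibatch_kmeans) and the parameters batch_size, iters, lam and eps are all hypothetical, chosen for illustration.

```python
import random


def nearest(centers, x):
    """Return the index of the center closest to x (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for j, c in enumerate(centers):
        d = sum((ci - xi) ** 2 for ci, xi in zip(c, x))
        if d < best_d:
            best, best_d = j, d
    return best


def project_l1(c, lam, eps=0.01):
    """Approximately project c onto the L1-ball of radius lam (assumed > 0).

    Bisects on a soft-threshold theta until the thresholded vector has an
    L1 norm in [lam, lam * (1 + eps)]. A vector already within that bound
    is returned unchanged.
    """
    if lam <= 0 or eps <= 0:
        raise ValueError("lam and eps must be positive")
    if sum(abs(v) for v in c) <= lam * (1 + eps):
        return list(c)
    lo, hi = 0.0, max(abs(v) for v in c)
    theta = 0.0
    for _ in range(200):  # the cap only guards against floating-point stalls
        theta = (lo + hi) / 2
        cur = sum(max(0.0, abs(v) - theta) for v in c)
        if lam <= cur <= lam * (1 + eps):
            break
        if cur < lam:
            hi = theta  # threshold too aggressive, shrink it
        else:
            lo = theta  # norm still too large, raise the threshold
    return [(1 if v > 0 else -1) * max(0.0, abs(v) - theta) for v in c]


def minibatch_kmeans(X, k, batch_size, iters, lam=None, eps=0.01, seed=0):
    """Mini-batch k-means sketch; if lam is given, centers are kept sparse."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(X, k)]  # initialize from data points
    counts = [0] * k
    for _ in range(iters):
        batch = [rng.choice(X) for _ in range(batch_size)]
        # Assign the whole batch first, then update, so assignments within
        # one batch all see the same centers.
        assign = [nearest(centers, x) for x in batch]
        for x, j in zip(batch, assign):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            centers[j] = [(1 - eta) * cj + eta * xj
                          for cj, xj in zip(centers[j], x)]
        if lam is not None:
            centers = [project_l1(c, lam, eps) for c in centers]
    return centers
```

Each iteration touches only batch_size examples rather than the full data set, which is where the cost reduction over the batch algorithm comes from. Averaging over a batch with a decaying per-center rate is intended to be less noisy than single-example stochastic updates. Passing lam enables the projection step, which zeroes out small coordinates of each center.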