We present two modifications to the popular k-means clustering algorithm that address the extreme latency, scalability, and sparsity requirements of user-facing web applications. First, we propose mini-batch optimization for k-means clustering. This reduces computation cost by orders of magnitude compared to the classic batch algorithm, while yielding significantly better solutions than online stochastic gradient descent. Second, we obtain sparse cluster centers with projected gradient descent, and give a fast ε-accurate projection onto the L1-ball. Source code is freely available: http://code.google.com/p/sofia-ml