Web-scale image retrieval demands a large-scale visual codebook, which is difficult to generate with the commonly adopted K-means vector quantization because of its limited scalability. Although approximate K-means has been proposed to scale up visual codebook construction, it requires a high-precision approximate nearest neighbor (ANN) search in the assignment step and converges with difficulty, which limits its scalability. In this paper, we propose an improved approximate K-means that leverages the assignment information from history, namely the previous iterations, to improve assignment precision. By further randomizing the ANN search employed in each iteration, the proposed algorithm improves assignment precision in a manner conceptually similar to randomized k-d trees, while introducing almost no additional cost. We prove that the algorithm converges, and we demonstrate both experimentally and analytically that it improves the quality of the generated visual codebook as well as its scalability.
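The two ideas in the abstract — reusing each point's previous assignment as an extra candidate, and re-randomizing an approximate centroid search every iteration — can be sketched roughly as below. This is only an illustrative approximation of the described approach, not the authors' implementation: the random subsampling of centroids stands in for a proper randomized ANN structure (e.g., randomized k-d trees), and all function and parameter names (`improved_approx_kmeans`, `n_candidates`) are hypothetical.

```python
import numpy as np

def improved_approx_kmeans(X, k, iters=10, n_candidates=8, seed=0):
    """Sketch of approximate K-means with history-aware assignment.

    In the assignment step, each point considers (a) a small random
    subset of centroids, refreshed every iteration as a crude stand-in
    for a re-randomized ANN search, and (b) the centroid it was
    assigned to in the previous iteration, so the candidate set always
    includes the historical best.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centroids = X[rng.choice(n, size=k, replace=False)].copy()
    assign = rng.integers(0, k, size=n)  # arbitrary initial assignment
    for _ in range(iters):
        # Assignment step: approximate nearest-centroid search
        for i in range(n):
            cand = rng.choice(k, size=min(n_candidates, k), replace=False)
            cand = np.append(cand, assign[i])  # keep the previous assignment
            d = np.linalg.norm(centroids[cand] - X[i], axis=1)
            assign[i] = cand[int(np.argmin(d))]
        # Update step: recompute each centroid from its assigned points
        for c in range(k):
            pts = X[assign == c]
            if len(pts) > 0:
                centroids[c] = pts.mean(axis=0)
    return centroids, assign
```

Keeping the historical assignment in the candidate set is what lets a cheap, low-precision search remain usable: a point can never be assigned to a worse centroid than its previous one under the current candidate distances, and the per-iteration randomization varies which other centroids get examined.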