Most researchers working on high-dimensional indexing agree on the following three trends: (i) the size of the multimedia collections to index is now reaching millions if not billions of items, (ii) the computers we use every day now come with multiple cores, and (iii) hardware is becoming more available, thanks to easier access to Grids and/or Clouds. This paper shows how the Map-Reduce paradigm can be applied to indexing algorithms and demonstrates that great scalability can be achieved using Hadoop, a popular Map-Reduce-based framework. Dramatic performance improvements are not, however, guaranteed a priori: such frameworks are rigid, they severely constrain the possible access patterns to data, and RAM, a scarce resource, has to be shared. Furthermore, algorithms require major redesign and may have to settle for sub-optimal behavior. The benefits, however, are many: simplicity for programmers, automatic distribution, fault tolerance, failure detection and automatic re-runs and, last but not least, scalability. We share our experience of adapting a clustering-based high-dimensional indexing algorithm to the Map-Reduce model, and of testing it at large scale with Hadoop as we index 30 billion SIFT descriptors. We foresee that lessons drawn from our work could minimize time, effort and energy invested by other researchers and practitioners working in similar directions.