Use of permutation prefixes for efficient and scalable approximate similarity search

Authors:
Andrea Esuli
Affiliations:
Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, via Giuseppe Moruzzi, 1, 56124 Pisa, Italy
Venue:
Information Processing and Management: an International Journal
Year:
2012

Abstract

We present the Permutation Prefix Index (PP-Index), an index data structure that supports efficient approximate similarity search. (This work is a revised and extended version of Esuli (2009b), presented at the 2009 LSDS-IR Workshop, held in Boston.) The PP-Index belongs to the family of permutation-based indexes, which represent each indexed object by "its view of the surrounding world", i.e., the list of the elements of a set of reference objects, sorted by their distance from the indexed object. In its basic formulation, the PP-Index is strongly biased toward efficiency. We show how effectiveness can easily reach optimal levels just by adopting two "boosting" strategies, multiple index search and multiple query search, both of which have nice parallelization properties. We study both the efficiency and the effectiveness properties of the PP-Index, experimenting with collections of up to one hundred million objects, represented in a very high-dimensional similarity space.
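
The permutation-based representation described in the abstract can be illustrated with a minimal Python sketch. It is not the PP-Index as published: grouping objects in a dictionary keyed by a fixed-length prefix is a simplification chosen for illustration, and the names `permutation_prefix`, `build_index`, `candidates`, the `length` parameter, and the choice of Euclidean distance are all assumptions that do not come from the abstract.

```python
import math
from collections import defaultdict


def permutation_prefix(obj, references, distance, length):
    """Return the indices of the `length` reference objects closest to `obj`.

    The full permutation is the list of all reference indices sorted by
    distance from `obj`; only its first `length` entries are kept.
    Ties are broken by reference index so the result is deterministic.
    """
    order = sorted(
        range(len(references)),
        key=lambda i: (distance(obj, references[i]), i),
    )
    return tuple(order[:length])


def build_index(objects, references, distance, length):
    """Group object ids by permutation prefix (hypothetical, simplified layout)."""
    index = defaultdict(list)
    for obj_id, obj in enumerate(objects):
        prefix = permutation_prefix(obj, references, distance, length)
        index[prefix].append(obj_id)
    return index


def candidates(query, index, references, distance, length):
    """Return ids of indexed objects that share the query's permutation prefix."""
    prefix = permutation_prefix(query, references, distance, length)
    return index.get(prefix, [])


# Toy data: 2-D points with Euclidean distance (an assumption for this example).
references = [(0, 0), (10, 0), (0, 10)]
objects = [(1, 0.5), (9, 1), (1, 9), (2, 1)]

index = build_index(objects, references, math.dist, length=2)
print(candidates((1.5, 1), index, references, math.dist, length=2))
```

In this sketch, objects that see the reference objects in the same order fall into the same group, so a query only has to compute its own permutation prefix to obtain a small candidate set. The candidates would then typically be re-ranked by their true distance to the query.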