Large-scale similarity data management with distributed Metric Index

Authors: David Novak; Michal Batko; Pavel Zezula
Affiliations: Masaryk University, Brno, Czech Republic (all three authors)
Venue: Information Processing and Management: an International Journal
Year: 2012

Abstract

Metric space is a universal and versatile model of similarity that can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management remains an open research challenge. In this work, we take an important step towards a management system able to scale to data collections of billions of objects. We propose a distributed index structure for similarity data management, called the Metric Index (M-Index), which can answer queries in both a precise and an approximate manner. The technique can use any distributed hash table that supports interval queries as its underlying index. We performed numerous experiments to evaluate various settings of the M-Index structure, and we demonstrate its usability with a full-featured, publicly available Web application.
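Why an index over any distributed hash table with interval queries can answer similarity queries is easiest to see with a minimal sketch. It assumes a simplified, iDistance-style mapping: each object is assigned to its nearest pivot and mapped to the one-dimensional key `i * C + d(object, p_i)`, where `i` is the pivot's index and `C` is a hypothetical constant larger than any distance. The real M-Index uses richer pivot-permutation prefixes. A sorted Python list searched with `bisect` stands in for the distributed hash table. `KeyMappedIndex`, `max_dist` and the example pivots are illustrative names, not the authors' API. By the triangle inequality, any object within radius `r` of query `q` that lives in cluster `i` has a key inside `[i*C + d(q, p_i) - r, i*C + d(q, p_i) + r]`. A precise range query therefore becomes one interval query per cluster followed by a refinement step.

```python
import bisect
import math
from typing import Callable, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


class KeyMappedIndex:
    """Hypothetical toy index: maps metric objects to 1-D keys and answers
    range queries with interval scans over a sorted key list.

    The sorted list stands in for a distributed hash table that supports
    interval queries; this is NOT the authors' M-Index implementation.
    """

    def __init__(self, pivots: Sequence[Point], max_dist: float,
                 dist: Callable[[Point, Point], float] = math.dist):
        self.pivots = list(pivots)
        self.c = max_dist               # cluster offset; assumed larger than any distance
        self.dist = dist
        self.keys: List[float] = []     # keys kept in ascending order
        self.objects: List[Point] = []  # objects aligned with self.keys

    def key(self, o: Point) -> float:
        # Assign o to its nearest pivot (ties go to the lowest index) and
        # encode the cluster number plus the distance to that pivot in one number.
        dists = [self.dist(o, p) for p in self.pivots]
        i = min(range(len(dists)), key=dists.__getitem__)
        return i * self.c + dists[i]

    def insert(self, o: Point) -> None:
        k = self.key(o)
        pos = bisect.bisect_right(self.keys, k)
        self.keys.insert(pos, k)
        self.objects.insert(pos, o)

    def range_query(self, q: Point, r: float) -> List[Point]:
        """Return all stored objects o with dist(q, o) <= r."""
        result: List[Point] = []
        seen: Set[int] = set()
        for i, p in enumerate(self.pivots):
            dqp = self.dist(q, p)
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) <= r, so any
            # answer in cluster i has a key inside this interval.
            lo = i * self.c + max(0.0, dqp - r)
            hi = i * self.c + min(self.c, dqp + r)
            if lo > hi:
                continue  # interval falls outside this cluster's key range
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            for pos in range(start, end):
                if pos in seen:
                    continue
                seen.add(pos)
                # Refinement: the interval may over-fetch, so check the real distance.
                if self.dist(q, self.objects[pos]) <= r:
                    result.append(self.objects[pos])
        return result


if __name__ == "__main__":
    index = KeyMappedIndex(pivots=[(0, 0), (10, 10)], max_dist=100.0)
    for point in [(1, 1), (2, 2), (9, 9), (5, 5)]:
        index.insert(point)
    print(index.range_query((1.5, 1.5), 1.0))  # prints [(1, 1), (2, 2)]
```

Every candidate returned by an interval scan is re-checked with the real distance. The scans may therefore over-fetch, but they never lose a true answer. One plausible way to trade precision for speed in such a scheme would be to scan only the clusters of the pivots nearest to the query. This is an assumption for illustration, not a description of the paper's approximate strategy.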