Metric Index: An efficient and scalable solution for precise and approximate similarity search

Authors:
David Novak; Michal Batko; Pavel Zezula
Affiliations:
Masaryk University, Brno, Czech Republic; Masaryk University, Brno, Czech Republic; Masaryk University, Brno, Czech Republic
Venue:
Information Systems
Year:
2011


Abstract

The metric space is a universal and versatile model of similarity that can be applied in various areas of information retrieval. However, a general, efficient, and scalable solution for metric data management remains an open research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index) that employs practically all known principles of metric space partitioning, pruning, and filtering, thus reaching high search performance while keeping the building cost per object constant. The heart of the M-Index is a general mapping mechanism that makes it possible to store the data in established structures such as the B+-tree, or even in distributed storage. We implemented the M-Index with the B+-tree and performed experiments on two datasets: the first is an artificial set of vectors, and the other is a real-life dataset composed of a combination of five MPEG-7 visual descriptors extracted from a database of up to several million digital images. The experiments test several M-Index variants and compare them with established techniques for both precise and approximate similarity search. The trials show that the M-Index outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Further, the M-Index demonstrates an excellent ability to keep similar data close in the index, which makes its approximation algorithm very efficient: it maintains practically constant response times while preserving a very high recall as the dataset grows, even beating approaches designed purely for approximate search.
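
The abstract does not spell out the mapping itself, so the following is only a minimal sketch of the general idea, assuming an iDistance-style key: each object is assigned to its closest pivot, and its scalar key is the pivot index plus the normalized distance to that pivot. A sorted Python list stands in for the B+-tree. The names `PivotKeyIndex`, `euclidean`, and `d_max` are hypothetical and chosen for illustration; `d_max` is assumed to be strictly larger than any distance that can occur in the data.

```python
import bisect
import math


def euclidean(a, b):
    """Euclidean distance; any metric could be substituted."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotKeyIndex:
    """Illustrative sketch only: maps metric objects to scalar keys.

    A sorted list of keys plays the role of a B+-tree here.
    """

    def __init__(self, pivots, d_max, dist=euclidean):
        self.pivots = pivots
        self.d_max = d_max      # assumption: strictly exceeds every distance
        self.dist = dist
        self.keys = []          # sorted scalar keys
        self.objects = []       # objects aligned with self.keys

    def key(self, obj):
        # Key = index of the closest pivot + normalized distance to it.
        dists = [self.dist(p, obj) for p in self.pivots]
        i = min(range(len(dists)), key=dists.__getitem__)
        return i + dists[i] / self.d_max

    def insert(self, obj):
        k = self.key(obj)
        pos = bisect.bisect_left(self.keys, k)
        self.keys.insert(pos, k)
        self.objects.insert(pos, obj)

    def range_query(self, q, r):
        """Return all stored objects within distance r of q."""
        q_dists = [self.dist(p, q) for p in self.pivots]
        nearest = min(q_dists)
        results = []
        for i, dq in enumerate(q_dists):
            # Partition pruning: an object in cell i within r of q implies
            # d(p_i, q) <= d(p_j, q) + 2r for every other pivot p_j.
            if dq - nearest > 2 * r:
                continue
            # Triangle inequality bounds the key interval inside cell i.
            lo = i + max(dq - r, 0.0) / self.d_max
            hi = i + min(dq + r, self.d_max) / self.d_max
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            # Candidates are verified with a real distance computation.
            for obj in self.objects[start:end]:
                if self.dist(q, obj) <= r:
                    results.append(obj)
        return results
```

Under these assumptions, a range query scans only a few contiguous key intervals, one per pivot cell that survives pruning, which is the kind of access pattern an ordered structure such as a B+-tree serves well. For example, after inserting 2D points with `idx.insert(point)`, a call such as `idx.range_query((0.4, 0.5), 0.15)` returns the stored points within that radius.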