Maximal metric margin partitioning for similarity search indexes

Authors:
Hisashi Kurasawa;Daiji Fukagawa;Atsuhiro Takasu;Jun Adachi
Affiliations:
The University of Tokyo, Tokyo, Japan;National Institute of Informatics, Tokyo, Japan;National Institute of Informatics, Tokyo, Japan;National Institute of Informatics, Tokyo, Japan
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Citing 9
Cited 2

OPTICS: ordering points to identify the clustering structure

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Data structures and algorithms for nearest neighbor search in general metric spaces

SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
D-Index: Distance Searching Index for Metric Data Sets

Multimedia Tools and Applications
Pivot selection techniques for proximity searching in metric spaces

Pattern Recognition Letters
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search

ACM Transactions on Database Systems (TODS)
A compact space decomposition for effective metric indexing

Pattern Recognition Letters
Similarity Search: The Metric Space Approach (Advances in Database Systems)

Similarity Search: The Metric Space Approach (Advances in Database Systems)
Reference-based indexing of sequence databases

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach

EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology

Pivot selection method for optimizing both pruning and balancing in metric space indexes

DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
Finding the k-closest pairs in metric spaces

Proceedings of the 1st Workshop on New Trends in Similarity Search

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a partitioning scheme for similarity search indexes that is called Maximal Metric Margin Partitioning (MMMP). MMMP divides the data on the basis of its distribution pattern, especially for the boundaries of clusters. A partitioning surface created by MMMP is likely to be at maximum distances from the two cluster boundaries. MMMP is the first similarity search index approach to focus on partitioning surfaces and data distribution patterns. We also present an indexing scheme, named the MMMP-Index, which uses MMMP and small ball partitioning. The MMMP-Index prunes many objects that are not relevant to a query, and it reduces the query execution cost. Our experimental results show that MMMP effectively indexes clustered data and reduces the search cost. For clustered vector data, the MMMP-Index reduces the computational cost to less than two thirds that of comparable schemes.