Pivot selection method for optimizing both pruning and balancing in metric space indexes

Authors:
Hisashi Kurasawa;Daiji Fukagawa;Atsuhiro Takasu;Jun Adachi
Affiliations:
The University of Tokyo, Chiyoda-ku, Tokyo, Japan;Doshisha University, Kyotanabe-shi, Kyoto, Japan;National Institute of Informatics, Chiyoda-ku, Tokyo, Japan;National Institute of Informatics, Chiyoda-ku, Tokyo, Japan
Venue:
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
Year:
2010

Abstract

We investigated ways to reduce the cost of nearest neighbor search in metric spaces. Many similarity search indexes use pivots to recursively divide a region into subregions, building a tree-structured index. A problem with existing indexes is that they focus only on pruning objects and do not take tree balance into account. Their balance therefore depends on the data distribution, so they do not reduce the search cost for all data. We propose a similarity search index called the Partitioning Capacity Tree (PCTree). PCTree automatically optimizes pivot selection based on both the balance of the regions partitioned by a pivot and the estimated effectiveness of search pruning by that pivot. As a result, PCTree reduces the search cost across various data distributions. In evaluations comparing it with four indexes on three real datasets, PCTree successfully reduced the search cost and handled various data distributions well.
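The abstract says that PCTree weighs the balance of a pivot's partition against the estimated pruning effectiveness of that pivot, but it does not define either quantity. To illustrate that trade-off only, the Python sketch below assumes ball partitioning (inner ball of a given radius around the pivot, outer region for the rest) and range queries with radius `r`. It scores each (pivot, radius) pair as the product of a balance term and the fraction of sample queries that can skip one side by the triangle inequality. The functions `select_pivot`, `score_split`, and `visits_both_sides`, the product scoring rule, the midpoint radii, and the use of a query sample are all hypothetical choices for this sketch. They are not the criterion PCTree actually uses.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any distance satisfying the triangle inequality works."""
    return math.dist(a, b)


def visits_both_sides(d_qp: float, radius: float, r: float) -> bool:
    """Ball partition around pivot p: inner = {x : d(p, x) <= radius}, outer = rest.

    For a range query (q, r) the triangle inequality lets a search skip the
    inner ball when d(q, p) > radius + r and the outer region when
    d(q, p) <= radius - r, so only queries inside this band visit both sides.
    """
    return radius - r < d_qp <= radius + r


def score_split(data: Sequence[Point], queries: Sequence[Point], pivot: Point,
                radius: float, r: float, dist: Metric) -> float:
    """Hypothetical score: split balance times estimated pruning rate.

    Assumes non-empty data and queries.
    """
    inner = sum(1 for x in data if dist(pivot, x) <= radius)
    outer = len(data) - inner
    balance = min(inner, outer) / (len(data) / 2)  # 1.0 means an even split
    pruned = sum(1 for q in queries
                 if not visits_both_sides(dist(q, pivot), radius, r))
    pruning = pruned / len(queries)  # share of sample queries that skip a side
    return balance * pruning


def select_pivot(data: Sequence[Point], queries: Sequence[Point],
                 candidates: Sequence[Point], r: float,
                 dist: Metric = euclidean) -> Optional[Tuple[float, Point, float]]:
    """Return (score, pivot, radius) for the best-scoring candidate split."""
    best: Optional[Tuple[float, Point, float]] = None
    for p in candidates:
        ds = sorted({dist(p, x) for x in data})
        # Radii halfway between consecutive distinct distances, so a split
        # can fall into a low-density gap of the distance distribution.
        for lo, hi in zip(ds, ds[1:]):
            radius = (lo + hi) / 2
            s = score_split(data, queries, p, radius, r, dist)
            if best is None or s > best[0]:
                best = (s, p, radius)
    return best


if __name__ == "__main__":
    # Two well-separated 1-D clusters; the data doubles as the query sample.
    data = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
    print(select_pivot(data, queries=data, candidates=[(0.0,), (5.0,)], r=0.5))
```

On this toy data the pivot at 0.0, with a radius in the gap between the two clusters, scores 1.0 because it gives an even split that no sample query straddles. The central candidate 5.0 scores 0 for every radius, since every sample query falls in the band where both sides must be searched. A criterion that looked only at pruning or only at balance could rank some of these splits differently, which is the kind of interaction the abstract describes.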