Indexing high-dimensional data in dual distance spaces: a symmetrical encoding approach

Authors:
Yi Zhuang;Yueting Zhuang;Qing Li;Lei Chen;Yi Yu
Affiliations:
Zhejiang University, P.R.China;Zhejiang University, P.R.China;City University of Hong Kong, HKSAR, P.R.China;Hong Kong University of Science and Technology, HKSAR, P.R.China;Nara Women's University, Japan
Venue:
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Year:
2008

Citing 18
Cited 4

The R*-tree: an efficient and robust access method for points and rectangles

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Distance-based indexing for high-dimensional metric spaces

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The SR-tree: an index structure for high-dimensional nearest neighbor queries

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The pyramid-technique: towards breaking the curse of dimensionality

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Multidimensional binary search trees used for associative searching

Communications of the ACM
Searching in metric spaces

ACM Computing Surveys (CSUR)
Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases

ACM Computing Surveys (CSUR)
R-trees: a dynamic index structure for spatial searching

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
The TV-tree: an index structure for high-dimensional data

The VLDB Journal — The International Journal on Very Large Data Bases - Spatial Database Systems
Similarity Indexing with the SS-tree

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Similarity Search without Tears: The OMNI Family of All-purpose Access Methods

Proceedings of the 17th International Conference on Data Engineering
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
The A-tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Indexing High-Dimensional Data for Content-Based Retrieval in Large Databases

DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search

ACM Transactions on Database Systems (TODS)

Maximal metric margin partitioning for similarity search indexes

Proceedings of the 18th ACM conference on Information and knowledge management
Pivot selection method for optimizing both pruning and balancing in metric space indexes

DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
Finding the k-closest pairs in metric spaces

Proceedings of the 1st Workshop on New Trends in Similarity Search
Time-HOBI: Index for optimizing star queries

Information Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Due to the well-known dimensionality curse problem, search in a high-dimensional space is considered as a "hard" problem. In this paper, a novel symmetrical encoding-based index structure, which is called EHD-Tree (for symmetrical Encoding-based Hybrid Distance Tree), is proposed to support fast k-Nearest-Neighbor (k-NN) search in high-dimensional spaces. In an EHD-Tree, all data points are first grouped into clusters by a k-Means clustering algorithm. Then the uniform ID number of each data point is obtained by a dual-distance-driven encoding scheme in which each cluster sphere is partitioned twice according to the dual distances of start- and centroid-distance. Finally, the uniform ID number and the centroid-distance of each data point are combined to get a uniform index key, the latter is then indexed through a partition-based B+-tree. Thus, given a query point, its k-NN search in high-dimensional spaces can be transformed into search in a single dimensional space with the aid of the EHD-Tree index. Extensive performance studies are conducted to evaluate the effectiveness and efficiency of our proposed scheme, and the results demonstrate that this method outperforms the state-of-the-art high dimensional search techniques such as the X-Tree, VA-file, iDistance and NB-Tree, especially when the query radius is not very large.