C2VA: Trim High Dimensional Indexes

Authors:
Hanxiong Chen;Jiyuan An;Kazutaka Furuse;Nobuo Ohbo
Affiliations:
-;-;-;-
Venue:
WAIM '02 Proceedings of the Third International Conference on Advances in Web-Age Information Management
Year:
2002

Citing 8
Cited 4

Introduction to statistical pattern recognition (2nd ed.)

Introduction to statistical pattern recognition (2nd ed.)
The R*-tree: an efficient and robust access method for points and rectangles

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
The SR-tree: an index structure for high-dimensional nearest neighbor queries

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
A cost model for nearest neighbor search in high-dimensional data space

PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Finding generalized projected clusters in high dimensional spaces

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
The convex polyhedra technique: an index structure for high-dimensional space

ADC '02 Proceedings of the 13th Australasian database conference - Volume 5
When Is ''Nearest Neighbor'' Meaningful?

ICDT '99 Proceedings of the 7th International Conference on Database Theory
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases

CVA file: an index structure for high-dimensional datasets

Knowledge and Information Systems
DDR: an index method for large time-series datasets

Information Systems
Efficient high-dimensional indexing by sorting principal component

Pattern Recognition Letters
Indexing the Function: An Efficient Algorithm for Multi-dimensional Search with Expensive Distance Functions

ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

Classical multi-dimensional indexes are based on data space partitioning. The effectiveness declines because the number of indexing units grows exponentially as the number of dimensions increases. Then, unfortunately, using such index structures is less effective than linear scanning of all the data. The VA-file proposed a method of coordinate approximation, observing that nearest neighbor search becomes of linear complexity in high-dimensional spaces.In this paper we propose CM2VA(Clustered Compact VA) for dimensionality reduction. We investigate and find that real datasets are rarely uniformly distributed, which is the main assumption of VA-file. Instead of approximation on all dimensions, we figure out the condition of skipping less important dimensions. This avoids the problem of generating huge index file for a large, high dimensional dataset and hence saves a lot of I/O accesses when scanning. Moreover, we guarantee that C2VA preserves the precision of bounds as in VA-file, which maximizes the efficiency gain. The conviction is found in our experimental results.