ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Classical multi-dimensional indexes are based on partitioning the data space. Their effectiveness declines as dimensionality grows, because the number of indexing units increases exponentially with the number of dimensions. As a result, using such index structures becomes less effective than a linear scan of all the data. Observing that nearest neighbor search approaches linear complexity in high-dimensional spaces, the VA-file proposed a method of coordinate approximation. In this paper we propose C2VA (Clustered Compact VA) for dimensionality reduction. We find that real datasets are rarely uniformly distributed, although uniformity is the main assumption of the VA-file. Instead of approximating all dimensions, we derive the condition under which less important dimensions can be skipped. This avoids generating a huge index file for a large, high-dimensional dataset and hence saves many I/O accesses during scanning. Moreover, we guarantee that C2VA preserves the precision of the bounds obtained by the VA-file, which maximizes the efficiency gain. Our experimental results confirm these claims.
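To make the idea of coordinate approximation with skipped dimensions concrete, the following is a minimal sketch, not the C2VA algorithm itself. It assumes a uniform grid per dimension (as in a basic VA-file style approximation) and uses a variance threshold as a stand-in for the paper's skipping condition, which the abstract does not spell out. All function names and the `min_variance` parameter are hypothetical. Skipped dimensions simply contribute nothing to the lower bound, so the bound stays valid but may be looser than one computed over all dimensions.

```python
import math

def select_dims(data, min_variance):
    """Keep dimensions whose variance reaches a threshold.
    Assumption: variance is only a placeholder for the paper's real condition."""
    n = len(data)
    kept = []
    for d in range(len(data[0])):
        mean = sum(v[d] for v in data) / n
        var = sum((v[d] - mean) ** 2 for v in data) / n
        if var >= min_variance:
            kept.append(d)
    return kept

def build_grid(data, bits):
    """Per-dimension (minimum, cell width) of a uniform grid with 2**bits cells."""
    cells = 2 ** bits
    grid = []
    for d in range(len(data[0])):
        lo = min(v[d] for v in data)
        hi = max(v[d] for v in data)
        width = (hi - lo) / cells or 1.0  # guard against constant dimensions
        grid.append((lo, width))
    return grid

def approximate(vec, grid, bits, kept):
    """Replace each kept coordinate by the index of the grid cell containing it."""
    cells = 2 ** bits
    code = []
    for d in kept:
        lo, width = grid[d]
        idx = int((vec[d] - lo) / width)
        code.append(min(max(idx, 0), cells - 1))  # clamp the maximum into the last cell
    return code

def lower_bound(query, code, grid, kept):
    """Euclidean distance from the query to the cell box, over kept dimensions only."""
    total = 0.0
    for idx, d in zip(code, kept):
        lo, width = grid[d]
        cell_lo = lo + idx * width
        cell_hi = cell_lo + width
        if query[d] < cell_lo:
            total += (cell_lo - query[d]) ** 2
        elif query[d] > cell_hi:
            total += (query[d] - cell_hi) ** 2
    return math.sqrt(total)

# Small illustrative dataset: the third dimension is nearly constant.
data = [[0.0, 0.0, 5.0], [4.0, 8.0, 5.0], [8.0, 4.0, 5.1], [2.0, 6.0, 5.0]]
query = [1.0, 7.0, 5.0]
kept = select_dims(data, min_variance=0.5)
grid = build_grid(data, bits=2)
codes = [approximate(v, grid, 2, kept) for v in data]
bounds = [lower_bound(query, c, grid, kept) for c in codes]
```

During a scan, a vector whose lower bound already exceeds the best distance found so far can be discarded without reading its full coordinates; storing codes only for the kept dimensions is what shrinks the approximation file in this sketch.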