CVA file: an index structure for high-dimensional datasets

Authors:
Jiyuan An;Hanxiong Chen;Kazutaka Furuse;Nobuo Ohbo
Affiliations:
University of Tsukuba, Doctoral Program in Engineering, Ibaraki, Japan and Centre for Information Technology Innovation, Queensland University of Technology, 126 Margaret Street GPO Box 2434, Bris ...;University of Tsukuba, Institute of Information Sciences and Electronics, Ibaraki, Japan;University of Tsukuba, Institute of Information Sciences and Electronics, Ibaraki, Japan;University of Tsukuba, Institute of Information Sciences and Electronics, Ibaraki, Japan
Venue:
Knowledge and Information Systems
Year:
2005

Citing 15
Cited 5

Introduction to statistical pattern recognition (2nd ed.)
The R*-tree: an efficient and robust access method for points and rectangles

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Efficient and effective querying by image content

Journal of Intelligent Information Systems - Special issue: advances in visual information management systems
Fast subsequence matching in time-series databases

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
The SR-tree: an index structure for high-dimensional nearest neighbor queries

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
A cost model for nearest neighbor search in high-dimensional data space

PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
On power-law relationships of the Internet topology

Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Modern Information Retrieval
The convex polyhedra technique: an index structure for high-dimensional space

ADC '02 Proceedings of the 13th Australasian database conference - Volume 5
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data

VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
C2VA: Trim High Dimensional Indexes

WAIM '02 Proceedings of the Third International Conference on Advances in Web-Age Information Management

Indexing the Function: An Efficient Algorithm for Multi-dimensional Search with Expensive Distance Functions

ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space

Knowledge and Information Systems
Efficient histogram-based similarity search in ultra-high dimensional space

DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications: Part II
A new indexing method for high dimensional dataset

DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Approximate high-dimensional nearest neighbor queries using R-forests

Proceedings of the 17th International Database Engineering & Applications Symposium

Abstract

Similarity search is important in information-retrieval applications where objects are usually represented as vectors of high dimensionality. This paper proposes a new dimensionality-reduction technique and an indexing mechanism for high-dimensional datasets. For each data vector, the proposed technique drops the dimensions whose coordinates are less than a critical value. This flexible, datawise dimensionality reduction improves indexing mechanisms for high-dimensional datasets whose distributions are skewed in all coordinates. To apply the proposed technique to information retrieval, a CVA file (compact VA file), a revised version of the VA file, is developed. A CVA file further reduces the size of the index files while keeping the index bounds as tight as possible. The effectiveness is confirmed on synthetic and real data.
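
The datawise reduction described in the abstract can be illustrated with a minimal sketch. This is not the CVA file itself: the abstract gives no algorithmic detail, so the function names `reduce_vector` and `distance_bounds`, the use of absolute values, the Euclidean distance, and the sample numbers are all assumptions made for illustration. The sketch only shows the general idea that a vector stored without its small coordinates still yields lower and upper bounds on its distance to a query.

```python
import math

def reduce_vector(vec, critical):
    """Keep only coordinates whose magnitude is at least `critical`.

    Returns {dimension index: value}. A dropped coordinate is known
    only to lie strictly inside (-critical, critical).
    (Hypothetical helper; thresholding on abs() is an assumption.)
    """
    return {i: x for i, x in enumerate(vec) if abs(x) >= critical}

def distance_bounds(query, reduced, critical):
    """Bounds on the Euclidean distance from `query` to the original
    vector, computed from its reduced form alone."""
    lo_sq = 0.0
    hi_sq = 0.0
    for i, q in enumerate(query):
        if i in reduced:
            # Coordinate was kept, so its contribution is exact.
            d = q - reduced[i]
            lo_sq += d * d
            hi_sq += d * d
        else:
            # Coordinate was dropped: it lies in (-critical, critical).
            near = max(abs(q) - critical, 0.0)  # closest it can be to q
            far = abs(q) + critical             # farthest it can be from q
            lo_sq += near * near
            hi_sq += far * far
    return math.sqrt(lo_sq), math.sqrt(hi_sq)

# Illustrative data (made up): a skewed vector with few large coordinates.
vec = [0.9, 0.02, 0.0, 0.5, 0.01]
query = [1.0, 0.0, 0.3, 0.5, 0.0]
critical = 0.1

reduced = reduce_vector(vec, critical)
lower, upper = distance_bounds(query, reduced, critical)
exact = math.sqrt(sum((q - x) ** 2 for q, x in zip(query, vec)))
print(reduced, lower, exact, upper)
```

Because each vector keeps its own set of dimensions, a vector with few large coordinates is stored compactly, and a smaller critical value gives tighter bounds at the cost of a larger index.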