CVA file: an index structure for high-dimensional datasets

  • Authors:
  • Jiyuan An; Hanxiong Chen; Kazutaka Furuse; Nobuo Ohbo

  • Affiliations:
  • University of Tsukuba, Doctoral Program in Engineering, Ibaraki, Japan and Centre for Information Technology Innovation, Queensland University of Technology, 126 Margaret Street GPO Box 2434, Bris ...; University of Tsukuba, Institute of Information Sciences and Electronics, Ibaraki, Japan; University of Tsukuba, Institute of Information Sciences and Electronics, Ibaraki, Japan; University of Tsukuba, Institute of Information Sciences and Electronics, Ibaraki, Japan

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2005

Abstract

Similarity search is important in information-retrieval applications, where objects are usually represented as high-dimensional vectors. This paper proposes a new dimensionality-reduction technique and an indexing mechanism for high-dimensional datasets. For each data vector, the technique drops the dimensions whose coordinates are less than a critical value. This flexible, datawise dimensionality reduction improves indexing mechanisms for high-dimensional datasets whose distributions are skewed in all coordinates. To apply the technique to information retrieval, a CVA file (compact VA file), a revised version of the VA file, is developed. Using a CVA file reduces the size of the index files further while keeping the index bounds as tight as possible. The effectiveness is confirmed on synthetic and real data.
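The datawise reduction described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `compact` and `lower_bound`, the assumption that all coordinates are nonnegative, and the use of Euclidean distance are illustrative choices, and the quantization step that a VA file applies to the retained coordinates is omitted.

```python
from math import sqrt

def compact(vector, critical):
    """Keep only the (dimension, value) pairs whose coordinate reaches `critical`.

    Hypothetical helper: each vector may keep a different set of dimensions.
    """
    return {i: x for i, x in enumerate(vector) if x >= critical}

def lower_bound(query, compact_vec, critical):
    """Lower bound on the Euclidean distance between `query` and the original vector.

    Assumes nonnegative coordinates, so a dropped coordinate lies in
    [0, critical) and differs from q by at least max(0, q - critical).
    """
    total = 0.0
    for i, q in enumerate(query):
        if i in compact_vec:
            # Retained dimension: the exact contribution is known.
            total += (q - compact_vec[i]) ** 2
        else:
            # Dropped dimension: only a bound on the contribution is known.
            gap = max(0.0, q - critical)
            total += gap * gap
    return sqrt(total)

vector = [0.9, 0.01, 0.0, 0.4, 0.02]
query = [0.5, 0.3, 0.0, 0.4, 0.0]
approx = compact(vector, critical=0.05)   # only dimensions 0 and 3 survive
bound = lower_bound(query, approx, critical=0.05)
exact = sqrt(sum((q - x) ** 2 for q, x in zip(query, vector)))
```

In this sketch the bound never exceeds the exact distance, so a candidate can be discarded safely whenever its bound is already larger than the best distance found so far. A smaller critical value keeps more dimensions and gives a tighter bound at the cost of a larger index.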