Indexing for vector projections

  • Authors:
  • Sean Chester;Alex Thomo;S. Venkatesh;Sue Whitesides

  • Affiliations:
  • University of Victoria, STN CSC, Victoria, BC, Canada;University of Victoria, STN CSC, Victoria, BC, Canada;University of Victoria, STN CSC, Victoria, BC, Canada;University of Victoria, STN CSC, Victoria, BC, Canada

  • Venue:
  • DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications: Part II
  • Year:
  • 2011

Abstract

The ability to extract the most relevant information from a dataset is paramount when the dataset is large. For data arising from a numeric domain, a pervasive means of modelling the data is to represent it in the form of vectors. This enables a range of geometric techniques; this paper introduces projection as a natural and powerful means of scoring the relevance of vectors. As yet, there are no effective indexing techniques for quickly retrieving those vectors in a dataset that have large projections onto a query vector. We address that gap by introducing the first indexing algorithms for vectors of arbitrary dimension. The resulting indices guarantee, in the I/O cost model, a worst-case query cost that is strongly sub-linear and output-sensitive, and a data structure size that is linear. We improve this query cost markedly for the special case of two dimensions. These algorithms derive from a novel geometric insight presented in this paper: the concept of a data vector's cap.
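To make the scoring notion concrete, the following is a minimal sketch of ranking vectors by their scalar projection onto a query vector using a plain linear scan. It is not the indexing algorithm the paper introduces (the abstract does not describe it in enough detail to reproduce); it only illustrates the unindexed baseline that such an index is meant to beat. The function names `scalar_projection` and `top_k_by_projection` are hypothetical, chosen for illustration.

```python
import math


def scalar_projection(v, q):
    """Signed length of the projection of v onto q: (v . q) / |q|."""
    q_norm = math.sqrt(sum(x * x for x in q))
    if q_norm == 0:
        raise ValueError("query vector must be non-zero")
    return sum(a * b for a, b in zip(v, q)) / q_norm


def top_k_by_projection(vectors, q, k):
    """Return the k vectors with the largest projection onto q.

    Linear-scan baseline (assumption: no index is available); every
    vector in the dataset is scored, so the cost grows with dataset size.
    """
    ranked = sorted(vectors, key=lambda v: scalar_projection(v, q), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    data = [(1, 0), (0, 2), (3, 3)]
    print(top_k_by_projection(data, (1, 1), 2))
```

Because the scan touches every vector regardless of how many results are returned, its cost is neither sub-linear nor output-sensitive, which is the gap the abstract says the proposed indices address.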