Indexing very high-dimensional sparse and quasi-sparse vectors for similarity searches

  • Authors:
  • Changzhou Wang; X. Sean Wang

  • Affiliations:
  • Changzhou Wang: Mathematics and Computing Technology, Phantom Works, The Boeing Company, Bellevue, Washington, USA (e-mail: changzhou.wang@boeing.com)
  • X. Sean Wang: Department of Information and Software Engineering, George Mason University, Fairfax, Virginia, USA (e-mail: xywang@gmu.edu)

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2001

Abstract

Similarity queries on complex objects are usually translated into searches among their feature vectors. This paper studies indexing techniques for very high-dimensional vectors (e.g., with hundreds of dimensions) that are sparse or quasi-sparse, i.e., vectors that each have only a small number (e.g., ten) of non-zero or significant values. Based on the R-tree, the paper introduces the xS-tree, which uses lossy compression of bounding regions to guarantee a reasonable minimum fan-out within the storage space allocated to each node. In addition, the paper studies the performance and scalability of the xS-tree through experiments.
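
The following Python sketch illustrates the general ideas the abstract mentions; it is not the xS-tree algorithm, whose details are not given here. It assumes sparse vectors are stored as dictionaries mapping a dimension index to a non-zero value, and it uses a hypothetical compression rule (keep explicit intervals for a fixed budget of dimensions and cover all others with one shared interval). The function names `sparse_distance`, `bounding_box`, `compress_box`, and `contains` are made up for this illustration.

```python
from math import sqrt


def sparse_distance(u, v):
    """Euclidean distance between sparse vectors stored as {dimension: value}."""
    dims = u.keys() | v.keys()  # only dimensions that are non-zero somewhere
    return sqrt(sum((u.get(d, 0.0) - v.get(d, 0.0)) ** 2 for d in dims))


def bounding_box(vectors):
    """Exact bounding box as {dimension: (low, high)}.

    Dimensions missing from the result are implicitly bounded by (0, 0).
    """
    dims = set().union(*vectors)
    box = {}
    for d in dims:
        values = [vec.get(d, 0.0) for vec in vectors]
        box[d] = (min(values), max(values))
    return box


def compress_box(box, budget):
    """Lossy compression (illustrative assumption, not the paper's scheme).

    Keeps the `budget` widest intervals explicitly and replaces every other
    dimension with one shared interval that includes 0 and covers them all.
    The result never excludes a point that the exact box contains.
    """
    ranked = sorted(box, key=lambda d: box[d][1] - box[d][0], reverse=True)
    kept = {d: box[d] for d in ranked[:budget]}
    rest = ranked[budget:]
    low = min([box[d][0] for d in rest] + [0.0])
    high = max([box[d][1] for d in rest] + [0.0])
    return kept, (low, high)


def contains(kept, shared, vec):
    """True if `vec` lies inside the compressed region."""
    for d in kept.keys() | vec.keys():
        low, high = kept.get(d, shared)
        if not low <= vec.get(d, 0.0) <= high:
            return False
    return True


a = {3: 1.0, 120: 2.0}
b = {3: 4.0, 250: 6.0}
kept, shared = compress_box(bounding_box([a, b]), budget=1)
```

With a budget of one explicit interval, the compressed region stores far fewer numbers than the exact box while still enclosing both `a` and `b`; the price is a looser region, so a search may visit nodes that the exact box would have pruned.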