Quantization techniques for similarity search in high-dimensional data spaces

  • Authors:
  • Christian Garcia-Arellano; Ken Sevcik

  • Affiliations:
  • Department of Computer Science, University of Toronto, Canada and IBM Toronto Lab, Toronto, Canada; Department of Computer Science, University of Toronto, Canada

  • Venue:
  • BNCOD'03: Proceedings of the 20th British National Conference on Databases
  • Year:
  • 2003

Abstract

In recent years, several techniques have been developed for efficient similarity search in high-dimensional data spaces. Among them, techniques based on vector approximation via quantization have been shown to be the most effective. The VA-file was the first technique to use vector approximation. The IQ-tree and the A-tree are subsequent techniques that impose a directory structure over the quantized VA-file representation. The performance gains of the IQ-tree result mainly from an optimized I/O strategy permitted by the directory structure, whereas those of the A-tree result mainly from exploiting the clustering of the data itself. In our work, we first evaluate the relative performance of these two enhanced approaches over high-dimensional data sets with different clustering characteristics. Second, we present the Clustered IQ-Tree, an indexing strategy that combines the best features of the IQ-tree and the A-tree, leading to better query performance than the former and more stable performance than the latter across different types of data sets.
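
To make the idea of vector approximation via quantization concrete, the following is a minimal sketch in the spirit of a flat, VA-file-style scan. It is not the IQ-tree, the A-tree, or the Clustered IQ-Tree, and it is not taken from the paper. It assumes a uniform grid over [0, 1] with the same number of bits in every dimension, Euclidean distance, and a single nearest neighbor; the function names (`quantize`, `lower_bound`, `nearest_neighbor`) are hypothetical. Each vector is replaced by the indices of its grid cells, and a lower bound on the distance from the query to the cell decides whether the exact vector needs to be fetched at all.

```python
import math
import random


def quantize(vec, bits, lo=0.0, hi=1.0):
    """Map each coordinate to the index of its cell on a uniform grid (assumed layout)."""
    cells = 1 << bits
    width = (hi - lo) / cells
    return [min(cells - 1, max(0, int((x - lo) / width))) for x in vec]


def lower_bound(query, approx, bits, lo=0.0, hi=1.0):
    """Smallest Euclidean distance from the query to any point inside the cell."""
    cells = 1 << bits
    width = (hi - lo) / cells
    total = 0.0
    for q, c in zip(query, approx):
        cell_lo = lo + c * width
        cell_hi = cell_lo + width
        if q < cell_lo:
            total += (cell_lo - q) ** 2
        elif q > cell_hi:
            total += (q - cell_hi) ** 2
        # If q lies inside the cell's interval, this dimension contributes 0.
    return math.sqrt(total)


def nearest_neighbor(query, vectors, approximations, bits):
    """Scan the approximations; read an exact vector only if its bound can beat the best."""
    best_dist, best_idx, fetched = math.inf, -1, 0
    for i, approx in enumerate(approximations):
        if lower_bound(query, approx, bits) >= best_dist:
            continue  # the exact distance cannot be smaller than the current best
        fetched += 1  # stands in for the costly access to the full vector
        d = math.dist(query, vectors[i])
        if d < best_dist:
            best_dist, best_idx = d, i
    return best_idx, best_dist, fetched


# Illustrative data only: 200 random 8-dimensional points, 4 bits per dimension.
random.seed(0)
BITS = 4
data = [[random.random() for _ in range(8)] for _ in range(200)]
approximations = [quantize(v, BITS) for v in data]
query = [random.random() for _ in range(8)]
idx, dist, fetched = nearest_neighbor(query, data, approximations, BITS)
```

Here `fetched` counts how many exact vectors had to be read; the smaller it is relative to the data set size, the more work the approximations saved. A directory structure such as the ones the abstract describes would sit on top of this kind of quantized representation, which the sketch does not attempt to model.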