Combining Approximation Techniques and Vector Quantization for Adaptable Similarity Search

  • Authors:
  • Christian Böhm; Hans-Peter Kriegel; Thomas Seidl

  • Affiliations:
  • University for Health Informatics and Technology Tyrol, Innrain 98, 6020 Innsbruck, Austria. Christian.Boehm@umit.at
  • University of Munich, Institute for Computer Science, Oettingenstr. 67, 80538 München, Germany. kriegel@dbs.informatik.uni-muenchen.de
  • University of Constance, Department of Computer and Information Science, Box D78, 78457 Konstanz, Germany. seidl@informatik.uni-konstanz.de

  • Venue:
  • Journal of Intelligent Information Systems - Special issue on data warehousing and knowledge discovery
  • Year:
  • 2002

Abstract

Adaptable similarity queries based on quadratic form distance functions are popular in data mining application domains including multimedia, CAD, molecular biology, and medical image databases. Recently it has been recognized that quantization of feature vectors can substantially improve query processing for Euclidean distance functions, as demonstrated by the scan-based VA-file and the IQ-tree index structure. In this paper, we address the problem that determining quadratic form distances between quantized vectors is difficult and computationally expensive. Our solution provides a variety of new approximation techniques for quantized vectors, which are combined in an extended multistep query processing architecture. In our analysis section, we show that the filter steps complement each other, so it is useful to apply our filters in combination. We show the superiority of our approach over other architectures and over competing query processing methods. In our experimental evaluation, our approach outperforms the sequential scan by a factor of 2.3. Compared to the X-tree on 64-dimensional color histogram data, we measured an improvement factor of 5.7.
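
To make the terms in the abstract concrete, the following is a minimal sketch of a quadratic form distance and a two-step (filter, then refine) range query. It is not the authors' method: the filter used here is the standard lower bound sqrt(lambda_min) * ||p - q||, which holds for a symmetric positive definite matrix A with smallest eigenvalue lambda_min, and it stands in for the paper's approximation techniques on quantized vectors. The function names, the example matrix, and the assumption that lambda_min is supplied by the caller are all illustrative.

```python
import math


def quadratic_form_distance(p, q, A):
    """Exact distance sqrt((p - q)^T A (p - q)) for a similarity matrix A."""
    d = [pi - qi for pi, qi in zip(p, q)]
    total = 0.0
    for i, di in enumerate(d):
        for j, dj in enumerate(d):
            total += di * A[i][j] * dj
    return math.sqrt(total)


def euclidean_lower_bound(p, q, lambda_min):
    """Cheap lower bound on the quadratic form distance.

    Assumes A is symmetric positive definite and lambda_min is its
    smallest eigenvalue (supplied by the caller in this sketch).
    """
    squared = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(lambda_min * squared)


def multistep_range_query(query, data, A, lambda_min, eps):
    """Return (indices within eps, number of exact distance evaluations)."""
    results = []
    refined = 0
    for idx, v in enumerate(data):
        # Filter step: a lower bound above eps proves v cannot qualify.
        if euclidean_lower_bound(query, v, lambda_min) > eps:
            continue
        # Refinement step: evaluate the expensive exact distance.
        refined += 1
        if quadratic_form_distance(query, v, A) <= eps:
            results.append(idx)
    return results, refined


# Hypothetical example: A has eigenvalues 1 and 3, so lambda_min = 1.
A = [[2.0, 1.0], [1.0, 2.0]]
data = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0), (1.0, -1.0), (1.0, 1.0)]
hits, refined = multistep_range_query((0.0, 0.0), data, A, 1.0, 1.5)
```

Because the filter never overestimates the exact distance, it cannot discard a true result; it only reduces how many exact evaluations the refinement step performs. In the example, the vector (3, 3) is pruned by the filter alone, while (1, 1) passes the filter and is rejected only in the refinement step.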