Succinct nearest neighbor search

  • Authors: Eric Sadit Tellez; Edgar Chávez; Gonzalo Navarro

  • Affiliations: Universidad Michoacana de San Nicolás de Hidalgo, México; Universidad Michoacana de San Nicolás de Hidalgo, México; University of Chile, Chile

  • Venue: Proceedings of the Fourth International Conference on SImilarity Search and APplications
  • Year: 2011

Abstract

In this paper we present a novel technique for nearest neighbor searching dubbed neighborhood approximation. The central idea is to divide the database into compact regions, each represented by a single object called the reference. To search for nearest neighbors, a set of candidate references is first obtained and then expanded with the database objects associated with those references. This approach can be implemented with an inverted index, which in turn can be represented succinctly, spending just a few bits per object. As a consequence, the index can be stored in main memory even for relatively large databases. The speed/compression/recall tradeoff achieved is excellent. To obtain 92% recall in 30-nearest-neighbor searches, the index reviews less than 0.6% of the database, in time ranging from 0.35 to 2.67 seconds while using from 93 to 24 Mbytes, for a database of ten million objects. The tradeoff comes from using different compression techniques. The uncompressed index requires 0.17 seconds and 267 Mbytes of space. A quality measure complementary to recall is the ratio between the covering radius of the actual nearest neighbors and that of the near neighbors reported by the algorithm. Under this measure, our results are within a small constant factor of the exact results.
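
The search scheme described in the abstract can be illustrated with a minimal, uncompressed sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: points are tuples under Euclidean distance, references are a random sample of the database, each object is posted under its k_index nearest references, and a query probes the posting lists of its k_search nearest references. All function and parameter names are hypothetical, and the direction of the radius ratio (reported radius divided by exact radius) is one possible reading of the abstract. The succinct encoding of the posting lists is not shown.

```python
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_refs(obj, refs, k, dist=euclidean):
    """Indices of the k references closest to obj."""
    order = sorted(range(len(refs)), key=lambda i: dist(obj, refs[i]))
    return order[:k]


def build_index(db, refs, k_index=1, dist=euclidean):
    """Inverted index: reference id -> ids of objects associated with it.

    Assumption: an object is associated with its k_index nearest references.
    """
    inverted = defaultdict(list)
    for obj_id, obj in enumerate(db):
        for ref_id in nearest_refs(obj, refs, k_index, dist):
            inverted[ref_id].append(obj_id)
    return inverted


def search(query, db, refs, inverted, k, k_search, dist=euclidean):
    """Approximate k-NN: gather candidates from the posting lists of the
    k_search references nearest to the query, then rank them by distance.
    Returns (ids of reported neighbors, number of candidates reviewed)."""
    candidates = set()
    for ref_id in nearest_refs(query, refs, k_search, dist):
        candidates.update(inverted.get(ref_id, ()))
    ranked = sorted(candidates, key=lambda i: dist(query, db[i]))
    return ranked[:k], len(candidates)


def exact_knn(query, db, k, dist=euclidean):
    """Brute-force k-NN, used as ground truth."""
    return sorted(range(len(db)), key=lambda i: dist(query, db[i]))[:k]


def recall(approx, exact):
    """Fraction of the true neighbors that were reported."""
    return len(set(approx) & set(exact)) / len(exact)


def radius_ratio(query, db, approx, exact, dist=euclidean):
    """Covering radius of the reported neighbors over that of the true ones."""
    r_approx = max(dist(query, db[i]) for i in approx)
    r_exact = max(dist(query, db[i]) for i in exact)
    return r_approx / r_exact


if __name__ == "__main__":
    rng = random.Random(42)
    db = [(rng.random(), rng.random()) for _ in range(5000)]
    refs = rng.sample(db, 64)            # assumption: random references
    index = build_index(db, refs, k_index=2)
    query = (0.3, 0.7)
    found, reviewed = search(query, db, refs, index, k=30, k_search=3)
    truth = exact_knn(query, db, 30)
    print("reviewed fraction:", reviewed / len(db))
    print("recall:", recall(found, truth))
    print("radius ratio:", radius_ratio(query, db, found, truth))
```

In this sketch, raising k_index or k_search enlarges the candidate set, which tends to improve recall at the cost of reviewing more of the database. The posting lists are plain Python lists here; a succinct variant would replace them with a compressed representation.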