Hashing, searching, sketching

  • Authors: Rajeev Motwani; Rina Panigrahy
  • Affiliations: Stanford University; Stanford University
  • Venue: Hashing, searching, sketching
  • Year: 2006

Abstract

The Information Age has enabled the search for information in ways never imagined before. The simplest search function may be an exact search, in which the input query is expected to match the search object exactly. But some search criteria are fuzzy (for instance image search, news search, and similar-document search), which makes the search problem much harder. One common approach is to convert such a search object into a mathematical representation such as a point (vector) in a high-dimensional space. The search for a similar object then becomes a nearest neighbor search in that space. Hashing is a simple and effective method for exact search that uses a random hash function to map items into buckets, a process often viewed as throwing balls into bins. A variant called locality-sensitive hashing, which tends to map similar objects to the same hash bucket, can be used to perform nearest neighbor search. A related notion is sketching, which transforms a large complex object into a small sketch (often a tiny bitmap) so that the similarity between sketches can be used to estimate the similarity between the original objects. In this thesis we study algorithms for different kinds of search using hashing and sketching, and some fundamental limits of what these approaches can achieve. For exact search, we will see how variants of balls-and-bins processes can be used to derive space-efficient methods for maintaining hash tables. For similarity search, we will see a variant of locality-sensitive hashing that uses linear space, and how the underlying ideas can be applied to the kd-tree data structure for improved performance. We will also probe the fundamental limits of some of these approaches by showing lower bounds on their performance.
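
To make the ideas of sketching and locality-sensitive hashing in the abstract concrete, the following is a minimal Python sketch of one well-known construction, random-hyperplane sketching. It is an illustration only and is not claimed to be the scheme developed in the thesis. The function names (`make_hyperplanes`, `sketch`, `estimated_angle`, `bucket_key`) and all parameters are hypothetical choices made for this example. Each vector is reduced to a bitmap with one bit per random hyperplane, the Hamming distance between two bitmaps estimates the angle between the original vectors, and a short prefix of the bitmap can serve as a hash bucket that similar vectors tend to share.

```python
import math
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian normal vectors in `dim` dimensions.

    The seed is fixed only so that the example is reproducible.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def sketch(vec, planes):
    """Return an integer bitmap: bit i is 1 if vec is on the non-negative side of plane i."""
    bitmap = 0
    for i, plane in enumerate(planes):
        dot = sum(p * v for p, v in zip(plane, vec))
        if dot >= 0:
            bitmap |= 1 << i
    return bitmap


def estimated_angle(sketch_a, sketch_b, bits):
    """Estimate the angle (in radians) between two vectors from their sketches.

    A random hyperplane separates two vectors with probability angle / pi,
    so the fraction of differing bits, scaled by pi, estimates the angle.
    """
    hamming = bin(sketch_a ^ sketch_b).count("1")
    return math.pi * hamming / bits


def bucket_key(sketch_value, prefix_bits):
    """Use the low `prefix_bits` bits of a sketch as a hash-bucket key (an illustrative choice)."""
    return sketch_value & ((1 << prefix_bits) - 1)


if __name__ == "__main__":
    bits = 1024
    planes = make_hyperplanes(dim=3, bits=bits)
    a = [1.0, 0.0, 0.0]
    b = [0.0, 1.0, 0.0]  # orthogonal to a, true angle is pi / 2
    print(estimated_angle(sketch(a, planes), sketch(b, planes), bits))
    print(bucket_key(sketch(a, planes), 8) == bucket_key(sketch([2.0, 0.0, 0.0], planes), 8))
```

More bits give a tighter angle estimate at the cost of a larger sketch, and a longer bucket prefix makes collisions between dissimilar vectors rarer while also making collisions between similar vectors less likely. A practical nearest neighbor index would typically combine several such tables, which this example does not attempt.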