Generalized Hamming Distance

  • Authors:
  • Abraham Bookstein; Vladimir A. Kulyukin; Timo Raita

  • Affiliations:
  • Center for Information and Language Studies, University of Chicago, Chicago, IL 60637. a-bookstein@uchicago.edu; Computer Science Department, Utah State University, Logan, Utah 84322-4205. vkulyukin@cs.usu.edu; Computer Science Department, University of Turku, 20520 Turku, Finland. raita@cs.utu.fi

  • Venue:
  • Information Retrieval
  • Year:
  • 2002

Abstract

Many problems in information retrieval and related fields depend on a reliable measure of the distance or similarity between objects that, most frequently, are represented as vectors. This paper considers vectors of bits. Such data structures implement entities as diverse as bitmaps that indicate the occurrences of terms and bitstrings that indicate the presence of edges in images. For such applications, a popular distance measure is the Hamming distance. The value of the Hamming distance for information retrieval applications is limited by the fact that it counts only exact matches, whereas in information retrieval, corresponding bits that lie close to one another can still be considered almost identical. We define a "Generalized Hamming distance" that extends the Hamming concept to give partial credit for near misses, and suggest a dynamic programming algorithm that permits it to be computed efficiently. We envision many uses for such a measure. In this paper we define the "Generalized Hamming distance", prove some of its basic properties, and illustrate its use in the area of object recognition. We evaluate our implementation in a series of experiments, using autonomous robots to test the measure's effectiveness in relating similar bitstrings.
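
The limitation the abstract describes can be illustrated with the classic Hamming distance alone. The sketch below is not the paper's generalized measure or its dynamic programming algorithm, neither of which is specified in this record; the function name `hamming_distance` and the example bitstrings are assumptions chosen for illustration. It shows that a set bit displaced by one position is penalized exactly as much as one displaced by five positions.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bitstrings differ."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical example data: one set bit, moved by different amounts.
reference = "00100000"
near_miss = "00010000"  # the set bit shifted by one position
far_miss = "00000001"   # the set bit shifted by five positions

print(hamming_distance(reference, near_miss))  # 2
print(hamming_distance(reference, far_miss))   # 2
```

Both comparisons yield the same distance, so the classic measure gives no partial credit for the near miss. That is the gap a generalized distance of the kind described in the abstract is meant to close.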