Generalized Hamming Distance

Authors:
Abraham Bookstein; Vladimir A. Kulyukin; Timo Raita
Affiliations:
Center for Information and Language Studies, University of Chicago, Chicago, IL 60637. a-bookstein@uchicago.edu
Computer Science Department, Utah State University, Logan, Utah 84322-4205. vkulyukin@cs.usu.edu
Computer Science Department, University of Turku, 20520 Turku, Finland. raita@cs.utu.fi
Venue:
Information Retrieval
Year:
2002

Abstract

Many problems in information retrieval and related fields depend on a reliable measure of the distance or similarity between objects that are most often represented as vectors. This paper considers vectors of bits. Such structures represent entities as diverse as bitmaps that mark the occurrences of terms and bitstrings that mark the presence of edges in images. For such applications, a popular distance measure is the Hamming distance. Its value for information retrieval is limited, however, because it credits only exact matches, whereas in information retrieval, corresponding bits that are close to one another can still be considered almost identical. We define a “Generalized Hamming distance” that extends the Hamming concept to give partial credit for near misses, and we suggest a dynamic programming algorithm that computes it efficiently. We envision many uses for such a measure. In this paper we define the “Generalized Hamming distance”, prove some of its basic properties, and illustrate its use in object recognition. We evaluate our implementation in a series of experiments that use autonomous robots to test the measure's effectiveness in relating similar bitstrings.
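
To make the contrast concrete, the sketch below compares the classic Hamming distance with one possible "partial credit" distance computed by dynamic programming. The abstract does not state the cost model, so everything specific here is an assumption made for illustration: the function names `hamming` and `generalized_hamming`, the three operations (insert a 1-bit, delete a 1-bit, shift a 1-bit), the default costs `c_ins`, `c_del`, and `c_shift`, and the choice to charge a shift in proportion to the distance moved. It should be read as a minimal sketch of the idea, not as the definition used in the paper.

```python
def hamming(a: str, b: str) -> int:
    """Classic Hamming distance: number of positions where the bits differ."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def generalized_hamming(a: str, b: str,
                        c_ins: float = 1.0,
                        c_del: float = 1.0,
                        c_shift: float = 0.5) -> float:
    """Illustrative partial-credit distance between two bitstrings.

    Assumed cost model (not taken from the abstract):
      - deleting a 1-bit from `a` costs c_del,
      - inserting a 1-bit to match `b` costs c_ins,
      - shifting a 1-bit by k positions costs c_shift * k.
    """
    if len(a) != len(b):
        raise ValueError("bitstrings must have equal length")

    # Work on the sorted positions of the 1-bits only.
    xs = [i for i, ch in enumerate(a) if ch == "1"]
    ys = [j for j, ch in enumerate(b) if ch == "1"]
    m, n = len(xs), len(ys)

    # g[i][j] = cheapest way to turn the first i ones of `a`
    # into the first j ones of `b`.
    g = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        g[i][0] = i * c_del
    for j in range(1, n + 1):
        g[0][j] = j * c_ins

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            shift = g[i - 1][j - 1] + c_shift * abs(xs[i - 1] - ys[j - 1])
            delete = g[i - 1][j] + c_del
            insert = g[i][j - 1] + c_ins
            g[i][j] = min(shift, delete, insert)
    return g[m][n]


if __name__ == "__main__":
    # A one-position near miss: Hamming sees two mismatches,
    # the sketch charges a single short shift.
    print(hamming("10000", "01000"))
    print(generalized_hamming("10000", "01000"))
```

Under these assumed costs, a 1-bit that lands one position away from its counterpart is charged a single short shift rather than two full mismatches, which is the "near miss" behaviour the abstract describes. The table has one cell per pair of 1-bits, so the work grows with the number of set bits rather than with the full vector length.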