Word spotting for historical documents

  • Authors:
  • Tony M. Rath; R. Manmatha

  • Affiliations:
  • University of Massachusetts Amherst, Multimedia Indexing and Retrieval Group, Center for Intelligent Information Retrieval, Department of Computer Science, Amherst, MA 01003, USA (both authors)

  • Venue:
  • International Journal on Document Analysis and Recognition
  • Year:
  • 2007

Abstract

Searching and indexing historical handwritten collections is a very challenging problem. We describe an approach called word spotting, which groups word images into clusters of similar words by using image matching to measure similarity. By annotating “interesting” clusters, an index that links words to the locations where they occur can be built automatically. Image similarities computed using a number of different techniques, including dynamic time warping, are compared. The word similarities are then used for clustering with both K-means and agglomerative clustering techniques. It is shown on a subset of the George Washington collection that such a word spotting technique can outperform a Hidden Markov Model word-based recognition technique in terms of word error rates.
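
The matching-then-clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each word image has already been reduced to a sequence of per-column feature vectors (the toy one-dimensional "profiles" below are invented for illustration), uses a squared Euclidean local cost inside a textbook dynamic time warping recurrence without any path-length normalization, and groups words with a simple average-linkage agglomerative procedure. The names `dtw_distance`, `agglomerative`, `profiles`, and the `threshold` value are all hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors.

    Assumption: local cost is the squared Euclidean distance between vectors.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def agglomerative(items, dist, threshold):
    """Average-linkage agglomerative clustering over item indices.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (a hypothetical stopping rule chosen for this sketch).
    """
    k = len(items)
    d = [[dist(items[i], items[j]) for j in range(k)] for i in range(k)]
    clusters = [[i] for i in range(k)]

    def linkage(c1, c2):
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if best > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Toy data (invented): two renderings of one "word" at different widths,
# plus one unrelated "word". Each column is a 1-D feature vector.
profiles = [
    [[0.0], [1.0], [2.0], [1.0], [0.0]],
    [[0.0], [1.0], [1.0], [2.0], [1.0], [0.0]],
    [[5.0], [5.0], [0.0], [5.0]],
]

clusters = agglomerative(profiles, dtw_distance, threshold=1.0)
print(clusters)
```

Because dynamic time warping lets one column align with several columns of the other sequence, the first two toy profiles match despite their different lengths, while the third stays in its own cluster. Annotating a cluster then labels every word image in it at once, which is what makes the automatic index construction described above possible.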