Features for Word Spotting in Historical Manuscripts

  • Authors:
  • Toni M. Rath; R. Manmatha

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
  • Year:
  • 2003

Abstract

For the transition from traditional to digital libraries, the large number of handwritten manuscripts that exist pose a great challenge. Easy access to such collections requires an index, which is currently created manually at great cost. Because automatic handwriting recognizers fail on historical manuscripts, the word spotting technique has been developed: the words in a collection are matched as images and grouped into clusters which contain all instances of the same word. By annotating "interesting" clusters, an index that links words to the locations where they occur can be built automatically.

Due to the noise in historical documents, selecting the right features for matching words is crucial. We analyzed a range of features suitable for matching words using dynamic time warping (DTW), which aligns and compares sets of features extracted from two images. Each feature's individual performance was measured on a test set. With an average precision of 72%, a combination of features outperforms competing techniques in speed and precision.
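
To make the matching step concrete, here is a minimal sketch of textbook dynamic time warping between two feature sequences. It is not the authors' implementation. It assumes each word image has already been reduced to a sequence of feature vectors (for example, one vector per image column), uses squared Euclidean distance as the local cost, and applies no warping-band constraint or path-length normalization. The function name `dtw_distance` and the toy inputs are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Return the cumulative DTW cost between two sequences of feature vectors.

    Assumptions (not taken from the abstract): the local cost is the squared
    Euclidean distance, and each step may advance in `a`, in `b`, or in both.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = cheapest alignment of the first i vectors of `a`
    # with the first j vectors of `b`.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost between the two feature vectors being aligned.
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is also aligned with b[j-1]'s predecessor path
                cost[i][j - 1],      # b[j-1] is aligned with a[i-1] again
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Toy one-dimensional "features": the second word is a stretched copy of the first.
word_a = [[0.0], [1.0], [2.0]]
word_b = [[0.0], [1.0], [1.0], [2.0]]
stretched_cost = dtw_distance(word_a, word_b)
```

In this toy case the stretched copy aligns at zero cost, which illustrates why DTW tolerates the variation in width between handwritten instances of the same word. A practical system would likely normalize the cost by path length and restrict the warping path, both of which are omitted here.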