Searching historical manuscripts for near-duplicate figures

  • Authors: Thanawin Rakthanmanon; Qiang Zhu; Eamonn J. Keogh

  • Affiliations: University of California, Riverside (all three authors)

  • Venue: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
  • Year: 2011

Abstract

In the next decade, a majority of all the books ever published will be digitized and online. Naturally, most of the data in historical manuscripts is text, but a substantial portion is devoted to images. This observation is responsible for the dramatic increase in interest in query-by-content systems for historical documents. While querying/indexing systems can be useful, we believe that this domain is finally ready for unsupervised discovery of patterns. With this in mind, we introduce an efficient and scalable technique that can detect approximately repeated occurrences of images both within and between historical texts. We demonstrate that this ability to find repeated shapes allows us to do automatic annotation of manuscripts. We show the utility of our technique on datasets dating back to the fourteenth century.