Searching historical manuscripts for near-duplicate figures

Authors:
Thanawin Rakthanmanon;Qiang Zhu;Eamonn J. Keogh
Affiliations:
University of California, Riverside;University of California, Riverside;University of California, Riverside
Venue:
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
Year:
2011

Abstract

In the next decade, a majority of all books ever published will be digitized and available online. Most of the data in historical manuscripts is text, but a large amount is also devoted to images. This observation has driven a dramatic increase in interest in query-by-content systems for historical documents. While querying and indexing systems can be useful, we believe that this domain is finally ready for unsupervised discovery of patterns. With this in mind, we introduce an efficient and scalable technique that detects approximately repeated occurrences of images both within and between historical texts. We demonstrate that this ability to find repeated shapes allows us to annotate manuscripts automatically. We show the utility of our technique on datasets dating back to the fourteenth century.