Fusion of Word Spotting and Spatial Information for Figure Caption Retrieval in Historical Document Images

Authors:
Khurram Khurshid; Claudie Faure; Nicole Vincent
Venue:
ICDAR '09: Proceedings of the 2009 10th International Conference on Document Analysis and Recognition
Year:
2009

Abstract

We present a method for figure caption detection that fuses several sources of information. The evaluation is performed on documents from the collection of the historical medical digital library Medic@. A method based on perceptual grouping simultaneously segments the vertical and horizontal text lines of a page. Spatial relationships between the text lines and the graphics are then used to select a set of caption-line candidates. We also propose a feature-based word spotting method that retrieves occurrences of word images similar to a given query. Word spotting is applied to detect the caption label, i.e., a word such as ‘Fig’, ‘FIG’ or ‘Figure’ followed by the figure number. Combining spatial information with word recognition greatly improves the detection of caption lines. Our initial experiments cover more than 300 pages from three different books.
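To make the fusion idea concrete, here is a minimal sketch that combines the two cues from the abstract: a spatial test (a text line sits just above or below a graphic and overlaps it horizontally) and a word spotting test (the descriptor of the line's first word image is close to a query descriptor for a label such as ‘Fig’). Everything here is an assumption for illustration, not the authors' method. The `Box` and `TextLine` types, the precomputed `first_word_features` vectors, the Euclidean distance, the thresholds `max_gap` and `max_dist`, and the AND-style fusion rule in `find_captions` are all hypothetical. The abstract does not specify the actual descriptors or how the cues are combined.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Box:
    # Axis-aligned bounding box in image coordinates (y grows downward).
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class TextLine:
    box: Box
    # Hypothetical precomputed descriptor of the line's first word image.
    first_word_features: list[float]


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width of the horizontal intersection of two boxes (0 if disjoint)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def is_spatial_candidate(line: TextLine, graphic: Box, max_gap: float) -> bool:
    """True if the line overlaps the graphic horizontally and lies just
    above or just below it, within max_gap pixels."""
    if horizontal_overlap(line.box, graphic) <= 0:
        return False
    gap_below = line.box.y0 - graphic.y1  # line starts under the graphic
    gap_above = graphic.y0 - line.box.y1  # line ends over the graphic
    return 0 <= gap_below <= max_gap or 0 <= gap_above <= max_gap


def find_captions(lines, graphics, queries, max_gap=40.0, max_dist=0.5):
    """Fuse both cues: keep lines that are spatial candidates AND whose
    first word is within max_dist of at least one query descriptor."""
    captions = []
    for line in lines:
        if not any(is_spatial_candidate(line, g, max_gap) for g in graphics):
            continue
        # Word spotting step: nearest query by Euclidean distance.
        best = min(math.dist(line.first_word_features, q) for q in queries)
        if best <= max_dist:
            captions.append(line)
    return captions
```

For example, with a graphic at `Box(100, 100, 400, 300)` and a query descriptor `[0.0, 1.0]`, a line at `Box(120, 310, 380, 330)` whose first word has features `[0.1, 0.9]` is kept. A similar line 200 pixels further down fails the spatial test, and a line directly under the graphic whose first word looks nothing like the query fails the word spotting test. Requiring both cues mirrors the abstract's point that neither spatial layout nor word recognition alone is as reliable as their combination. This sketch only handles horizontal captions; the vertical text lines mentioned above would need a rotated version of the spatial test.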