A keyword spotting approach using blurred shape model-based descriptors
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
We present a method for figure caption detection that fuses several information sources. The evaluation is performed on documents gathered from the collection of the historical medical digital library Medic@. A method based on perceptual grouping simultaneously segments the vertical and horizontal text lines in a page. Spatial relationships between the text lines and the graphics are then used to select a set of caption line candidates. A feature-based word-spotting method is proposed to retrieve the occurrences of word images similar to a given query. Word spotting is applied to detect the label of each caption, a word such as ‘Fig’, ‘FIG’, ‘Figure’, ... followed by the figure number. Combining spatial information and word recognition greatly improves the detection of caption lines. Our initial experiments process more than 300 pages from three different books.
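The fusion of spatial information and word spotting described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Box` type, the function names, the pixel threshold `max_gap`, and the score threshold `min_score` are all hypothetical, and the word-spotting step is assumed to have already produced one label score per text line. For brevity the sketch only considers horizontal text lines lying above or below a graphic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def horizontal_overlap(a: Box, b: Box) -> float:
    """Length of the shared horizontal extent of two boxes (0 if disjoint)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def vertical_gap(a: Box, b: Box) -> float:
    """Vertical distance between two boxes (0 if they overlap vertically)."""
    return max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))


def caption_candidates(lines, graphics, max_gap=40.0):
    """Indices of text lines that are close to a graphic and aligned with it.

    max_gap is an assumed threshold, not a value taken from the paper.
    """
    candidates = []
    for i, line in enumerate(lines):
        for graphic in graphics:
            if (horizontal_overlap(line, graphic) > 0
                    and vertical_gap(line, graphic) <= max_gap):
                candidates.append(i)
                break  # one nearby graphic is enough
    return candidates


def detect_captions(lines, graphics, label_scores, max_gap=40.0, min_score=0.5):
    """Keep spatial candidates whose spotted label score is high enough.

    label_scores[i] stands in for the similarity between the first word of
    line i and a query such as 'Fig'; min_score is an assumed threshold.
    """
    return [i for i in caption_candidates(lines, graphics, max_gap)
            if label_scores[i] >= min_score]
```

A line is reported as a caption only when both cues agree, which is one simple way to combine them; a weighted score would be an equally plausible choice.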