Automatic name extraction from degraded document images

Authors:
Laurence Likforman-Sulem;Pascal Vaillant;Aliette de Bodard de la Jacopière
Affiliations:
Ecole Nationale Supérieure des Télécommunications/TSI and CNRS-LTCI, 46 rue Barrault, 75013, Paris, France;Ecole Nationale Supérieure des Télécommunications/TSI and CNRS-LTCI, Paris, France and Université des Antilles-Guyane, Institut d’Enseignement Supérieur d ...;Ecole Nationale Supérieure des Télécommunications/TSI and CNRS-LTCI, 46 rue Barrault, 75013, Paris, France
Venue:
Pattern Analysis & Applications
Year:
2006

Abstract

This paper addresses the automatic extraction of names from document images. Our approach relies on the combination of two complementary analyses. First, the image-based analysis exploits visual cues to select the regions of interest in the document. Second, the text-based analysis searches for name patterns and low-level textual features of words. The two analyses are then combined at the word level through a neural network fusion scheme. Results reported on degraded documents, such as facsimiles and photocopied technical journals, demonstrate the benefit of the combined approach.
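
The word-level fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the network architecture, or the training procedure. The sketch assumes that the image-based analysis has already flagged each word as inside or outside a region of interest, uses a hypothetical regular expression as the name pattern, and replaces the neural network with a single logistic unit whose weights are set by hand for illustration only. All names (Word, NAME_PATTERN, WEIGHTS, BIAS, name_score, extract_names) are assumptions.

```python
import math
import re

# Hypothetical name pattern: an initial such as "L." or a capitalized,
# optionally hyphenated word such as "Likforman-Sulem".
NAME_PATTERN = re.compile(r"^(?:[A-Z]\.|[A-Z][a-z]+(?:-[A-Z][a-z]+)*)$")


class Word:
    """One recognized word plus the (assumed given) image-analysis result."""

    def __init__(self, text, in_region_of_interest):
        self.text = text
        self.in_region_of_interest = in_region_of_interest


def text_features(word):
    """Text-based features: pattern match and initial capital."""
    return [
        1.0 if NAME_PATTERN.match(word.text) else 0.0,
        1.0 if word.text[:1].isupper() else 0.0,
    ]


def image_features(word):
    """Image-based feature: is the word inside a region of interest?"""
    return [1.0 if word.in_region_of_interest else 0.0]


# Illustrative weights chosen by hand, NOT learned from data.
# Order: [pattern match, initial capital, region of interest].
WEIGHTS = [2.0, 1.0, 3.0]
BIAS = -4.0


def name_score(word):
    """Fuse both feature sets with a single logistic unit."""
    features = text_features(word) + image_features(word)
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def extract_names(words, threshold=0.5):
    """Return the text of every word whose fused score exceeds the threshold."""
    return [w.text for w in words if name_score(w) > threshold]
```

With these hand-set weights, neither analysis is sufficient alone: a capitalized word outside a region of interest, or a lowercase word inside one, stays below the threshold, while a word supported by both analyses passes. In a real system the weights would be learned from labeled words rather than fixed.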