Word Segmentation of Handwritten Dates in Historical Documents by Combining Semantic A-Priori-Knowledge with Local Features

Authors:
Markus Feldbach; Klaus D. Tönnies
Venue:
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
Year:
2003

Abstract

The recognition of script in historical documents requires suitable techniques for identifying single words. Segmenting lines and words is challenging because lines are not straight and words may intersect within and between lines. For correct word segmentation, the conventional analysis of distances between text objects needs to be supplemented by a second component that predicts possible word boundaries from semantic information. For date entries, hypotheses about potential boundaries are generated from knowledge of the different ways dates are written in the documents. This knowledge is modeled by distribution curves for potential boundary locations. Word boundaries are detected by classifying local features, such as the distances between adjacent text objects, together with the location-based boundary distribution curves, which serve as a-priori knowledge. We applied the technique to date entries in historical church registers. Documents from the 18th and 19th centuries were used for training and testing. The data set consisted of 674 word boundaries in 298 date entries. Our algorithm found the correct separation among the four best hypotheses for a word sequence in 97% of all cases in the test data set.
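The abstract does not specify the classifier or the exact shape of the boundary distribution curves, so the following is only a minimal Python sketch under stated assumptions. It illustrates how a local gap feature and a location-based prior can be combined into ranked segmentation hypotheses. A logistic function of the gap distance stands in for the local-feature classifier. Gaussian bumps at assumed expected positions stand in for the a-priori distribution curves. The two estimates are fused by multiplying their odds, a naive-Bayes-style simplification that assumes independence. All names (`Gap`, `local_boundary_prob`, `prior_boundary_prob`, `fuse`, `rank_hypotheses`) and all parameter values (`threshold`, `scale`, `peaks`, `width`, `floor`) are hypothetical and do not come from the paper.

```python
import itertools
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gap:
    """A gap between two adjacent text objects in a date entry (hypothetical representation)."""
    distance: float  # horizontal distance between the objects, e.g. in pixels
    position: float  # gap centre, normalised to [0, 1] along the date entry


def local_boundary_prob(distance: float, threshold: float = 12.0, scale: float = 3.0) -> float:
    """Assumed stand-in for the local classifier: logistic function of the gap distance."""
    return 1.0 / (1.0 + math.exp(-(distance - threshold) / scale))


def prior_boundary_prob(position: float, peaks=(0.3, 0.6), width: float = 0.08,
                        floor: float = 0.05) -> float:
    """Assumed stand-in for a boundary distribution curve: Gaussian bumps at
    hypothetical expected boundary positions."""
    bump = max(math.exp(-0.5 * ((position - p) / width) ** 2) for p in peaks)
    return floor + (1.0 - 2.0 * floor) * bump  # stays within [floor, 1 - floor]


def fuse(p_local: float, p_prior: float, eps: float = 1e-9) -> float:
    """Combine both estimates by multiplying their odds, then clamp away from 0 and 1."""
    num = p_local * p_prior
    p = num / (num + (1.0 - p_local) * (1.0 - p_prior))
    return min(max(p, eps), 1.0 - eps)


def rank_hypotheses(gaps: List[Gap], top_k: int = 4,
                    n_boundaries: Optional[int] = None) -> List[Tuple[int, ...]]:
    """Score every boundary/no-boundary labelling of the gaps and return the top_k
    hypotheses, each as a tuple of gap indices labelled as word boundaries."""
    probs = [fuse(local_boundary_prob(g.distance), prior_boundary_prob(g.position))
             for g in gaps]
    scored = []
    for labels in itertools.product((False, True), repeat=len(gaps)):
        if n_boundaries is not None and sum(labels) != n_boundaries:
            continue  # optionally enforce a known number of words in the entry
        # Log-likelihood of this labelling, treating gaps as independent.
        score = sum(math.log(p) if is_b else math.log(1.0 - p)
                    for p, is_b in zip(probs, labels))
        boundaries = tuple(i for i, is_b in enumerate(labels) if is_b)
        scored.append((score, boundaries))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [boundaries for _, boundaries in scored[:top_k]]


if __name__ == "__main__":
    example = [Gap(5, 0.1), Gap(20, 0.3), Gap(6, 0.45), Gap(18, 0.6), Gap(4, 0.8)]
    for rank, hyp in enumerate(rank_hypotheses(example), start=1):
        print(rank, hyp)
```

In this sketch, an evaluation in the style of the abstract's "correct separation among the four best hypotheses" reduces to checking whether the ground-truth boundary tuple appears in `rank_hypotheses(gaps, top_k=4)`. Exhaustive enumeration costs 2^n for n gaps, which is acceptable only because a single date entry contains few text objects.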