Scale Space Technique for Word Segmentation in Handwritten Documents
SCALE-SPACE '99 Proceedings of the Second International Conference on Scale-Space Theories in Computer Vision
A Hybrid Approach to Word Segmentation
ILP '98 Proceedings of the 8th International Workshop on Inductive Logic Programming
Gap metrics for word separation in handwritten lines
ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1)
Word Segmentation in Handwritten Korean Text Lines Based on Gap Clustering Techniques
ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Line Detection and Segmentation in Historical Church Registers
ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Tree Structure for Word Extraction from Handwritten Text Lines
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Independent Component Analysis Segmentation Algorithm
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Baseline Image Classification Approach Using Local Minima Selection
IVIC '09 Proceedings of the 1st International Visual Informatics Conference on Visual Informatics: Bridging Research and Practice
A line-based representation for matching words in historical manuscripts
Pattern Recognition Letters
Integrated Computer-Aided Engineering
The recognition of script in historical documents requires suitable techniques for identifying single words. Segmentation of lines and words is a challenging task because lines are not straight and words may intersect within and between lines. For correct word segmentation, the conventional analysis of distances between text objects needs to be supplemented by a second component that predicts possible word boundaries based on semantic information. For date entries, hypotheses about potential boundaries are generated from knowledge about the different ways in which dates are written in the documents. This knowledge is modeled by distribution curves for potential boundary locations. Word boundaries are detected by classifying local features, such as distances between adjacent text objects, together with location-based boundary distribution curves as a priori knowledge. We applied the technique to date entries in historical church registers. Documents from the 18th and 19th centuries were used for training and testing. The data set consisted of 674 word boundaries in 298 date entries. Our algorithm found the correct separation among the best four hypotheses for a word sequence in 97% of all cases in the test data set.
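The idea of combining local gap features with a location-based prior and then ranking boundary hypotheses can be illustrated with a minimal Python sketch. It is not the authors' classifier: the logistic gap score, the Gaussian-shaped location prior, the independence assumption between gaps, and every name and parameter value (`gap_likelihood`, `location_prior`, `rank_hypotheses`, `midpoint`, `steepness`, `spread`) are assumptions chosen only for illustration.

```python
from itertools import combinations
from math import exp, log


def gap_likelihood(width, midpoint=12.0, steepness=0.5):
    """Hypothetical local feature score in (0, 1): wider gaps score higher."""
    return 1.0 / (1.0 + exp(-steepness * (width - midpoint)))


def location_prior(position, expected, spread=0.1):
    """Hypothetical prior: Gaussian-shaped bump around the nearest expected
    boundary location (positions are relative, 0.0 = line start, 1.0 = end)."""
    return max(exp(-((position - e) ** 2) / (2.0 * spread ** 2)) for e in expected)


def rank_hypotheses(gaps, expected, top_k=4, eps=1e-6):
    """Score every subset of gaps as a set of word boundaries.

    gaps: list of (relative_position, width) tuples, one per candidate gap.
    Returns the top_k (boundary_indices, log_score) pairs, best first.
    Gaps are treated as independent, which is a simplifying assumption.
    """
    probs = []
    for position, width in gaps:
        p = gap_likelihood(width) * location_prior(position, expected)
        probs.append(min(max(p, eps), 1.0 - eps))  # clamp to keep log() finite

    indices = range(len(gaps))
    scored = []
    for r in range(len(gaps) + 1):
        for chosen in combinations(indices, r):
            score = sum(
                log(probs[i]) if i in chosen else log(1.0 - probs[i])
                for i in indices
            )
            scored.append((chosen, score))

    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up example: five candidate gaps in one date entry, with boundaries
# expected near 30% and 60% of the line width.
gaps = [(0.1, 3), (0.3, 20), (0.45, 4), (0.6, 18), (0.8, 5)]
expected = [0.3, 0.6]
ranked = rank_hypotheses(gaps, expected)
```

With these made-up values, the top-ranked hypothesis places boundaries at gaps 1 and 3, the two wide gaps that also coincide with the expected locations. Returning the best four hypotheses rather than a single answer mirrors the evaluation described in the abstract, where a result counts as correct if the true separation is among the best four.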