Text line segmentation of historical documents: a survey

Authors:
Laurence Likforman-Sulem;Abderrazak Zahour;Bruno Taconet
Affiliations:
GET-Ecole Nationale Supérieure des Télécommunications/TSI and CNRS-LTCI, 46 rue Barrault, 75013, Paris, France;Université du Havre/GED, IUT, Place Robert Schuman, 76610, Le Havre, France;Université du Havre/GED, IUT, Place Robert Schuman, 76610, Le Havre, France
Venue:
International Journal on Document Analysis and Recognition
Year:
2007

Cited by:

Handwritten word-spotting using hidden Markov models and universal vocabularies

Pattern Recognition
A hybrid method for three segmentation level of handwritten Arabic script

Proceedings of the International Workshop on Multilingual OCR
A method for combining complementary techniques for document image segmentation

Pattern Recognition
Text line and word segmentation of handwritten documents

Pattern Recognition
Confidence Measures for Error Correction in Interactive Transcription Handwritten Text

ICIAP '09 Proceedings of the 15th International Conference on Image Analysis and Processing
Adaptation from partially supervised handwritten text transcriptions

Proceedings of the 2009 international conference on Multimodal interfaces
Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths

Image and Vision Computing
Offline handwritten character recognition of Gujrati script using pattern matching

ASID'09 Proceedings of the 3rd international conference on Anti-Counterfeiting, security, and identification in communication
Balancing error and supervision effort in interactive-predictive handwriting recognition

Proceedings of the 15th international conference on Intelligent user interfaces
Text line detection and segmentation: uneven skew angles and hill-and-dale writing

Proceedings of the 2010 ACM Symposium on Applied Computing
Ground truth creation for handwriting recognition in historical documents

DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
Translating handwritten bushman texts

Proceedings of the 10th annual joint conference on Digital libraries
Interactive layout analysis and transcription systems for historic handwritten documents

Proceedings of the 10th ACM symposium on Document engineering
Medieval manuscript layout model

Proceedings of the 10th ACM symposium on Document engineering
A new scheme for unconstrained handwritten text-line segmentation

Pattern Recognition
Local descriptors for document layout analysis

ISVC'10 Proceedings of the 6th international conference on Advances in visual computing - Volume Part III
A line-based representation for matching words in historical manuscripts

Pattern Recognition Letters
Automatic line and word segmentation applied to densely line-skewed historical handwritten document images

Integrated Computer-Aided Engineering
Advantages of the extended water flow algorithm for handwritten text segmentation

PReMI'11 Proceedings of the 4th international conference on Pattern recognition and machine intelligence
Data mining medieval documents by word spotting

Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
Text line segmentation for gray scale historical document images

Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
Character segmentation from ancient palm leaf manuscripts in Thailand

Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
Statistical mixture model for documents skew angle estimation

Pattern Recognition Letters
Similarity-based training set acquisition for continuous handwriting recognition

Information Sciences: an International Journal
Analysis of document snippets as a basis for reconstruction

VAST'09 Proceedings of the 10th International conference on Virtual Reality, Archaeology and Cultural Heritage
Natural language inspired approach for handwritten text line detection in legacy documents

LaTeCH '12 Proceedings of the 6th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities
Line segmentation of handwritten Gurmukhi manuscripts

Proceeding of the workshop on Document Analysis and Recognition
Understanding Digital Documents Using Gestalt Properties of Isothetic Components

International Journal of Digital Library Systems
Robust text and drawing segmentation algorithm for historical documents

Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing
Generation of learning samples for historical handwriting recognition using image degradation

Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing
Text line extraction for historical document images

Pattern Recognition Letters
A new thresholding algorithm for document images based on the perception of objects by distance

Integrated Computer-Aided Engineering

Abstract

A huge number of historical documents held in libraries and national archives have not yet been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication, and extraction of specific fields are in use today. For all of these tasks, a major step is segmenting the document into text lines. Because of the low quality and complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. This paper surveys existing methods, developed during the last decade, that are dedicated to documents of historical interest.