Text line extraction for historical document images

Authors:
Raid Saabni; Abedelkadir Asi; Jihad El-Sana
Venue:
Pattern Recognition Letters
Year:
2014

Abstract

In this paper we present a language-independent global method for automatic text line extraction. The proposed approach computes an energy map of a text image and determines the seams that pass across and between text lines. Based on this idea, we developed two algorithms: one for binary images and one for grayscale images. The first algorithm works on binary document images and assumes that the components along text lines can be extracted. A seam passes along the middle of a text line l and marks the components that make up the letters and words of l; each unmarked component is then assigned to the closest text line. The second algorithm works directly on grayscale document images. It computes the distance transform from the grayscale image itself and generates two types of seams: medial seams and separating seams. The medial seams determine the text lines, and the separating seams define the upper and lower boundaries of these text lines. Moreover, we present a new benchmark dataset of historical document images that poses various types of challenges. The dataset contains ground truth for text line extraction and includes samples in several languages, such as Arabic, English, and Spanish. A binary dataset is used to test the binary algorithm. We evaluated both algorithms on these datasets and report their segmentation accuracy. We also compare our algorithms with state-of-the-art text line segmentation methods.
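The seam idea can be illustrated with a small toy sketch. It is not the authors' implementation and makes several assumptions of its own: the image is binary and stored as nested Python lists with 1 for ink; the energy map is a city-block distance transform to the nearest ink pixel (the paper's grayscale distance transform is not reproduced); and each seam is searched only inside a caller-supplied band of rows, which is a simplification. The names `distance_to_ink`, `min_seam`, and `nearest_line` are hypothetical. A medial seam is taken as the minimum-energy left-to-right path (it hugs the ink), a separating seam as the minimum path over the negated energy (it stays as far from ink as possible), and a stray component is assigned to the medial seam with the smallest mean vertical gap.

```python
"""Toy sketch of seam-based text line extraction on a binary image.

Assumptions (not taken from the paper): ink pixels are 1, the energy map is
a city-block distance transform to the nearest ink pixel, and each seam is
searched only inside a caller-supplied band of rows.
"""
from typing import Dict, List, Sequence, Tuple

Grid = List[List[float]]


def distance_to_ink(binary: Sequence[Sequence[int]]) -> Grid:
    """City-block (L1) distance from every pixel to the nearest ink pixel.

    Two-pass sequential transform, exact for the L1 metric. Pixels stay at
    infinity if the image contains no ink at all.
    """
    h, w = len(binary), len(binary[0])
    inf = float("inf")
    d = [[0.0 if binary[r][c] else inf for c in range(w)] for r in range(h)]
    for r in range(h):  # forward pass: propagate from top and left neighbours
        for c in range(w):
            if r > 0:
                d[r][c] = min(d[r][c], d[r - 1][c] + 1)
            if c > 0:
                d[r][c] = min(d[r][c], d[r][c - 1] + 1)
    for r in range(h - 1, -1, -1):  # backward pass: bottom and right neighbours
        for c in range(w - 1, -1, -1):
            if r < h - 1:
                d[r][c] = min(d[r][c], d[r + 1][c] + 1)
            if c < w - 1:
                d[r][c] = min(d[r][c], d[r][c + 1] + 1)
    return d


def min_seam(energy: Grid, top: int, bottom: int) -> List[int]:
    """Left-to-right seam of minimum total energy within rows [top, bottom).

    Dynamic programming as in seam carving: the seam takes one pixel per
    column and moves at most one row between neighbouring columns.
    Returns the seam's row index for every column.
    """
    w = len(energy[0])
    rows = range(top, bottom)
    cost: Dict[int, float] = {r: energy[r][0] for r in rows}
    back: List[Dict[int, int]] = [{} for _ in range(w)]
    for c in range(1, w):
        new_cost: Dict[int, float] = {}
        for r in rows:
            best = r  # prefer staying on the same row when costs tie
            for pr in (r - 1, r + 1):
                if top <= pr < bottom and cost[pr] < cost[best]:
                    best = pr
            new_cost[r] = energy[r][c] + cost[best]
            back[c][r] = best
        cost = new_cost
    r = min(rows, key=lambda i: cost[i])  # cheapest end row; first one on ties
    seam = [0] * w
    for c in range(w - 1, 0, -1):  # backtrack from the last column
        seam[c] = r
        r = back[c][r]
    seam[0] = r
    return seam


def nearest_line(pixels: Sequence[Tuple[int, int]],
                 medial_seams: Sequence[List[int]]) -> int:
    """Index of the medial seam with the smallest mean vertical gap to a component."""
    def mean_gap(seam: List[int]) -> float:
        return sum(abs(r - seam[c]) for r, c in pixels) / len(pixels)
    return min(range(len(medial_seams)), key=lambda i: mean_gap(medial_seams[i]))


if __name__ == "__main__":
    blank, ink = [0] * 8, [1] * 8
    page = [blank, ink, blank, blank, blank, blank, ink, blank, blank]
    dist = distance_to_ink(page)
    line_a = min_seam(dist, 0, 4)  # medial seam of the upper line
    line_b = min_seam(dist, 4, 9)  # medial seam of the lower line
    negated = [[-v for v in row] for row in dist]
    # Separating seam: maximise distance to ink between the two flat toy lines.
    gap = min_seam(negated, line_a[0] + 1, line_b[0])
    print(line_a, line_b, gap)
    print(nearest_line([(3, 2)], [line_a, line_b]))  # hypothetical stray dot
```

On the toy page (two flat lines of ink in rows 1 and 6), the medial seams follow rows 1 and 6, the separating seam runs through row 3 (rows 3 and 4 tie and the first minimum wins), and the dot at (3, 2) is assigned to the upper line. Real pages would need curved lines, connected-component labeling, and a way to isolate one line at a time, none of which this sketch attempts.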