Text Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis

Authors:
Abhishek Khandelwal;Pritha Choudhury;Ram Sarkar;Subhadip Basu;Mita Nasipuri;Nibaran Das
Affiliations:
CSE Department, Sikkim Manipal Institute of Technology, Sikkim, India;CSE Department, Sikkim Manipal Institute of Technology, Sikkim, India;CSE Department, Jadavpur University, Kolkata, India;CSE Department, Jadavpur University, Kolkata, India;CSE Department, Jadavpur University, Kolkata, India;CSE Department, Jadavpur University, Kolkata, India
Venue:
PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence
Year:
2009

References:
A Hough based algorithm for extracting text lines in handwritten documents. ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition, Volume 2
Line Separation for Complex Document Images Using Fuzzy Runlength. DIAL '04 Proceedings of the First International Workshop on Document Image Analysis for Libraries
Text line extraction from multi-skewed handwritten documents. Pattern Recognition
Handwriting Segmentation Contest. ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition, Volume 2
Script-Independent Text Line Segmentation in Freestyle Handwritten Documents. IEEE Transactions on Pattern Analysis and Machine Intelligence

Abstract

Text line extraction is the first and one of the most critical steps in optical character recognition (OCR) of unconstrained handwritten documents. The present work reports a new methodology that compares neighboring connected components to determine whether they belong to the same text line. Components that are very small or very large compared to the average component height are set aside in the preprocessing step. During post-processing, these components are reconsidered and allocated to the lines to which they most suitably belong. The performance of the developed technique is evaluated on the benchmark training dataset of the ICDAR 2009 handwriting segmentation contest, which consists of English, French, German and Greek handwritten texts. The overall text line identification accuracy observed on this dataset is 93.35%.
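The three stages named in the abstract (height-based filtering, neighbor comparison, and reassignment of the deferred components) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the comparison rule or any thresholds, so the `Box` type, the `low`, `high` and `tol` parameters, and the vertical-center test below are all hypothetical choices made for illustration. Components are assumed to be already extracted as bounding boxes.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (hypothetical representation)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def cy(self) -> float:
        """Vertical center of the box."""
        return self.y + self.h / 2


def split_by_height(boxes, low=0.5, high=2.0):
    """Separate components of ordinary height from very small or very large ones.

    The low/high factors are assumed values, not taken from the paper.
    """
    avg_h = mean(b.h for b in boxes)
    normal, deferred = [], []
    for b in boxes:
        (normal if low * avg_h <= b.h <= high * avg_h else deferred).append(b)
    return normal, deferred, avg_h


def group_into_lines(normal, avg_h, tol=0.5):
    """Scan left to right and attach each box to the line whose most recent
    box is vertically closest, provided the gap is within tol * avg_h."""
    lines = []
    for box in sorted(normal, key=lambda b: b.x):
        best = None
        for line in lines:
            dy = abs(line[-1].cy - box.cy)
            if dy <= tol * avg_h and (best is None or dy < best[0]):
                best = (dy, line)
        if best is None:
            lines.append([box])      # no compatible neighbor: start a new line
        else:
            best[1].append(box)
    # Order lines top to bottom by their mean vertical center.
    return sorted(lines, key=lambda ln: mean(b.cy for b in ln))


def assign_deferred(deferred, lines):
    """Give each deferred component to the line with the nearest mean center."""
    if not lines:
        return lines
    for box in deferred:
        target = min(lines, key=lambda ln: abs(mean(b.cy for b in ln) - box.cy))
        target.append(box)
    return lines


def segment_lines(boxes):
    """Run the three stages and return a list of lines (lists of boxes)."""
    if not boxes:
        return []
    normal, deferred, avg_h = split_by_height(boxes)
    return assign_deferred(deferred, group_into_lines(normal, avg_h))
```

Called on a list of `Box` objects, `segment_lines` returns the lines ordered top to bottom. A real system would obtain the boxes from a connected component labeling step on the binarized page and would need a more robust neighbor test for skewed or touching lines.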