Text Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis

  • Authors:
  • Abhishek Khandelwal;Pritha Choudhury;Ram Sarkar;Subhadip Basu;Mita Nasipuri;Nibaran Das

  • Affiliations:
  • CSE Department, Sikkim Manipal Institute of Technology, Sikkim, India;CSE Department, Sikkim Manipal Institute of Technology, Sikkim, India;CSE Department, Jadavpur University, Kolkata, India;CSE Department, Jadavpur University, Kolkata, India;CSE Department, Jadavpur University, Kolkata, India;CSE Department, Jadavpur University, Kolkata, India

  • Venue:
  • PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence
  • Year:
  • 2009

Abstract

Text line extraction is the first and one of the most critical steps in optical character recognition (OCR) of unconstrained handwritten documents. The present work reports a new methodology that compares neighboring connected components to determine whether they belong to the same text line. Components that are very small or very large compared to the average component height are set aside in the preprocessing step. During post-processing, these components are reconsidered and allocated to the lines to which they most suitably belong. The performance of the developed technique is evaluated on the benchmark training dataset of the ICDAR 2009 handwriting segmentation contest, which consists of English, French, German and Greek handwritten texts. The overall text line identification accuracy on this dataset is about 93.35%.
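
The three stages named in the abstract (height-based filtering, neighbor comparison, and reallocation of the set-aside components) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no thresholds or comparison rule, so the `lo`/`hi` height ratios, the vertical-overlap test, and the nearest-centre reallocation below are all assumptions chosen for illustration, as are the names `Box`, `split_by_height`, `group_into_lines`, `reassign`, and `segment_lines`. Components are represented only by their bounding boxes.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (hypothetical type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def cy(self) -> float:
        """Vertical centre of the box."""
        return self.y + self.h / 2


def split_by_height(boxes, lo=0.5, hi=2.0):
    """Preprocessing: set aside components much smaller or larger than average.

    The ratios lo and hi are assumed values, not taken from the paper.
    """
    avg = mean(b.h for b in boxes)
    normal = [b for b in boxes if lo * avg <= b.h <= hi * avg]
    ignored = [b for b in boxes if not (lo * avg <= b.h <= hi * avg)]
    return normal, ignored


def group_into_lines(normal):
    """Attach each component to the line whose latest component it overlaps most.

    Vertical overlap with the previous component is an assumed stand-in for
    the paper's neighborhood comparison.
    """
    lines = []
    for b in sorted(normal, key=lambda c: c.x):  # scan left to right
        best_overlap, best_line = 0, None
        for line in lines:
            last = line[-1]
            overlap = min(last.y + last.h, b.y + b.h) - max(last.y, b.y)
            if overlap > best_overlap:
                best_overlap, best_line = overlap, line
        if best_line is None:
            lines.append([b])  # no overlapping neighbor: start a new line
        else:
            best_line.append(b)
    return lines


def reassign(lines, ignored):
    """Post-processing: give each set-aside component to the nearest line."""
    for b in ignored:
        if not lines:
            lines.append([b])
            continue
        target = min(lines, key=lambda ln: abs(mean(c.cy for c in ln) - b.cy))
        target.append(b)
    return lines


def segment_lines(boxes):
    """Run the three stages and return lines ordered top to bottom."""
    if not boxes:
        return []
    normal, ignored = split_by_height(boxes)
    lines = reassign(group_into_lines(normal), ignored)
    return sorted(lines, key=lambda ln: mean(c.cy for c in ln))
```

Calling `segment_lines` on a list of `Box` values returns one list of boxes per detected line. A real system would first extract connected components from the binarized page image and would likely need a more robust comparison than plain vertical overlap to handle skewed or touching lines.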