Overlapped text segmentation using Markov random field and aggregation

Authors:
Xujun Peng;Srirangaraj Setlur;Venu Govindaraju;Ramachandrula Sitaram
Affiliations:
University at Buffalo, SUNY, Amherst, NY;University at Buffalo, SUNY, Amherst, NY;University at Buffalo, SUNY, Amherst, NY;HP Labs India, Bangalore, India
Venue:
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
Year:
2010

Abstract

Separating machine-printed text and handwriting in overlapping text is a challenging problem in document analysis, and no reliable algorithm has been developed thus far. In this paper, we propose a novel approach for separating handwriting from a binary image of overlapped text. Instead of using fixed-size training patches, we describe an aggregation method that uses shape context features to extract training samples automatically. We use a Markov random field (MRF) to model the overlapped text. The neighborhood system is inherited from a coarsening procedure, and the prior and likelihood of the MRF are learned based on a distance metric. Experimental results show that the proposed method achieves 87.97% recall for handwriting and 91.44% recall for machine-printed text.
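
The abstract reports results as per-class recall. As a reminder of what that metric measures, here is a minimal sketch in Python. The abstract does not say whether recall is counted over pixels, patches, or connected components, so the unit of evaluation is left abstract. The function name `per_class_recall`, the label strings, and the toy data are all hypothetical and are not taken from the paper.

```python
def per_class_recall(truth, predicted, label):
    """Recall for one class: items correctly assigned `label`
    divided by all items whose ground truth is `label`."""
    # Keep only the predictions for items that truly belong to the class.
    relevant = [p for t, p in zip(truth, predicted) if t == label]
    if not relevant:
        raise ValueError(f"no ground-truth items with label {label!r}")
    return sum(p == label for p in relevant) / len(relevant)


# Hypothetical toy labels (assumption: one entry per evaluated item).
truth = ["hand", "hand", "hand", "hand",
         "print", "print", "print", "print", "print"]
predicted = ["hand", "hand", "hand", "print",
             "print", "print", "print", "print", "hand"]

hand_recall = per_class_recall(truth, predicted, "hand")
print_recall = per_class_recall(truth, predicted, "print")
print(hand_recall, print_recall)
```

Reporting recall separately for each class, as the abstract does, shows how much of each kind of text is recovered. It does not show how often a label is assigned wrongly, which would require precision as well.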