Model-Guided Segmentation and Layout Labelling of Document Images Using a Hierarchical Conditional Random Field

Authors:
Santanu Chaudhury;Megha Jindal;Sumantra Dutta Roy
Affiliations:
Dept of Electrical Engg, IIT Delhi, Hauz Khas, New Delhi, India 110 016;Dept of Electrical Engg, IIT Delhi, Hauz Khas, New Delhi, India 110 016;Dept of Electrical Engg, IIT Delhi, Hauz Khas, New Delhi, India 110 016
Venue:
PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence
Year:
2009

Abstract

We present a model-guided segmentation and document layout extraction scheme based on hierarchical Conditional Random Fields (CRFs). Common methods for classifying each pixel of a document image as text, background or image are often noisy and error-prone, and typically require post-processing with heuristic methods. The input to the system is a pixel-wise classification produced by a Fisher classifier applied to the outputs of a set of Globally Matched Wavelet (GMW) filters. The system extracts features that encode the contextual information and spatial configurations of a given document image, and learns the relations between these layout entities using hierarchical CRFs. The hierarchical CRF enables learning at three levels: (1) local features for text, background and image areas; (2) contextual features for further classifying region blocks as title, author block, heading, paragraph, etc.; and (3) a probabilistic layout model that encodes global relations between these blocks for a particular class of documents. Although the work was motivated by an automated layout analyser and machine translator for technical papers, it can also be used for other applications such as search, indexing and information retrieval.
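
As a rough illustration of the first (local) level only, the sketch below shows how a noisy per-pixel labelling into text, background and image might be smoothed with contextual information from neighbouring pixels. It is not the authors' hierarchical model: it assumes a plain grid CRF with a Potts pairwise term and Iterated Conditional Modes (ICM) inference, neither of which is stated in the abstract. The function name `icm_smooth`, the parameter `beta` and the per-pixel cost layout are hypothetical, chosen for this example.

```python
# Illustrative sketch only: a generic grid CRF (Potts model) smoothed with
# Iterated Conditional Modes. This is an assumption made for illustration,
# not the hierarchical CRF described in the abstract.

LABELS = ("text", "background", "image")


def icm_smooth(unary, beta=1.0, sweeps=5):
    """Return a label grid that locally minimises unary + Potts energy.

    unary[r][c][k] is a hypothetical cost of giving pixel (r, c) label k,
    e.g. a negative log-probability from a per-pixel classifier.
    beta is the penalty for each 4-neighbour whose label differs.
    """
    rows, cols = len(unary), len(unary[0])
    n_labels = len(unary[0][0])

    # Start from the per-pixel best label (the noisy input classification).
    labels = [
        [min(range(n_labels), key=lambda k: unary[r][c][k]) for c in range(cols)]
        for r in range(rows)
    ]

    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                # Labels of the 4-connected neighbours inside the image.
                neighbours = [
                    labels[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                ]

                def energy(k):
                    # Local cost plus a penalty per disagreeing neighbour.
                    return unary[r][c][k] + beta * sum(k != n for n in neighbours)

                best = min(range(n_labels), key=energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # Converged to a local minimum of the energy.
    return labels
```

For example, on a 3x3 grid where every pixel favours "text" except a centre pixel that weakly favours "image", a positive `beta` lets the surrounding context override the isolated centre label, whereas `beta=0` reduces to the raw per-pixel decision. The higher levels described in the abstract (region blocks and the global layout model) are not covered by this sketch.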