A Unified Algorithm for Identification of Various Tabular Structures from Document Images

Authors:
Sekhar Mandal;Amit K. Das;Partha Bhowmick;Bhabatosh Chanda
Affiliations:
Bengal Engineering and Science University, Shibpur, India;Bengal Engineering and Science University, Shibpur, India;Indian Institute of Technology Kharagpur, India;Indian Statistical Institute, Kolkata, India
Venue:
International Journal of Digital Library Systems
Year:
2011

Abstract

This paper presents a unified algorithm for segmenting and identifying various tabular structures in document page images. These structures include conventional tables and displayed math-zones, as well as Table of Contents (TOC) and Index pages. After analyzing the page composition, the algorithm first classifies the input document pages as tabular or non-tabular: a tabular page contains at least one tabular structure, whereas a non-tabular page contains none. The approach is unified in the sense that it identifies all tabular structures on a tabular page, which leads to a novel and considerable simplification of document image segmentation. Unification also speeds up segmentation, because existing methods treat the different tabular structures as separate physical entities and are therefore time-consuming. Distinguishing features of the different kinds of tabular structures are applied in stages to keep the algorithm simple and efficient, as demonstrated by exhaustive experimental results.
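
The two-level flow described in the abstract (staged identification of each region, then a page-level tabular/non-tabular decision) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the distinguishing features, their thresholds, or the order of the stages, so every feature name (`page_number_line_ratio`, `sorted_entry_ratio`, `math_symbol_ratio`, `aligned_column_count`), every threshold, and the stage order below are hypothetical placeholders. Only the overall shape follows the text: one pass handles all four structure kinds, and a page is tabular if at least one region is identified.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Kind(Enum):
    """The four tabular structure kinds named in the abstract."""
    TABLE = "table"
    MATH_ZONE = "math-zone"
    TOC = "toc"
    INDEX = "index"


# A region is represented by a dictionary of hypothetical feature values.
Region = Dict[str, float]
Stage = Tuple[Kind, Callable[[Region], bool]]

# Hypothetical stages: names, thresholds, and order are assumptions made
# for illustration only; the abstract does not give the real features.
STAGES: List[Stage] = [
    (Kind.TOC, lambda r: r.get("page_number_line_ratio", 0.0) > 0.8),
    (Kind.INDEX, lambda r: r.get("sorted_entry_ratio", 0.0) > 0.8),
    (Kind.MATH_ZONE, lambda r: r.get("math_symbol_ratio", 0.0) > 0.3),
    (Kind.TABLE, lambda r: r.get("aligned_column_count", 0.0) >= 2),
]


def identify_region(region: Region,
                    stages: List[Stage] = STAGES) -> Optional[Kind]:
    """Apply the stages in order; the first test that passes decides the kind."""
    for kind, test in stages:
        if test(region):
            return kind
    return None  # no tabular structure recognized in this region


def classify_page(regions: List[Region]) -> Tuple[bool, List[Optional[Kind]]]:
    """Label every region, then call the page tabular if any label is set."""
    labels = [identify_region(region) for region in regions]
    is_tabular = any(label is not None for label in labels)
    return is_tabular, labels


if __name__ == "__main__":
    page = [{"math_symbol_ratio": 0.5}, {"aligned_column_count": 3}, {}]
    print(classify_page(page))
```

Because each stage is an independent test in a single ordered list, adding or reordering structure kinds does not require a separate segmentation pass per kind, which is the sense in which this sketch mirrors the "unified" design.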