Table Detection via Probability Optimization

Authors:
Yalin Wang;Ihsin T. Phillips;Robert M. Haralick
Venue:
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Year:
2002

Abstract

In this paper, we formulate table detection as a probability optimization problem. As in our previous algorithm, we begin by finding and validating each table candidate. We then compute a set of probability measurements for each table entity; these measurements take into account the tables themselves, the table text separators, and the text blocks neighboring each table. An iterative updating method then optimizes the page segmentation probability to obtain the final result. The new algorithm improves substantially on our previous algorithm. The training and testing data set comprises 1,125 document pages containing 518 table entities and a total of 10,934 cell entities. Compared with our previous work, the two reported accuracy rates rose from 90.32% to 95.67% and from 92.04% to 97.05%.
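
The abstract does not spell out the iterative updating method, so the following is only a minimal sketch of one way such an optimization loop might look, not the authors' algorithm. It assumes each table candidate already has two hypothetical probability measurements strictly between 0 and 1 (`table_prob` for the candidate itself and `context_prob` for its separators and neighboring text blocks), treats them as independent, and uses a simple greedy label-flipping rule. All names here are illustrative assumptions.

```python
import math


def page_log_prob(labels, table_prob, context_prob):
    """Log-probability of one labeling of the candidates on a page.

    Assumption: the two measurements are independent, so their logs add.
    """
    total = 0.0
    for i, is_table in enumerate(labels):
        p = table_prob[i] if is_table else 1.0 - table_prob[i]
        q = context_prob[i] if is_table else 1.0 - context_prob[i]
        total += math.log(p) + math.log(q)
    return total


def iterative_update(table_prob, context_prob, max_iters=100):
    """Greedy updating: flip one label at a time, keep flips that help."""
    labels = [True] * len(table_prob)  # start by accepting every candidate
    best = page_log_prob(labels, table_prob, context_prob)
    for _ in range(max_iters):
        improved = False
        for i in range(len(labels)):
            labels[i] = not labels[i]  # tentatively flip candidate i
            score = page_log_prob(labels, table_prob, context_prob)
            if score > best:
                best = score
                improved = True
            else:
                labels[i] = not labels[i]  # revert: the flip did not help
        if not improved:
            break  # no single flip raises the page probability
    return labels, best


# Hypothetical measurements for three table candidates on one page.
labels, best = iterative_update([0.9, 0.2, 0.7], [0.8, 0.3, 0.4])
print(labels)
```

In this toy setup the second candidate is rejected because both of its measurements favor "not a table". A real system would replace the independence assumption with whatever joint model the probability measurements define.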