Applying the T-Recs Table Recognition System to the Business Letter Domain

Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Year: 2001

Abstract

This paper summarizes the core idea of the T-Recs table recognition system, an integrated system covering block segmentation, table location, and a model-free structural analysis of tables. T-Recs works on the output of commercial OCR systems that provide the word bounding box geometry together with the text itself (e.g. Xerox ScanWorX). While T-Recs performs well on a number of document categories, business letters remain a challenging domain because the T-Recs location heuristics are misled by their headers or footers, resulting in low recognition precision. Yet business letters such as invoices are a very interesting domain for industrial applications, owing to the large volume of documents to be analyzed and the importance of the data carried within their tables. Hence, we developed a more restrictive approach, which is implemented in the T-Recs++ prototype. This paper describes the ideas behind the T-Recs++ table location and also proposes a quality evaluation measure that reflects the bottom-up strategy of both T-Recs and T-Recs++. Finally, some results comparing both systems on a collection of business letters are given.