A Theoretical Foundation and a Method for Document Table Structure Extraction and Decomposition
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Table Detection via Probability Optimization
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Making Documents Work: Challenges for Document Understanding
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
TableSeer: automatic table metadata extraction and searching in digital libraries
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Identifying table boundaries in digital documents via sparse line detection
Proceedings of the 17th ACM conference on Information and knowledge management
White-Box Evaluation of Computer Vision Algorithms through Explicit Decision-Making
ICVS '09 Proceedings of the 7th International Conference on Computer Vision Systems: Computer Vision Systems
Table detection in heterogeneous documents
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
An open approach towards the benchmarking of table structure recognition systems
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
Towards a common evaluation strategy for table structure recognition algorithms
Proceedings of the 10th ACM symposium on Document engineering
An efficient pre-processing method to identify logical components from PDF documents
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Automatic table detection in document images
ICAPR'05 Proceedings of the Third international conference on Advances in Pattern Recognition - Volume Part I
Abstract: This paper summarizes the core idea of the T-Recs table recognition system, an integrated system covering block segmentation, table location, and a model-free structural analysis of tables. T-Recs works on the output of commercial OCR systems that provide the word bounding-box geometry together with the text itself (e.g., Xerox ScanWorX). While T-Recs performs well on a number of document categories, business letters remained a challenging domain: their headers and footers mislead the T-Recs location heuristics, resulting in low recognition precision. Business letters such as invoices are nevertheless a very interesting domain for industrial applications, because of the large volume of documents to be analyzed and the importance of the data carried in their tables. Hence, we developed a more restrictive approach, which is implemented in the T-Recs++ prototype. This paper describes the ideas behind T-Recs++ table location and also proposes a quality evaluation measure that reflects the bottom-up strategy of both T-Recs and T-Recs++. Finally, results comparing the two systems on a collection of business letters are given.
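The abstract states only that T-Recs works bottom-up from OCR word bounding boxes. The sketch below illustrates one plausible form of such bottom-up block segmentation. It assumes, purely for illustration and not as the T-Recs specification, that two words belong to the same block when one sits directly below the other (vertical gap of at most `max_gap`) and their horizontal extents overlap; blocks are the connected components of these links. The names `Word`, `segment_blocks`, and `max_gap` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word as reported by OCR: its text plus bounding box (y grows downward)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def _x_overlap(a: Word, b: Word) -> bool:
    # True when the horizontal extents of the two boxes intersect.
    return a.x0 < b.x1 and b.x0 < a.x1


def segment_blocks(words: list[Word], max_gap: float = 5.0) -> list[list[Word]]:
    """Hypothetical bottom-up block segmentation (illustrative assumption only).

    Links word b to word a when b starts at most max_gap below a's bottom
    edge and their horizontal extents overlap; returns connected components.
    """
    parent = list(range(len(words)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise O(n^2) scan, kept simple for clarity rather than speed.
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            gap = b.y0 - a.y1  # non-negative when b starts below a's bottom edge
            if i != j and 0 <= gap <= max_gap and _x_overlap(a, b):
                parent[find(i)] = find(j)

    # Collect words by component root, preserving first-seen order.
    blocks: dict[int, list[Word]] = {}
    for i, w in enumerate(words):
        blocks.setdefault(find(i), []).append(w)
    return list(blocks.values())


if __name__ == "__main__":
    # A tiny two-column table: each column should come out as its own block.
    words = [
        Word("Item", 0, 0, 30, 10), Word("Price", 100, 0, 130, 10),
        Word("Pen", 0, 12, 25, 22), Word("1.50", 100, 12, 125, 22),
        Word("Ink", 0, 24, 22, 34), Word("3.20", 100, 24, 126, 34),
    ]
    for block in segment_blocks(words):
        print([w.text for w in block])
    # → ['Item', 'Pen', 'Ink']
    # → ['Price', '1.50', '3.20']
```

Under this assumption, words in different table columns never overlap horizontally, so each column surfaces as a separate block. This is the kind of low-level evidence a bottom-up table locator could then combine into cells, rows, and columns.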