Three Approaches to "Industrial" Table Spotting

Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Year: 2001

Abstract

This paper introduces three approaches that enable an industrial, comprehensive document analysis system to spot tables in documents. Searching for a set of known table headers (approach 1) works rather well in a significant number of documents. However, although its implementation tolerates OCR errors, this approach is not tolerant enough of some kinds of even minor aberrations. This not only degrades the recognition results but, even worse, makes users feel uncomfortable. Pragmatically trying to mimic what the human eye might key on leads to our two further, complementary approaches: searching for layout structures that resemble parts of columns (approach 2), and searching for groupings of similar lines (approach 3). To be suitable for our system, the approaches must be very simple to implement, simple to explain to users, computationally cheap, and combinable. In the domain of health insurers, who receive huge amounts of so-called medical liquidations on a daily basis, we obtain very good results. On document samples representative of the everyday practice of five customers (health insurance companies), tables were spotted as well and as fast as the customers expected of the system. We thus consider our current approaches a step towards cognitive adequacy.
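
The abstract gives no implementation details, so the following is only a minimal sketch of how approaches 1 and 3 might look on plain OCR text lines, not the authors' method. The header vocabulary `KNOWN_HEADERS`, the similarity threshold, the "line shape" signature, and the way the two signals are combined in `spot_tables` are all assumptions made for illustration. Approach 2 (column-like layout structures) would need word coordinates from the OCR output and is left out.

```python
import re
from difflib import SequenceMatcher

# Hypothetical header vocabulary; the abstract does not list the known headers.
KNOWN_HEADERS = ["date", "code", "description", "quantity", "amount"]


def fuzzy_header_hits(line, threshold=0.8):
    """Approach 1 (sketch): count tokens that approximately match a known header.

    Approximate matching stands in for OCR-error tolerance; the threshold is assumed.
    """
    hits = 0
    for token in line.lower().split():
        if any(SequenceMatcher(None, token, header).ratio() >= threshold
               for header in KNOWN_HEADERS):
            hits += 1
    return hits


def line_shape(line):
    """Reduce a line to a coarse shape: digit runs become '9', letter runs become 'a'."""
    shape = re.sub(r"\d+", "9", line)
    shape = re.sub(r"[^\W\d_]+", "a", shape)
    return re.sub(r"\s+", " ", shape).strip()


def similar_line_groups(lines, min_size=2):
    """Approach 3 (sketch): index ranges of consecutive lines sharing the same shape."""
    groups = []
    start = 0
    for i in range(1, len(lines) + 1):
        if i == len(lines) or line_shape(lines[i]) != line_shape(lines[start]):
            # Close the current run; keep it only if it is long enough and not blank.
            if i - start >= min_size and line_shape(lines[start]):
                groups.append((start, i - 1))
            start = i
    return groups


def spot_tables(lines, min_header_hits=2):
    """Combine both signals (assumed rule): a group of similar lines is a table
    candidate, and a fuzzy header match directly above it adds supporting evidence."""
    header_rows = {i for i, line in enumerate(lines)
                   if fuzzy_header_hits(line) >= min_header_hits}
    return [{"rows": group, "header_above": (group[0] - 1) in header_rows}
            for group in similar_line_groups(lines)]


if __name__ == "__main__":
    page = [
        "Invoice for treatment",
        "Date Descripti0n Am0unt",  # simulated OCR errors: 0 read instead of o
        "01.02.2001 Consultation 45.00",
        "03.02.2001 Injection 12.50",
        "05.02.2001 Bandage 7.20",
        "Total due",
    ]
    for candidate in spot_tables(page):
        print(candidate)
```

In this sketch the group of similar lines is found even if the header match fails, which mirrors the abstract's motivation for the complementary approaches. The shape signature is deliberately crude: rows whose description cells differ in word count would fall into separate groups, so a practical version would need a softer similarity measure.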