Three Approaches to "Industrial" Table Spotting

  • Authors:
  • Affiliations:
  • Venue:
  • ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

Abstract: This paper introduces three approaches for an industrial, comprehensive document analysis system to enable it to spot tables in documents. Searching for a set of known table headers (approach 1) works rather well in a significant number of documents. But this approach (though it is implemented tolerant to OCR errors) is not tolerant enough towards some kinds of even minor aberrations. This not only decreases the recognition results, but also, even worse, makes users feel uncomfortable. Pragmatically trying to mimic for what the human eyes might key, leads to our two further, complementing approaches: searching for layout structures which resemble parts of columns (approach 2), and searching for groupings of similar lines (approach 3). The suitability of the approaches for our system requires them to be very simple to implement and simply to explain to users, computationally cheap, and combinable. In the domain of health insurances who receive huge amounts of so called medical liquidations on a daily basis we obtain very good results. On document samples representative for the every day practice of five customers -health insurance companies- tables were spotted as good and as fast as the customers expected the system to be. We thus consider our current approaches as a step towards cognitive adequacy.