Towards generic framework for tabular data extraction and management in documents

  • Authors:
  • Roya Rastan

  • Affiliations:
  • University of New South Wales, Sydney, Australia

  • Venue:
  • Proceedings of the sixth workshop on Ph.D. students in information and knowledge management
  • Year:
  • 2013

Abstract

Tables are one of the most common structures for presenting data in documents. However, automatically recognising and extracting tables embedded in documents is still a significant challenge, and the data contained within tables remains under-utilised. Although some common steps can be defined for table extraction, there is no generic approach to table extraction that can be applied to different sources and provides an end-to-end, repeatable workflow. This paper looks at the table extraction problem from a process point of view and proposes a table extraction workflow that can be regarded as a plug-and-play architecture for table extraction. We also present an overview of our complete system, in which the extracted tables are stored and managed. In this work, table extraction is studied in the context of financial statements, but the methods apply generally.
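The abstract does not list the stages of the proposed workflow, so the following is only a minimal Python sketch, under assumed stages, of what a "plug-and-play" extraction workflow can look like: each stage is an interchangeable callable registered by name, and swapping one implementation leaves the others untouched. The class `TableExtractionPipeline`, the stage names `locate`, `segment` and `interpret`, and the toy pipe-delimited input are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical data shape: a table is a grid of cell strings.
Table = List[List[str]]


@dataclass
class TableExtractionPipeline:
    """Hypothetical plug-and-play pipeline: stages run in registration order,
    each receiving the previous stage's output."""
    stages: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def register(self, name: str, stage: Callable[[Any], Any]) -> None:
        # Re-registering an existing name replaces the implementation but keeps
        # its position (dicts preserve insertion order), so one stage can be
        # swapped without touching the rest.
        self.stages[name] = stage

    def run(self, document: Any) -> Any:
        data = document
        for stage in self.stages.values():
            data = stage(data)
        return data


def locate_tables(text: str) -> List[List[str]]:
    # Toy table locator: a table region is a run of consecutive lines containing "|".
    regions: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if "|" in line:
            current.append(line)
        elif current:
            regions.append(current)
            current = []
    if current:
        regions.append(current)
    return regions


def segment_cells(regions: List[List[str]]) -> List[Table]:
    # Toy structure recognition: split each line on "|" and trim the cells.
    return [
        [[cell.strip() for cell in line.strip().strip("|").split("|")] for line in region]
        for region in regions
    ]


def to_records(tables: List[Table]) -> List[List[Dict[str, str]]]:
    # Toy interpretation: treat the first row of each table as its header.
    return [[dict(zip(t[0], row)) for row in t[1:]] for t in tables if t]


def default_pipeline() -> TableExtractionPipeline:
    pipeline = TableExtractionPipeline()
    pipeline.register("locate", locate_tables)
    pipeline.register("segment", segment_cells)
    pipeline.register("interpret", to_records)
    return pipeline


if __name__ == "__main__":
    # Made-up fragment of a statement, used only to exercise the stages.
    sample = "Balance sheet\n| Item | 2012 | 2013 |\n| Cash | 100 | 120 |\n\nNotes follow."
    pipeline = default_pipeline()
    print(pipeline.run(sample))  # [[{'Item': 'Cash', '2012': '100', '2013': '120'}]]
    # Swap only the interpretation stage; locate and segment are reused as-is.
    pipeline.register("interpret", lambda tables: tables)
    print(pipeline.run(sample))  # [[['Item', '2012', '2013'], ['Cash', '100', '120']]]
```

Keeping every stage behind a plain callable interface is one simple way to get a repeatable, source-independent workflow; a realistic system would pass richer intermediate representations (page coordinates, spanning cells, numeric types) between stages than these toy strings.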