Interactive conversion of web tables

  • Authors:
  • Raghav Krishna Padmanabhan; Ramana Chakradhar Jandhyala; Mukkai Krishnamoorthy; George Nagy; Sharad Seth; William Silversmith

  • Affiliations:
  • ECSE, DocLab, Rensselaer Polytechnic Institute, Troy, NY; ECSE, DocLab, Rensselaer Polytechnic Institute, Troy, NY; ECSE, DocLab, Rensselaer Polytechnic Institute, Troy, NY; ECSE, DocLab, Rensselaer Polytechnic Institute, Troy, NY; CSE, University of Nebraska-Lincoln, Lincoln, NE; ECSE, DocLab, Rensselaer Polytechnic Institute, Troy, NY

  • Venue:
  • GREC'09 Proceedings of the 8th international conference on Graphics recognition: achievements, challenges, and evolution
  • Year:
  • 2009

Abstract

Two hundred web tables from ten sites were imported into Excel. The tables were edited as needed, then converted into layout-independent Wang Notation using the Table Abstraction Tool (TAT). TAT outputs XML files intended for constructing narrow-domain ontologies. On average, each table required 104 seconds of editing. Augmentations such as aggregates, footnotes, table titles, captions, units, and notes were also extracted, in an average time of 93 seconds. Every user intervention was logged and audited. The logged interactions were analyzed to determine the relative influence on processing time of factors such as table size, the number of categories, and the various types of augmentation. The analysis suggests which aspects of interactive table processing can be automated in the near term and how much time such automation would save. The correlation coefficient between predicted and actual processing times was 0.66.
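To illustrate what "layout-independent" means, the following minimal Python sketch keys every data cell by one label from each category rather than by its row and column position. It assumes flat (non-hierarchical) categories and a grid with a single header row and a single header column. The functions `grid_to_wang` and `wang_to_xml` and the XML element names are hypothetical and do not reproduce TAT's actual output format; the sample values are toy data, not taken from the study.

```python
import xml.etree.ElementTree as ET


def grid_to_wang(grid, row_category, col_category):
    """Map each data cell to a frozenset of (category, label) pairs.

    Assumption (hypothetical helper): grid[0] holds column headers, and
    the first entry of every later row holds that row's header.
    """
    col_labels = grid[0][1:]
    cells = {}
    for row in grid[1:]:
        row_label, values = row[0], row[1:]
        for col_label, value in zip(col_labels, values):
            # The key is an unordered set, so row/column placement is lost.
            key = frozenset({(row_category, row_label), (col_category, col_label)})
            cells[key] = value
    return cells


def wang_to_xml(cells):
    """Serialize the abstract table with illustrative, non-TAT element names."""
    root = ET.Element("table")
    # Sort cells and labels so the output is deterministic.
    for key, value in sorted(cells.items(), key=lambda kv: sorted(kv[0])):
        cell = ET.SubElement(root, "cell", value=str(value))
        for category, label in sorted(key):
            ET.SubElement(cell, "label", category=category).text = label
    return ET.tostring(root, encoding="unicode")


# Toy grid (made-up values) and its transpose.
grid = [
    ["", "2008", "2009"],
    ["North", "10", "12"],
    ["South", "7", "9"],
]
transposed = [list(r) for r in zip(*grid)]

a = grid_to_wang(grid, "Region", "Year")
b = grid_to_wang(transposed, "Year", "Region")
xml_text = wang_to_xml(a)
print(a == b)  # the two layouts yield the same abstract table
print(xml_text)
```

Because each value is keyed by an unordered set of (category, label) pairs, transposing the grid produces the same abstract table. Wang's model also allows hierarchical category trees, which this sketch omits for brevity.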