Notes on contemporary table recognition

  • Authors:
  • David W. Embley; Daniel Lopresti; George Nagy

  • Affiliations:
  • Computer Science Department, Brigham Young University, Provo, UT; Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA; Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY

  • Venue:
  • DAS'06 Proceedings of the 7th International Conference on Document Analysis Systems

  • Year:
  • 2006

Abstract

The shift of interest to web tables in HTML and PDF files, coupled with the incorporation of table analysis and conversion routines in commercial desktop document processing software, is likely to turn table recognition into more of a systems issue than an algorithmic one. We illustrate the transition with some actual examples of web table conversion. We then suggest that the appropriate target format for table analysis, whether performed by conventional customized programs or by off-the-shelf software, is a representation based on the abstract table introduced by X. Wang in 1996. We show that the Wang model is adequate for some useful tasks that prove elusive for less explicit representations, and outline our plans to develop a semi-automated table processing system to demonstrate this approach. Screen snapshots of a prototype tool to allow table mark-up in the style of Wang are also presented.
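
As a rough illustration of what a Wang-style abstract table might look like as a data structure, the sketch below assumes the commonly described form of the model: a set of category trees (one per table dimension) plus a mapping from combinations of root-to-leaf label paths, one per category, to cell entries. The category names, values, and the helper names `leaf_paths`, `all_keys`, and `lookup` are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical category trees: each dimension is a tree of labels,
# and an empty dict marks a leaf label.
categories = {
    "Year": {"2005": {}, "2006": {}},
    "Mark": {"Assignments": {"Ass1": {}, "Ass2": {}}, "Final": {}},
}


def leaf_paths(name, subtree):
    """Yield every root-to-leaf label path in one category tree."""
    if not subtree:
        yield (name,)
        return
    for label, child in subtree.items():
        for path in leaf_paths(label, child):
            yield (name,) + path


def all_keys(cats):
    """Enumerate every addressable cell: one leaf path per category."""
    per_category = [list(leaf_paths(n, t)) for n, t in cats.items()]
    return [frozenset(combo) for combo in product(*per_category)]


# The mapping from label-path combinations to entries (made-up values).
# Keys are unordered sets of paths, so no row/column layout is implied.
delta = {
    frozenset({("Year", "2005"), ("Mark", "Final")}): 82,
    frozenset({("Year", "2006"), ("Mark", "Assignments", "Ass1")}): 75,
}


def lookup(mapping, *paths):
    """Return the entry addressed by the given paths, or None if empty."""
    return mapping.get(frozenset(paths))
```

Because each cell is addressed by an unordered set of label paths, the same query works whichever category ends up in the rows or the columns of a particular rendering, which is one way to read the abstract's claim that an explicit representation supports tasks that layout-bound formats make awkward.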