Detecting and recognizing tables in spreadsheets

  • Authors:
  • Iyad Abu Doush;Enrico Pontelli

  • Affiliations:
  • Yarmouk University, Irbid, Jordan;New Mexico State University, Las Cruces, NM

  • Venue:
  • DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
  • Year:
  • 2010

Abstract

Detecting tables in a spreadsheet is the first step toward making spreadsheet documents accessible to individuals with visual disabilities. Techniques for aural presentation and navigation of tables have been proposed, but they assume thorough knowledge of the table's structure; in a spreadsheet, however, the boundaries and structure of tables are not evident without visual exploration. This paper presents an algorithm for table recognition in spreadsheets. The algorithm is based on three types of cells: title cells, header cells, and data cells. Attributes of each cell are used to identify its type within the spreadsheet. Hierarchical clustering then aggregates cells into the functional components of a table. The algorithm has been evaluated on a diverse set of benchmarks with very encouraging results.
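
The two stages named in the abstract (typing each cell, then grouping cells by hierarchical clustering) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not list the cell attributes or the clustering criteria, so the `Cell` fields (`bold`, `merged`), the rules in `classify`, the Chebyshev distance, the single-linkage merge, and the `max_gap` cutoff are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cell:
    row: int
    col: int
    value: object
    bold: bool = False    # hypothetical formatting attribute
    merged: bool = False  # hypothetical flag: cell spans several columns


def classify(cell):
    """Toy rules (assumed, not from the paper) mapping attributes to a cell type."""
    if isinstance(cell.value, str) and cell.merged:
        return "title"
    if isinstance(cell.value, str) and cell.bold:
        return "header"
    return "data"


def cell_distance(a, b):
    # Chebyshev distance: 1 means the two cells touch, diagonals included.
    return max(abs(a.row - b.row), abs(a.col - b.col))


def cluster_distance(c1, c2):
    # Single linkage: distance between the closest pair of cells.
    return min(cell_distance(a, b) for a in c1 for b in c2)


def cluster_cells(cells, max_gap=1):
    """Agglomerative clustering: merge the two closest clusters until the
    closest remaining pair is more than max_gap cells apart."""
    clusters = [[c] for c in cells]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        if cluster_distance(clusters[i], clusters[j]) > max_gap:
            break
        clusters[i].extend(clusters[j])  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Made-up sheet: one table near the top-left corner, a second one further away.
cells = [
    Cell(0, 0, "Sales 2010", bold=True, merged=True),
    Cell(1, 0, "Region", bold=True), Cell(1, 1, "Total", bold=True),
    Cell(2, 0, "North"), Cell(2, 1, 120),
    Cell(6, 4, "Item", bold=True), Cell(7, 4, 3),
]
tables = cluster_cells(cells)
for table in tables:
    print([(c.row, c.col, classify(c)) for c in table])
```

Each resulting cluster is a candidate table whose cells carry a title, header, or data label. The naive pairwise search is slow on large sheets; it is kept here only because it makes the merge order easy to follow.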