Mining tables from large scale HTML texts

  • Authors:
  • Hsin-Hsi Chen;Shih-Chung Tsai;Jin-He Tsai

  • Affiliations:
  • National Taiwan University, Taipei, Taiwan, R.O.C.;National Taiwan University, Taipei, Taiwan, R.O.C.;National Taiwan University, Taipei, Taiwan, R.O.C.

  • Venue:
  • COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
  • Year:
  • 2000

Quantified Score

Hi-index 0.02

Abstract

Tables are a very common presentation scheme, but few papers address table extraction in text data mining. This paper focuses on mining tables from large-scale HTML texts. Table filtering, recognition, interpretation, and presentation are discussed. Heuristic rules and cell similarities are employed to identify tables. The F-measure of table recognition is 86.50%. We also propose an algorithm to capture attribute-value relationships among table cells. Finally, more structured data is extracted and presented.
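
To make the filtering and evaluation steps concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not specify their heuristic rules or similarity measures, so the size-based filter in `looks_like_data_table` and its thresholds are assumptions chosen only for illustration, and all names (`TableCollector`, `extract_tables`, `looks_like_data_table`, `f_measure`) are hypothetical. The F-measure shown is the standard balanced harmonic mean of precision and recall.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of every <table> as a list of rows (simplified)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._stack = []   # tables that are currently open
        self._cell = None  # text pieces of the cell being read, or None

    def _flush(self):
        # Store the cell being read (if any) in the current row.
        if self._cell is not None and self._stack and self._stack[-1]:
            self._stack[-1][-1].append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag not in ("table", "tr", "td", "th"):
            return
        self._flush()  # also copes with omitted </td> and </tr> tags
        if tag == "table":
            self._stack.append([])
        elif tag == "tr" and self._stack:
            self._stack[-1].append([])
        elif tag in ("td", "th") and self._stack and self._stack[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag not in ("table", "tr", "td", "th"):
            return
        self._flush()
        if tag == "table" and self._stack:
            self.tables.append(self._stack.pop())


def extract_tables(html):
    """Return every table in the HTML string as a list of rows of cell text."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables


def looks_like_data_table(rows, min_rows=2, min_cols=2):
    """Assumed filter: keep tables with enough non-empty rows and columns."""
    rows = [r for r in rows if r]
    return len(rows) >= min_rows and all(len(r) >= min_cols for r in rows)


def f_measure(true_positives, predicted, actual):
    """Balanced F-measure from counts of correct, predicted, and true tables."""
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Calling `extract_tables` on a page and keeping only the tables for which `looks_like_data_table` returns true gives a crude separation of data tables from single-cell layout tables. A real system would need richer rules, such as the cell similarities the abstract mentions.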