Analysis and Interpretation of Semantic HTML Tables

  • Authors:
  • Wensheng Yin;Feifei Guo;Fan Xu;Xiuguo Chen

  • Affiliations:
  • School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China 430074;School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China 430074;School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China 430074;School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China 430074

  • Venue:
  • WISM '09 Proceedings of the International Conference on Web Information Systems and Mining
  • Year:
  • 2009

Abstract

Tables are an effective way of expressing structured knowledge, and their semantic analysis is an important part of semantic document analysis. To interpret the structure and semantic relations of HTML documents, this paper proposes definitions of a normalized table and a tabular coordinate system based on relational database theory. It classifies cells into normalized cells and visual cells, and indicates that rows or columns, together with their combined cells, are the primary forms of semantic expression in a table, while nested tables are a further expansion of a particular table cell. Finally, a table analysis algorithm based on the tabular coordinate system is given. Practice shows that the algorithm is simple and fast and has practical significance.
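
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of a tabular coordinate system, not the authors' method. It assumes that a merged cell (one with rowspan or colspan) is assigned to every (row, column) slot it covers, so that each slot of the grid can be looked up directly. The names GridBuilder and table_to_grid are hypothetical, the input is assumed to be a single flat table with well-formed integer span attributes, and nested tables are not handled.

```python
from html.parser import HTMLParser


class GridBuilder(HTMLParser):
    """Map each <td>/<th> of one flat table onto (row, col) grid slots."""

    def __init__(self):
        super().__init__()
        self.grid = {}    # (row, col) -> text of the cell covering that slot
        self.row = -1     # index of the current <tr>
        self.cell = None  # (rowspan, colspan, text parts) while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
        elif tag in ("td", "th"):
            a = dict(attrs)
            # Assumption: span attributes, when present, are valid integers.
            self.cell = (int(a.get("rowspan", 1)), int(a.get("colspan", 1)), [])

    def handle_data(self, data):
        if self.cell is not None:
            self.cell[2].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            rowspan, colspan, parts = self.cell
            text = "".join(parts).strip()
            # Find the first slot in this row not already taken by a span
            # from an earlier row or an earlier cell of this row.
            col = 0
            while (self.row, col) in self.grid:
                col += 1
            # Assign the cell to every slot it covers.
            for r in range(self.row, self.row + rowspan):
                for c in range(col, col + colspan):
                    self.grid[(r, c)] = text
            self.cell = None


def table_to_grid(html):
    """Return a dict from (row, col) to cell text for one flat HTML table."""
    builder = GridBuilder()
    builder.feed(html)
    return builder.grid


SAMPLE = """
<table>
  <tr><th rowspan="2">Name</th><th colspan="2">Score</th></tr>
  <tr><th>Math</th><th>Art</th></tr>
  <tr><td>Ann</td><td>90</td><td>85</td></tr>
</table>
"""

grid = table_to_grid(SAMPLE)
for (r, c), text in sorted(grid.items()):
    print(r, c, text)
```

In this sample the merged "Name" header occupies both (0, 0) and (1, 0), and "Score" occupies (0, 1) and (0, 2), so the data cell at (2, 2) can be related to its column headers by reading the slots above it.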