Analysis and Interpretation of Semantic HTML Tables

Authors:
Wensheng Yin;Feifei Guo;Fan Xu;Xiuguo Chen
Affiliations:
School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China 430074;School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China 430074;School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China 430074;School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China 430074
Venue:
WISM '09 Proceedings of the International Conference on Web Information Systems and Mining
Year:
2009

Citing 7
Cited 0

Mining tables from large scale HTML texts
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1

Extraction and Integration Information in HTML Tables
CIT '04 Proceedings of the Fourth International Conference on Computer and Information Technology

Ontology Extraction from Tables on the Web
SAINT '06 Proceedings of the International Symposium on Applications on Internet

Extracting logical structures from HTML tables
Computer Standards & Interfaces

Hybrid approach to extracting information from web-tables
ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead

A machine learning based approach for separating head from body in web-tables
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing

An XML approach to semantically extract data from HTML tables
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications

Abstract

Tables are an effective way to present structured knowledge, and their semantic analysis is an important part of semantic document analysis. To interpret the structure and semantic relations of tables in HTML documents, this paper proposes definitions of a normalized table and a tabular coordinate system, based on relational database theory. It classifies cells into normalized cells and visual cells, and shows that rows, columns, and their combined cells are the primary forms of semantic expression in a table, while nested tables are a further expansion of a particular table cell. Finally, a table analysis algorithm based on the tabular coordinate system is given. In practice, the algorithm proves simple and fast and is of practical value.
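
The abstract does not spell out the paper's definitions, so the following is only a minimal sketch of one way a tabular coordinate system might be built, not the authors' algorithm. It assumes that each position in a rectangular grid is addressed by a (row, column) pair and that a cell with rowspan or colspan occupies every position it covers. The names GridParser, to_grid and SAMPLE are hypothetical, and the contents of nested tables are simply skipped.

```python
from html.parser import HTMLParser


class GridParser(HTMLParser):
    """Collect the cells of the outermost table as [text, rowspan, colspan]."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one list of cells per <tr>
        self.depth = 0       # table nesting depth; only depth 1 is read
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.rows.append([])
        elif self.depth == 1 and tag in ("td", "th"):
            a = dict(attrs)
            if not self.rows:            # tolerate a cell without a <tr>
                self.rows.append([])
            rowspan = int(a.get("rowspan") or 1)
            colspan = int(a.get("colspan") or 1)
            self.rows[-1].append(["", rowspan, colspan])
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1
        elif self.depth == 1 and tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.depth == 1 and self.in_cell:
            self.rows[-1][-1][0] += data.strip()


def to_grid(html):
    """Map every (row, col) position to (cell text, origin of the covering cell)."""
    parser = GridParser()
    parser.feed(html)
    grid = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:        # skip positions filled by earlier spans
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = (text, (r, c))
            c += colspan
    return grid


SAMPLE = """
<table>
<tr><th rowspan="2">Name</th><th colspan="2">Score</th></tr>
<tr><th>Math</th><th>Art</th></tr>
<tr><td>Ann</td><td>90</td><td>85</td></tr>
</table>
"""

grid = to_grid(SAMPLE)
```

In this sketch a spanning cell such as "Score" appears at two grid positions that share one origin, which is one possible reading of the distinction between a cell as displayed and the positions it fills in a regular grid. How the paper itself defines normalized and visual cells may differ.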