Mining tables from large scale HTML texts
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Extraction and Integration Information in HTML Tables
CIT '04 Proceedings of the Fourth International Conference on Computer and Information Technology
Ontology Extraction from Tables on the Web
SAINT '06 Proceedings of the International Symposium on Applications on Internet
Extracting logical structures from HTML tables
Computer Standards & Interfaces
Hybrid approach to extracting information from web-tables
ICCPOL'06 Proceedings of the 21st international conference on Computer Processing of Oriental Languages: beyond the orient: the research challenges ahead
A machine learning based approach for separating head from body in web-tables
CICLing'06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
An XML approach to semantically extract data from HTML tables
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications
Tables are an effective way of expressing structured knowledge, and their semantic analysis is an important part of semantic document analysis. To interpret the structure and semantic relations of tables in HTML documents, this paper proposes definitions of a normalized table and a tabular coordinate system based on relational database theory. It classifies cells into normalized cells and visual cells, and shows that a row or column together with its merged cells is the primary form in which a table expresses semantics, while a nested table is a further expansion of a particular table cell. Finally, it presents a table-analysis algorithm based on the tabular coordinate system. In practice, the algorithm proves simple and fast and is of some practical value.
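The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to build a tabular coordinate system. It assumes that a "visual cell" is an HTML `<td>`/`<th>` as rendered (possibly spanning several slots via `rowspan`/`colspan`) and that a "normalized cell" is one slot `(row, col)` of the expanded grid. It also assumes well-formed markup with explicit closing tags. `VisualCell`, `TableParser`, `normalize`, and `context` are hypothetical names, not the authors' code, and `context` assumes the number of header rows and columns is already known. Nested tables are kept as a further expansion of the cell that contains them, following the abstract's description.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class VisualCell:
    """One <td>/<th> as rendered; it may cover several normalized cells."""
    text: str = ""
    rowspan: int = 1
    colspan: int = 1
    nested: list = field(default_factory=list)  # child tables, each a list of rows


class TableParser(HTMLParser):
    """Collects tables as rows of VisualCell; a nested table attaches to its enclosing cell."""

    def __init__(self):
        super().__init__()
        self.tables = []  # top-level tables only
        self._stack = []  # one [rows, current_cell] pair per open <table>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "table":
            rows = []
            if self._stack and self._stack[-1][1] is not None:
                self._stack[-1][1].nested.append(rows)  # nested table expands a cell
            else:
                self.tables.append(rows)
            self._stack.append([rows, None])
        elif not self._stack:
            return
        elif tag == "tr":
            self._stack[-1][0].append([])
        elif tag in ("td", "th"):
            rows = self._stack[-1][0]
            if not rows:  # tolerate a cell that appears before any <tr>
                rows.append([])
            cell = VisualCell(rowspan=max(1, int(a.get("rowspan") or 1)),
                              colspan=max(1, int(a.get("colspan") or 1)))
            rows[-1].append(cell)
            self._stack[-1][1] = cell

    def handle_endtag(self, tag):
        if not self._stack:
            return
        if tag in ("td", "th"):
            self._stack[-1][1] = None
        elif tag == "table":
            self._stack.pop()

    def handle_data(self, data):
        # Text goes to the innermost open cell; text between cells is ignored.
        if self._stack and self._stack[-1][1] is not None:
            cell = self._stack[-1][1]
            cell.text = (cell.text + " " + data.strip()).strip()


def normalize(rows):
    """Map every normalized coordinate (row, col) to the visual cell covering it."""
    grid = {}
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            while (r, c) in grid:  # skip slots already filled by a rowspan from above
                c += 1
            for dr in range(cell.rowspan):
                for dc in range(cell.colspan):
                    grid[(r + dr, c + dc)] = cell
            c += cell.colspan
    return grid


def context(grid, r, c, n_header_rows=1, n_header_cols=1):
    """Hypothetical helper: read the row and column headers labelling cell (r, c)."""
    row_heads = [grid[(r, j)].text for j in range(n_header_cols)]
    col_heads = [grid[(i, c)].text for i in range(n_header_rows)]
    return row_heads, col_heads


SAMPLE = """
<table>
  <tr><th rowspan="2">Region</th><th colspan="2">Sales</th></tr>
  <tr><th>2005</th><th>2006</th></tr>
  <tr><td>North</td><td>10</td><td><table><tr><td>12</td></tr></table></td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE)
grid = normalize(parser.tables[0])
print(context(grid, 2, 1, n_header_rows=2, n_header_cols=1))
```

Expanding each span into every coordinate it covers means a merged header such as `Sales` labels each column beneath it, which is one way to read the claim that a row or column together with its merged cells carries the table's semantics. The nested table in the last cell stays reachable through that cell's `nested` list rather than being flattened into the outer grid.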