This study aims to separate the head from the data in web tables in order to extract useful information. To achieve this, web tables must be converted into a machine-readable form: attribute-value pairs, whose relation mirrors that between head and body. Because web tables are used for document layout as well as for structuring knowledge, and only meaningful tables can be used to extract information, our previous work separated meaningful tables from decorative ones. To extract the semantic relations among the language contents of a meaningful table, this study separates the head from the body in meaningful tables using machine learning. We (a) defined features based on the editing habits of authors and on the tables themselves, and (b) built a model with the C4.5 machine learning algorithm to separate the head from the body. We obtained 86.2% accuracy in extracting the head from meaningful tables.
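As a rough illustration of the attribute-value form described in the abstract, the Python sketch below assumes the head has already been identified as the first row of a table. The function name `to_attribute_value_pairs`, the `head_rows` parameter, and the sample table are hypothetical and are not taken from the study. The sketch does not implement the C4.5 classifier or the features the authors used.

```python
from typing import Dict, List


def to_attribute_value_pairs(
    rows: List[List[str]], head_rows: int = 1
) -> List[Dict[str, str]]:
    """Pair each body cell with the head cell in the same column.

    Assumption: the head occupies the first `head_rows` rows and the
    last of those rows supplies the attribute names.
    """
    if not 0 < head_rows <= len(rows):
        raise ValueError("head_rows must be between 1 and the number of rows")
    head = rows[head_rows - 1]
    # Each body row becomes one record mapping attribute -> value.
    return [dict(zip(head, body_row)) for body_row in rows[head_rows:]]


# Invented sample table, for illustration only.
table = [
    ["Name", "Price"],
    ["Pen", "2"],
    ["Book", "10"],
]
pairs = to_attribute_value_pairs(table)
print(pairs)
```

In this simplified view, deciding which rows (or columns) form the head is the step the study addresses with machine learning; once that boundary is known, the pairing itself is mechanical.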