When writing HTML documents, authors use various methods to convey their intention clearly. Among these methods, this paper pays special attention to tables. Tables are commonly used in many documents to make meaning clear, and they are easy to identify because web documents mark them up with tags that carry additional information. On the Internet, tables are used both to structure knowledge and to lay out documents. Our first goal is therefore to classify tables into two types: meaningful tables and decorative tables. This is not easy, however, because HTML does not separate presentation from structure. This paper proposes a method of extracting meaningful tables using a modified k-means algorithm and compares it with other methods. The experimental results show that this classification of tables in web documents is promising.
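To illustrate the general idea of separating meaningful from decorative tables by clustering, here is a minimal sketch of plain (Lloyd's) k-means with k = 2. It is not the paper's modified k-means, whose modification the abstract does not describe. The per-table features (fraction of header cells, fraction of text cells, fraction of image-only cells), the seed prototypes, and the names `kmeans`, `classify_tables`, and `CLUSTER_NAMES` are all hypothetical, chosen only for illustration. Seeding cluster 0 with a "meaningful" prototype and cluster 1 with a "decorative" one is likewise an assumption made so that the clusters can be named.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-table feature vector, each value already scaled to [0, 1]:
#   (fraction of <th> header cells, fraction of cells with text, fraction of image-only cells)
# These features are an illustrative assumption, not the paper's feature set.
Vector = Tuple[float, ...]

# Assumed naming: seed 0 is a "meaningful" prototype, seed 1 a "decorative" one.
CLUSTER_NAMES = ("meaningful", "decorative")


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: Sequence[Vector]) -> Vector:
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points: Sequence[Vector], seeds: Sequence[Vector],
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd's k-means seeded with the given centroids (not the paper's variant)."""
    centroids = list(seeds)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each table joins the cluster of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster ends up empty.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(centroid(members) if members else centroids[c])
        centroids = updated
    return (labels if labels is not None else []), centroids


def classify_tables(features: Sequence[Vector], seeds: Sequence[Vector]) -> List[str]:
    labels, _ = kmeans(features, seeds)
    return [CLUSTER_NAMES[lab] for lab in labels]


tables = [
    (0.25, 0.95, 0.00),  # data table with a header row
    (0.20, 0.90, 0.05),
    (0.00, 0.10, 0.80),  # layout table holding mostly images
    (0.00, 0.20, 0.70),
]
seeds = [(0.2, 0.9, 0.0), (0.0, 0.1, 0.9)]
print(classify_tables(tables, seeds))
# ['meaningful', 'meaningful', 'decorative', 'decorative']
```

Seeding with prototypes rather than random points keeps the sketch deterministic and lets each cluster carry a fixed name. Plain k-means does not guarantee that the clusters it finds line up with these two labels on real data, which is one reason a modified algorithm such as the paper's might be needed.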