Extraction of meaningful tables from the internet using decision trees

Authors:
Sung-Won Jung;Won-Hee Lee;Sang-Kyu Park;Hyuk-Chul Kwon
Affiliations:
AI Lab., Dept. of Computer Science, Pusan National University, Jang-geon Dong, Busan, Korea;AI Lab., Dept. of Computer Science, Pusan National University, Jang-geon Dong, Busan, Korea;Electronics and Telecommunications Research Institute, Yuseong Gu, Daejeon, Korea;AI Lab., Dept. of Computer Science, Pusan National University, Jang-geon Dong, Busan, Korea
Venue:
IEA/AIE'2003 Proceedings of the 16th international conference on Developments in applied artificial intelligence
Year:
2003

Abstract

Information retrieval systems currently in use do not consider the structural information of documents; they rely instead on index terms extracted from the documents. Structural information such as font face, font size, indentation, and tables conveys the author's meaning and is clearly a prime means of documentation. This paper pays special attention to tables because they are commonly used in documents to make meaning clear and are easy to recognize in web documents, which mark them up with tags. On the Internet, tables are used both to structure knowledge and to lay out pages. This paper proposes a method that uses a decision tree to extract meaningful tables and constructs a dictionary of table indexes, so that the result can be applied to an information retrieval system to improve its accuracy.
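As a rough illustration of the idea in the abstract, the sketch below separates tables that carry content from tables used only for page layout. It is a minimal sketch under stated assumptions, not the authors' method: the feature set (row, cell, header-cell, and nested-table counts, plus text length), the names `TableFeatureParser`, `extract_features`, and `is_meaningful`, and every threshold are hypothetical. The decision tree here is written by hand for readability, whereas the abstract implies a tree learned from labeled tables.

```python
from html.parser import HTMLParser


class TableFeatureParser(HTMLParser):
    """Collects simple counts for each top-level <table>.

    Rows and cells of nested tables are counted toward the outermost table.
    The feature set is an assumption made for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.tables = []       # one feature dict per top-level table
        self._depth = 0        # current <table> nesting depth
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append(
                    {"rows": 0, "cells": 0, "th": 0, "nested": 0, "text_chars": 0}
                )
            else:
                self.tables[-1]["nested"] += 1
        elif self._depth and tag == "tr":
            self.tables[-1]["rows"] += 1
        elif self._depth and tag in ("td", "th"):
            self.tables[-1]["cells"] += 1
            if tag == "th":
                self.tables[-1]["th"] += 1
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "table" and self._depth:
            self._depth -= 1
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._depth and self._in_cell:
            self.tables[-1]["text_chars"] += len(data.strip())


def extract_features(html):
    """Return a list of feature dicts, one per top-level table in `html`."""
    parser = TableFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.tables


def is_meaningful(f):
    """Hand-written stand-in for a learned decision tree (assumed thresholds)."""
    if f["nested"] > 0:
        return False  # a table that wraps other tables is treated as layout
    if f["rows"] < 2 or f["cells"] < 4:
        return False  # too small to hold tabular data
    if f["th"] > 0:
        return True   # header cells suggest a data table
    # Short cell contents suggest data values rather than page sections.
    return f["text_chars"] / f["cells"] < 40


data_html = (
    "<table><tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>A</td><td>1</td></tr></table>"
)
layout_html = (
    "<table><tr><td><table><tr><td>menu</td></tr></table></td></tr></table>"
)

data_result = [is_meaningful(f) for f in extract_features(data_html)]
layout_result = [is_meaningful(f) for f in extract_features(layout_html)]
```

In practice the hand-written rules in `is_meaningful` would be replaced by a tree induced from tables labeled as meaningful or decorative, with the same feature dictionaries serving as training input.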