Extraction of meaningful tables from the internet using decision trees

  • Authors:
  • Sung-Won Jung;Won-Hee Lee;Sang-Kyu Park;Hyuk-Chul Kwon

  • Affiliations:
  • AI Lab., Dept. of Computer Science, Pusan National University, Jang-geon Dong, Busan, Korea;AI Lab., Dept. of Computer Science, Pusan National University, Jang-geon Dong, Busan, Korea;Electronic and Telecommunications Research Institute, Yuseong Gu, Daejeon, Korea;AI Lab., Dept. of Computer Science, Pusan National University, Jang-geon Dong, Busan, Korea

  • Venue:
  • IEA/AIE'2003 Proceedings of the 16th international conference on Developments in applied artificial intelligence
  • Year:
  • 2003

Abstract

Information retrieval systems currently in use do not consider the structural information of documents; they rely instead on index terms extracted from the documents. Structural information such as font face, font size, indentation, and tables conveys the author's intent and is a primary means of organizing a document. This paper pays special attention to tables because they are commonly used in documents to make meaning clear, and they are easy to recognize in web documents, which mark them up with tags. On the Internet, tables are used both to structure knowledge and to lay out the design of a page. This paper proposes a method for extracting meaningful tables using a decision tree and for constructing a dictionary of table indexes, so that the results can be applied to an information retrieval system to improve its accuracy.
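
The abstract does not list the features or the tree the authors actually used, so the following is only a minimal sketch of the general idea: separating tables that structure knowledge from tables used for page layout. It assumes a small, hypothetical feature set (row count, cell count, header cells, nested tables) and a hand-written decision tree with illustrative thresholds; in practice the tree would be learned from labeled examples. The names `TableFeatures`, `extract_features`, and `is_meaningful` are invented for this example.

```python
from html.parser import HTMLParser


class TableFeatures(HTMLParser):
    """Collect simple structural counts for the top-level <table> in a fragment."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # current <table> nesting depth
        self.rows = 0     # <tr> elements directly in the outer table
        self.cells = 0    # <td>/<th> elements directly in the outer table
        self.headers = 0  # <th> elements directly in the outer table
        self.nested = 0   # tables nested inside the outer table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.nested += 1
        elif self.depth == 1:
            # Only count rows and cells that belong to the outer table.
            if tag == "tr":
                self.rows += 1
            elif tag in ("td", "th"):
                self.cells += 1
                if tag == "th":
                    self.headers += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def extract_features(html):
    """Return the hypothetical feature dictionary for one table fragment."""
    parser = TableFeatures()
    parser.feed(html)
    parser.close()
    return {
        "rows": parser.rows,
        "cells": parser.cells,
        "headers": parser.headers,
        "nested": parser.nested,
    }


def is_meaningful(features):
    """Hand-written decision tree; every threshold here is an assumption."""
    if features["nested"] > 0:
        return False  # nested tables suggest page layout
    if features["rows"] < 2 or features["cells"] < 4:
        return False  # too small to carry tabular data
    if features["headers"] > 0:
        return True   # explicit header cells suggest a data table
    # No header cells: accept only a regular grid (equal cells per row).
    return features["cells"] % features["rows"] == 0


data_table = (
    "<table><tr><th>Name</th><th>Price</th></tr>"
    "<tr><td>Apple</td><td>1</td></tr></table>"
)
layout_table = (
    "<table><tr><td><table><tr><td>menu</td></tr></table></td>"
    "<td>body</td></tr></table>"
)
```

Calling `is_meaningful(extract_features(data_table))` follows the header-cell branch, while `layout_table` is rejected at the first node because it contains a nested table. A learned tree would replace `is_meaningful` with splits chosen from training data rather than fixed by hand.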