Learning Knowledge Bases for Information Extraction from Multiple Text Based Web Sites

  • Authors: Xiaoying Gao; Mengjie Zhang
  • Venue: IAT '03 Proceedings of the IEEE/WIC International Conference on Intelligent Agent Technology
  • Year: 2003

Abstract

We describe a learning approach to automatically building knowledge bases for information extraction from multiple text-based web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames. A frame learning algorithm is developed to automatically learn knowledge unit frames from training examples. Some training examples can be obtained by automatically parsing a number of tabular web pages in the same domain, which greatly reduces the time-consuming manual work. This approach was investigated on ten web sites of real estate advertisements and car advertisements, and nearly all the information was successfully extracted with very few false alarms. These results suggest that both the knowledge unit frame representation and the frame learning algorithm work well, that a domain-specific knowledge base can be learned from training examples, and that the domain-specific knowledge base can be used for information extraction from flexible text-based semi-structured Web pages on multiple Web sites.
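
The abstract does not give the structure of a knowledge unit frame or the steps of the frame learning algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a frame can be reduced to an attribute name plus the set of values seen for it, that tabular pages have already been parsed into rows of header/value pairs, and that extraction is a plain vocabulary lookup. The names `KnowledgeUnitFrame`, `learn_frames`, and `extract`, and the sample rows, are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnitFrame:
    """Hypothetical frame: one domain attribute and the values learned for it."""
    name: str
    values: set = field(default_factory=set)


def learn_frames(rows):
    """Build one frame per column header from already-parsed tabular rows."""
    frames = {}
    for row in rows:
        for header, value in row.items():
            frame = frames.setdefault(header, KnowledgeUnitFrame(header))
            frame.values.add(value.strip().lower())
    return frames


def extract(frames, text):
    """Return the first known value of each frame that occurs in free text."""
    found = {}
    lowered = text.lower()
    for name, frame in frames.items():
        # Try longer values first so a specific phrase wins over a substring.
        for value in sorted(frame.values, key=len, reverse=True):
            if re.search(r"\b" + re.escape(value) + r"\b", lowered):
                found[name] = value
                break
    return found


# Invented training rows standing in for a parsed tabular listing page.
rows = [
    {"suburb": "Kelburn", "type": "house"},
    {"suburb": "Karori", "type": "apartment"},
]
frames = learn_frames(rows)
result = extract(frames, "Sunny apartment in Karori, close to shops.")
```

In this sketch, the tabular rows play the role of automatically obtained training examples, and the learned frames are then applied to a free-text advertisement. A realistic system would need richer frames (for example keywords, units, and value patterns) than a plain value set.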