Learning Knowledge Bases for Information Extraction from Multiple Text Based Web Sites

Authors:
Xiaoying Gao;Mengjie Zhang
Affiliations:
-;-
Venue:
IAT '03 Proceedings of the IEEE/WIC International Conference on Intelligent Agent Technology
Year:
2003

Citing 0
Cited 1

Mining chat conversations for sex identification

PAKDD'07 Proceedings of the 2007 international conference on Emerging technologies in knowledge discovery and data mining

Abstract

We describe a learning approach to automatically building knowledge bases for information extraction from multiple text-based web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames. A frame learning algorithm is developed to automatically learn knowledge unit frames from training examples. Some training examples can be obtained by automatically parsing a number of tabular web pages in the same domain, which greatly reduces the time-consuming manual work. This approach was investigated on ten web sites of real estate advertisements and car advertisements, and nearly all the information was successfully extracted with very few false alarms. These results suggest that both the knowledge unit frame representation and the frame learning algorithm work well, that a domain-specific knowledge base can be learned from training examples, and that the domain-specific knowledge base can be used for information extraction from flexible text-based semi-structured web pages on multiple web sites.