Strategies for lifelong knowledge extraction from the web

  • Authors:
  • Michele Banko; Oren Etzioni

  • Affiliations:
  • University of Washington, Seattle, WA; University of Washington, Seattle, WA

  • Venue:
  • Proceedings of the 4th international conference on Knowledge capture
  • Year:
  • 2007

Abstract

The increasing availability of electronic text has made it possible to acquire information using a variety of techniques that leverage the expertise of both humans and machines. In particular, the field of Information Extraction (IE), in which knowledge is extracted automatically from text, has shown promise for large-scale knowledge acquisition. While IE systems can uncover assertions about individual entities with an increasing level of sophistication, text understanding (the formation of a coherent theory from a textual corpus) involves representation and learning abilities not achievable by today's IE systems. Compared to the individual relational assertions output by IE systems, a theory includes coherent knowledge of abstract concepts and the relationships among them. We believe that the ability to fully discover the richness of knowledge present within large, unstructured and heterogeneous corpora will require a lifelong learning process in which earlier learned knowledge is used to guide subsequent learning. This paper introduces Alice, a lifelong learning agent whose goal is to automatically discover a collection of concepts, facts and generalizations that describe a particular topic of interest directly from a large volume of Web text. Building upon recent advances in unsupervised information extraction, we demonstrate that Alice can iteratively discover new concepts and compose general domain knowledge with a precision of 78%.