Information extraction from unstructured web text

  • Authors:
  • Oren Etzioni;Ana-Maria Popescu

  • Affiliations:
  • University of Washington;University of Washington

  • Venue:
  • Information extraction from unstructured web text
  • Year:
  • 2007

Abstract

In the past few years, the World Wide Web has emerged as an important source of data, much of it in the form of unstructured text. This thesis describes an extensible model for information extraction that takes advantage of the unique characteristics of Web text and leverages existing search engine technology to ensure the quality of the extracted information. The key features of our approach are the use of lexico-syntactic patterns, Web-scale statistics, and unsupervised or semi-supervised learning methods. Our information extraction model has been instantiated and extended to solve a diverse set of information extraction tasks: subclass and related-class extraction, relation property learning, the acquisition of salient product features and corresponding user opinions from customer reviews, and, finally, the mining of commonsense information from the Web for the benefit of integrated AI systems.
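
As a rough illustration of what a lexico-syntactic pattern combined with a Web-scale statistic might look like, the sketch below applies one "such as" pattern to a sentence and scores a candidate with a simple co-occurrence ratio. The abstract does not specify the actual patterns or statistics used in the thesis, so everything here is an assumption: the function names `extract_candidates` and `cooccurrence_ratio` are hypothetical, the pattern is only one common example of its kind, and the hit counts would in practice come from a search engine rather than being passed in by hand.

```python
import re


def extract_candidates(text, class_noun):
    """Find candidate instances of class_noun via '<class_noun> such as A, B and C'.

    Hypothetical helper: this is one illustrative pattern, not the thesis's rule set.
    """
    pattern = re.compile(
        re.escape(class_noun) + r"\s+such as\s+([^.;]+)", re.IGNORECASE
    )
    candidates = []
    for match in pattern.finditer(text):
        # Split the enumerated list on commas and on the conjunctions "and"/"or".
        parts = re.split(
            r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+", match.group(1)
        )
        candidates.extend(p.strip() for p in parts if p.strip())
    return candidates


def cooccurrence_ratio(hits_with_pattern, hits_alone):
    """Share of a candidate's hits that also match the pattern.

    Assumed scoring scheme: both counts are supplied by the caller
    (for example, from search engine result counts).
    """
    if hits_alone == 0:
        return 0.0
    return hits_with_pattern / hits_alone


sentence = "We visited cities such as Seattle, Boston and Chicago."
found = extract_candidates(sentence, "cities")
print(found)
```

A candidate with a higher ratio co-occurs with the pattern more often relative to its overall frequency, which is one plausible way to use search engine counts to filter noisy extractions.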