Improving open information extraction for informal web documents with ripple-down rules

  • Authors:
  • Myung Hee Kim;Paul Compton

  • Affiliations:
  • The University of New South Wales, Sydney, NSW, Australia;The University of New South Wales, Sydney, NSW, Australia

  • Venue:
  • PKAW'12 Proceedings of the 12th Pacific Rim conference on Knowledge Management and Acquisition for Intelligent Systems
  • Year:
  • 2012

Abstract

The World Wide Web contains a massive amount of information in unstructured natural language, and obtaining valuable information from informally written Web documents is a major research challenge. One research focus is Open Information Extraction (OIE), which aims to develop relation-independent information extraction. OIE systems seek to extract all potential relations from the text rather than a few pre-defined relations. Existing OIE systems have mainly focused on the Web's heterogeneity rather than its informality. The performance of the REVERB system, a state-of-the-art OIE system, drops dramatically as informality increases in Web documents. This paper proposes a Hybrid Ripple-Down Rules based Open Information Extraction (Hybrid RDROIE) system, which uses Ripple-Down Rules (RDR) on top of a conventional OIE system. The Hybrid RDROIE system applies RDR's incremental learning technique as an add-on to REVERB to correct the performance degradation caused by the Web's informality in a domain of interest. With this wrapper approach, the baseline performance is that of the REVERB system, with RDR correcting errors in the domain of interest. The Hybrid RDROIE system doubled REVERB's performance in a domain of interest after two hours of training.
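
The wrapper idea in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not describe the rule format, the features used, or how REVERB is invoked, so `base_extract`, `Rule`, `hybrid_extract`, and the two example rules are all hypothetical. The sketch shows only the general shape of an exception-structured rule layer that leaves the base extractor's output unchanged unless a correction rule fires.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (arg1, relation, arg2)


def base_extract(sentence: str) -> Optional[Triple]:
    """Hypothetical stand-in for the underlying OIE system.

    This is NOT REVERB's API; it naively splits on the first " is ".
    """
    left, sep, right = sentence.partition(" is ")
    if not sep:
        return None
    return (left.strip(), "is", right.strip(" ."))


@dataclass
class Rule:
    """One rule node: a condition, a correction, and refining exceptions."""
    condition: Callable[[str, Triple], bool]
    fix: Callable[[str, Triple], Optional[Triple]]
    exceptions: List["Rule"] = field(default_factory=list)

    def apply(self, sentence: str, triple: Triple) -> Tuple[bool, Optional[Triple]]:
        """Return (fired, result); an exception that fires overrides this rule."""
        if not self.condition(sentence, triple):
            return False, triple
        for exc in self.exceptions:
            fired, result = exc.apply(sentence, triple)
            if fired:
                return True, result
        return True, self.fix(sentence, triple)


def hybrid_extract(sentence: str, rules: List[Rule]) -> Optional[Triple]:
    """Run the base extractor, then let the first firing rule correct it."""
    triple = base_extract(sentence)
    if triple is None:
        return None
    for rule in rules:
        fired, result = rule.apply(sentence, triple)
        if fired:
            return result
    return triple  # no rule fired: baseline output is kept as-is


# Hypothetical rules a domain expert might add incrementally.
reject_question = Rule(
    condition=lambda s, t: s.rstrip().endswith("?"),
    fix=lambda s, t: None,  # questions assert nothing, so drop the triple
)
strip_lol = Rule(
    condition=lambda s, t: t[0].lower().startswith("lol "),
    fix=lambda s, t: (t[0][4:], t[1], t[2]),  # remove the informal prefix
    exceptions=[reject_question],
)
rules = [strip_lol]
```

Because a sentence that triggers no rule passes through untouched, the baseline behaviour is that of the underlying extractor, which mirrors the wrapper property the abstract describes.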