Improving open information extraction for informal web documents with ripple-down rules

Authors:
Myung Hee Kim;Paul Compton
Affiliations:
The University of New South Wales, Sydney, NSW, Australia;The University of New South Wales, Sydney, NSW, Australia
Venue:
PKAW'12 Proceedings of the 12th Pacific Rim conference on Knowledge Management and Acquisition for Intelligent Systems
Year:
2012


Abstract

The World Wide Web contains a massive amount of information in unstructured natural language, and obtaining valuable information from informally written Web documents is a major research challenge. One research focus is Open Information Extraction (OIE), which aims at relation-independent information extraction: OIE systems seek to extract all potential relations from the text rather than a few pre-defined relations. Existing OIE systems have mainly focused on the Web's heterogeneity rather than its informality. The performance of REVERB, a state-of-the-art OIE system, drops dramatically as informality increases in Web documents. This paper proposes a Hybrid Ripple-Down Rules based Open Information Extraction (Hybrid RDROIE) system, which applies the incremental learning technique of Ripple-Down Rules (RDR) as an add-on to REVERB to correct the performance degradation caused by the Web's informality in a domain of interest. With this wrapper approach, the baseline performance is that of REVERB, with RDR correcting its errors in the domain of interest. The Hybrid RDROIE system doubled REVERB's performance in a domain of interest after two hours of training.
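
The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is a generic single-classification ripple-down rule tree placed over a base extractor, and it is not the authors' implementation: the abstract does not give the rule format, so `Rule`, `evaluate`, and `hybrid_extract` are hypothetical names. `base_extract` is a toy stand-in that only splits three-token sentences and does not call REVERB. The two example rules (normalising the informal spelling "luv" and capitalising "i") are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Triple = Tuple[str, str, str]  # (arg1, relation, arg2)


@dataclass
class Rule:
    """One node of a hypothetical single-classification RDR tree."""
    condition: Callable[[str, Optional[Triple]], bool]
    conclusion: Callable[[str, Optional[Triple]], Optional[Triple]]
    if_true: Optional["Rule"] = None   # exception branch, tried after this rule fires
    if_false: Optional["Rule"] = None  # alternative branch, tried when it does not fire


def evaluate(root: Optional[Rule], sentence: str,
             base: Optional[Triple]) -> Optional[Triple]:
    """Return the conclusion of the last rule that fired, else the base output."""
    result = base  # with no firing rule, the base system's output is kept
    node = root
    while node is not None:
        if node.condition(sentence, base):
            result = node.conclusion(sentence, base)
            node = node.if_true
        else:
            node = node.if_false
    return result


def base_extract(sentence: str) -> Optional[Triple]:
    """Toy stand-in for the base OIE system (assumption: NOT REVERB)."""
    tokens = sentence.split()
    if len(tokens) == 3:
        return (tokens[0], tokens[1], tokens[2])
    return None


def hybrid_extract(sentence: str, rules: Optional[Rule]) -> Optional[Triple]:
    """Run the base extractor, then let the rule tree correct its output."""
    return evaluate(rules, sentence, base_extract(sentence))


# Invented example rules: fix an informal spelling, then refine that fix.
capitalise_i = Rule(
    condition=lambda s, t: t is not None and t[0] == "i",
    conclusion=lambda s, t: ("I", "love", t[2]),
)
fix_luv = Rule(
    condition=lambda s, t: t is not None and t[1] == "luv",
    conclusion=lambda s, t: (t[0], "love", t[2]),
    if_true=capitalise_i,  # exception added later, without editing fix_luv
)

print(hybrid_extract("i luv sydney", fix_luv))
print(hybrid_extract("Tom visited Sydney", fix_luv))
```

When no rule fires, the output is exactly that of the base extractor, which mirrors the abstract's statement that the baseline performance is that of the wrapped system. New corrections are attached as exception or alternative branches, so existing rules are never edited. This is the incremental aspect the sketch is meant to show.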