ClausIE: clause-based open information extraction

Authors:
Luciano Del Corro;Rainer Gemulla
Affiliations:
Max-Planck-Institute für Informatik, Saarbrücken, Germany;Max-Planck-Institute für Informatik, Saarbrücken, Germany
Venue:
Proceedings of the 22nd international conference on World Wide Web
Year:
2013

Citing 13
Cited 4

Accurate unlexicalized parsing

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
SOFIE: a self-organizing framework for information extraction

Proceedings of the 18th international conference on World wide web
Open information extraction from the web

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Open information extraction using Wikipedia

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Semantic role labeling for open information extraction

FAM-LbR '10 Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading
Recovering semantics of tables on the web

Proceedings of the VLDB Endowment
Identifying relations for open information extraction

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Open information extraction: the second generation

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume One
Dependency-based open information extraction

ROBUS-UNSUP '12 Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP
Open language learning for information extraction

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
No noun phrase left behind: detecting and typing unlinkable entities

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
PATTY: a taxonomy of relational patterns with semantic types

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
KrakeN: N-ary facts in open information extraction

AKBC-WEKEX '12 Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction

Knowledge harvesting in the big-data era

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
INDREX: in-database distributional relation extraction

Proceedings of the sixteenth international workshop on Data warehousing and OLAP
WebChild: harvesting and organizing commonsense knowledge from the web

Proceedings of the 7th ACM international conference on Web search and data mining
Effective named entity recognition for idiosyncratic web collections

Proceedings of the 23rd international conference on World wide web

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of ``useful'' pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and to subsequently identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text as well as on noisy text as found in the web.