Open language learning for information extraction

  • Authors:
  • Mausam; Michael Schmitz; Robert Bart; Stephen Soderland; Oren Etzioni

  • Affiliations:
  • University of Washington, Seattle (all authors)

  • Venue:
  • EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
  • Year:
  • 2012

Abstract

Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, state-of-the-art Open IE systems such as ReVerb and WOE share two important weaknesses -- (1) they extract only relations that are mediated by verbs, and (2) they ignore context, thus extracting tuples that are not asserted as factual. This paper presents OLLIE, a substantially improved Open IE system that addresses both these limitations. First, OLLIE achieves high yield by extracting relations mediated by nouns, adjectives, and more. Second, a context-analysis step increases precision by including contextual information from the sentence in the extractions. OLLIE obtains 2.7 times the area under the precision-yield curve (AUC) compared to ReVerb and 1.9 times the AUC of WOEparse.
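As a rough illustration of the kind of output the abstract describes, the sketch below models extractions as (arg1; relation; arg2) tuples with an optional context field for attribution or other qualifiers. The class name, field names, and example sentences are illustrative assumptions for this page, not OLLIE's actual interface or output format.

```python
# Illustrative sketch only: NOT the OLLIE API, just a toy representation of
# the kind of tuples an Open IE system produces.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extraction:
    arg1: str
    relation: str
    arg2: str
    # Hypothetical context field: a context-aware system attaches qualifiers
    # (e.g. an attribution clause) so that tuples embedded in reported speech
    # or conditionals are not presented as plainly asserted facts.
    context: Optional[str] = None


# Verb-mediated extraction, the kind verb-only systems also find:
e1 = Extraction("OLLIE", "is", "an Open IE system")

# Noun-mediated relation, beyond verb-only extraction (illustrative sentence):
# "Microsoft co-founder Bill Gates spoke at the conference."
e2 = Extraction("Bill Gates", "is co-founder of", "Microsoft")

# Context-qualified extraction: the tuple is attributed, not asserted
# (illustrative sentence): "Early astronomers believed that the earth is the
# center of the universe."
e3 = Extraction("the earth", "is the center of", "the universe",
                context="attributed to: early astronomers believed")

for e in (e1, e2, e3):
    print(e)
```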