Open information extraction for the web

Authors: Oren Etzioni; Michele Banko
Affiliations: University of Washington; University of Washington
Venue: Open information extraction for the web
Year: 2009

Abstract

The World Wide Web contains a significant amount of information expressed in natural language. While unstructured text is often difficult for machines to understand, the field of Information Extraction (IE) offers a way to map textual content into a structured knowledge base. The ability to amass vast quantities of information from Web pages could increase the power of a modern search engine to answer complex queries. IE has traditionally focused on acquiring knowledge about particular relationships within a small collection of domain-specific text. Typically, a target relation is provided to the system as input, along with extraction patterns or examples that have been specified by hand. Shifting to a new relation requires a person to create new patterns or examples, so this manual labor scales linearly with the number of relations of interest. Extracting information from the Web presents several challenges for existing IE systems. The Web is large and heterogeneous; the number of potentially interesting relations is massive, and their identities are often unknown in advance. To enable large-scale knowledge acquisition from the Web, this thesis presents Open Information Extraction, a novel extraction paradigm that automatically discovers thousands of relations from unstructured text and readily scales to the size and diversity of the Web.
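As a toy illustration of the contrast the abstract draws, here is a small Python sketch. It is not the extraction method developed in the thesis. It compares a traditional extractor, which knows one hand-specified target relation ("acquired"), with a relation-independent heuristic that emits an (arg1, relation, arg2) tuple for whatever relation phrase a sentence contains. The regular expressions, the `Triple` type, the function names, and the three-sentence corpus are all hypothetical, and the heuristic only handles trivially simple sentences.

```python
import re
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """A relational tuple (arg1, relation, arg2)."""
    arg1: str
    rel: str
    arg2: str


# Traditional IE (hypothetical): one target relation, one hand-written pattern.
# Supporting another relation means writing another pattern by hand.
ACQUIRED_PATTERN = re.compile(r"^(?P<arg1>[A-Z]\w*) acquired (?P<arg2>[A-Z]\w*)\.$")

# Toy relation-independent heuristic (hypothetical): a capitalized subject,
# a shortest run of lowercase words as the relation phrase, and a final word
# (optionally preceded by "the") as the second argument.
OPEN_PATTERN = re.compile(
    r"^(?P<arg1>[A-Z]\w*) (?P<rel>(?:[a-z]+ )+?)(?P<arg2>(?:the )?\w+)\.$"
)


def extract_target(sentence: str) -> Optional[Triple]:
    """Return an 'acquired' tuple if the sentence matches the fixed pattern."""
    m = ACQUIRED_PATTERN.match(sentence)
    if m is None:
        return None
    return Triple(m["arg1"], "acquired", m["arg2"])


def extract_open(sentence: str) -> Optional[Triple]:
    """Return a tuple for any relation phrase the toy heuristic can find."""
    m = OPEN_PATTERN.match(sentence)
    if m is None:
        return None
    return Triple(m["arg1"], m["rel"].strip(), m["arg2"])


def run_targeted(corpus: List[str]) -> List[Triple]:
    return [t for t in map(extract_target, corpus) if t is not None]


def run_open(corpus: List[str]) -> List[Triple]:
    return [t for t in map(extract_open, corpus) if t is not None]


CORPUS = [
    "Edison invented the phonograph.",
    "Google acquired YouTube.",
    "Seattle is located in Washington.",
]

if __name__ == "__main__":
    print(run_targeted(CORPUS))  # only the single "acquired" tuple
    # Relations discovered without being specified in advance:
    print(sorted({t.rel for t in run_open(CORPUS)}))
    # -> ['acquired', 'invented', 'is located in']
```

The targeted extractor finds nothing in two of the three sentences because it was never told about "invented" or "is located in". Covering them would require two more hand-written patterns, which is the per-relation manual cost described above. A realistic open extractor would replace the regular expression with learned, linguistically informed components. The sketch only shows the shape of the output: tuples whose relation names come from the text itself.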