Extracting sequences from the web

Authors:
Anthony Fader;Stephen Soderland;Oren Etzioni
Affiliations:
University of Washington, Seattle;University of Washington, Seattle;University of Washington, Seattle
Venue:
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Year:
2010

Citing 10
Cited 0

Machine Learning for Information Extraction in Informal Domains

Machine Learning - Special issue on information retrieval
A flexible learning system for wrapping tables and lists in HTML documents

Proceedings of the 11th international conference on World Wide Web
Automatic acquisition of hyponyms from large text corpora

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Closing the gap: learning-based information extraction rivaling knowledge-engineering methods

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
On-demand information extraction

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Language-Independent Set Expansion of Named Entities Using the Web

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Methods for domain-independent information extraction from the web: an experimental comparison

AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Open information extraction from the web

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
A probabilistic model of redundancy in information extraction

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Unsupervised named-entity extraction from the Web: An experimental study

Artificial Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

Classical Information Extraction (IE) systems fill slots in domain-specific frames. This paper reports on SEQ, a novel open IE system that leverages a domain-independent frame to extract ordered sequences such as presidents of the United States or the most common causes of death in the U.S. SEQ leverages regularities about sequences to extract a coherent set of sequences from Web text. SEQ nearly doubles the area under the precision-recall curve compared to an extractor that does not exploit these regularities.