Low-Cost Supervision for Multiple-Source Attribute Extraction

Authors:
Joseph Reisinger;Marius Paşca
Affiliations:
University of Texas at Austin, Austin;Google Inc., Mountain View, California
Venue:
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

Previous studies on extracting class attributes from unstructured text consider either Web documents or query logs as the source of textual data. Web search queries have been shown to yield attributes of higher quality. However, since many relevant attributes found in Web documents occur infrequently in query logs, Web documents remain an important source for extraction. In this paper, we introduce Bootstrapped Web Search (BWS) extraction, the first approach to extracting class attributes simultaneously from both sources. Extraction is guided by a small set of seed attributes and does not rely on further domain-specific knowledge. BWS is shown to improve both extraction precision and attribute relevance across 40 test classes.
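
The abstract does not say how BWS combines its two sources, so the following is only a toy illustration of the general idea of seed-guided extraction over documents and queries, not the method of the paper. All names (`learn_patterns`, `apply_patterns`, `extract_candidates`), the exact-match patterns, and the tiny example data are assumptions made for this sketch.

```python
import re
from collections import Counter

SLOT = "<A>"  # hypothetical placeholder marking where an attribute appeared


def learn_patterns(texts, seeds):
    """Turn every text that mentions a seed attribute into a pattern with a slot."""
    patterns = set()
    for text in texts:
        for seed in seeds:
            seed_rx = rf"\b{re.escape(seed)}\b"
            if re.search(seed_rx, text):
                patterns.add(re.sub(seed_rx, SLOT, text, count=1))
    return patterns


def apply_patterns(texts, patterns):
    """Count the single words that fill the slot of any learned pattern."""
    slot = re.escape(SLOT)
    regexes = [re.compile(re.escape(p).replace(slot, r"(\w+)")) for p in patterns]
    counts = Counter()
    for text in texts:
        for rx in regexes:
            match = rx.fullmatch(text)
            if match:
                counts[match.group(1)] += 1
    return counts


def extract_candidates(sources, seeds):
    """Sum candidate counts over all sources, then drop the seeds themselves."""
    combined = Counter()
    for texts in sources:
        combined.update(apply_patterns(texts, learn_patterns(texts, seeds)))
    for seed in seeds:
        combined.pop(seed, None)
    return combined.most_common()


# Made-up data for the class "car"; real inputs would be Web documents and query logs.
documents = [
    "the price of the car",
    "the mileage of the car",
    "the horsepower of the car",
    "a car is a vehicle",
]
queries = ["car price", "car mileage", "car horsepower", "car warranty", "car dealers"]

ranked = extract_candidates([documents, queries], seeds={"price", "mileage"})
print(ranked)
```

In this toy data, a candidate supported by both sources ("horsepower") ranks above candidates seen only in queries, including the noisy "dealers". A real system would need far more robust patterns and scoring than exact string matching.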