Unsupervised discovery of domain-specific knowledge from text

Authors:
Dirk Hovy;Chunliang Zhang;Eduard Hovy;Anselmo Peñas
Affiliations:
University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA;University of Southern California, Marina del Rey, CA;UNED NLP and IR Group, Juan del Rosal, Madrid, Spain
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Abstract

Learning by Reading (LbR) aims to enable machines to acquire knowledge from textual input and to reason about it. Inference of this kind requires knowledge about the domain structure, such as entities, classes, and actions. We present a method to infer this implicit knowledge from unlabeled text. Unlike previous approaches, we use automatically extracted classes with a probability distribution over entities to allow for context-sensitive labeling. From a corpus of 1.4 million sentences, we learn about 250,000 simple propositions about American football in the form of predicate-argument structures such as "quarterbacks throw passes to receivers". Using several statistical measures, we show that our model generalizes and explains the data statistically significantly better than various baseline approaches. Human subjects judged up to 96.6% of the resulting propositions to be sensible. The classes and probabilistic model can be used in textual enrichment to improve the performance of end-to-end LbR systems.
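
The idea of context-sensitive labeling with classes that carry a probability distribution over entities can be illustrated with a small sketch. This is not the authors' model: the two tables, the player names, the probabilities, and the function `label_entity` are all hypothetical and chosen only to show how the same entity might receive different class labels depending on the predicate it appears with.

```python
# Toy numbers, invented for illustration only; not taken from the paper.
# Assumed table P(entity | class): each class is a distribution over entities.
ENTITY_GIVEN_CLASS = {
    "quarterback": {"Smith": 0.6, "Jones": 0.4},
    "receiver": {"Lee": 0.7, "Smith": 0.1, "Brown": 0.2},
}

# Assumed table P(class | predicate, slot): how likely a class fills a slot.
CLASS_GIVEN_SLOT = {
    ("throw", "subject"): {"quarterback": 0.9, "receiver": 0.1},
    ("catch", "subject"): {"quarterback": 0.05, "receiver": 0.95},
}


def label_entity(entity, predicate, slot):
    """Return (best_class, posterior) for an entity in a given context.

    Each class is scored by P(class | predicate, slot) * P(entity | class),
    and the scores are normalized into a posterior over classes.
    """
    prior = CLASS_GIVEN_SLOT[(predicate, slot)]
    scores = {
        cls: prior.get(cls, 0.0) * dist.get(entity, 0.0)
        for cls, dist in ENTITY_GIVEN_CLASS.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        # The entity is unknown to every class: no label can be assigned.
        return None, {}
    posterior = {cls: score / total for cls, score in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


if __name__ == "__main__":
    print(label_entity("Smith", "throw", "subject")[0])
    print(label_entity("Smith", "catch", "subject")[0])
```

In this toy setup the entity "Smith" belongs to both classes with different probabilities, so the predicate context decides the label: as the subject of "throw" the quarterback class wins, and as the subject of "catch" the receiver class wins. A fixed entity-to-class dictionary could not make that distinction.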