Topic models for word sense disambiguation and token-based idiom detection

Authors:
Linlin Li; Benjamin Roth; Caroline Sporleder
Affiliations:
Saarland University, Saarbrücken, Germany (all three authors)
Venue:
ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010

Abstract

This paper presents a probabilistic model for sense disambiguation that chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities involving latent topic variables. We propose three instantiations of the model for solving sense disambiguation problems under different degrees of resource availability. The proposed models are tested on three tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. non-literal usages of potentially idiomatic expressions. In all three cases, our models outperform state-of-the-art systems, either numerically or by a statistically significant margin.
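The sense-selection idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the standard topic-model factorization p(ps | c) = Σ_z p(ps | z) p(z | c), approximates p(ps | z) as the product of the topic-word probabilities of the paraphrase's words (a bag-of-words assumption), and assumes the topic-word distributions and the context's topic distribution p(z | c) have already been inferred, for example with LDA. The function names (`log_p_paraphrase_given_topic`, `log_sense_score`, `disambiguate`) and the toy `TOPICS` and `SENSES` data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helper: log p(ps | z), approximated as the sum of log p(w | z)
# over the paraphrase words (bag-of-words assumption). Unseen words get a small floor.
def log_p_paraphrase_given_topic(paraphrase: Sequence[str],
                                 topic_word: Dict[str, float],
                                 floor: float = 1e-12) -> float:
    return sum(math.log(max(topic_word.get(w, 0.0), floor)) for w in paraphrase)

# Hypothetical helper: log p(ps | c) = log sum_z p(z | c) p(ps | z),
# computed with log-sum-exp to avoid underflow for longer paraphrases.
# Assumes context_topics is a distribution with at least one non-zero entry.
def log_sense_score(paraphrase: Sequence[str],
                    context_topics: Sequence[float],
                    topics: List[Dict[str, float]]) -> float:
    terms = [math.log(p_z) + log_p_paraphrase_given_topic(paraphrase, topics[z])
             for z, p_z in enumerate(context_topics) if p_z > 0.0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Hypothetical helper: pick the sense whose paraphrase is most probable given the context.
def disambiguate(senses: Dict[str, List[str]],
                 context_topics: Sequence[float],
                 topics: List[Dict[str, float]]) -> str:
    return max(senses, key=lambda s: log_sense_score(senses[s], context_topics, topics))

# Toy, made-up data: two topics and two paraphrased senses of "bank".
TOPICS: List[Dict[str, float]] = [
    {"money": 0.4, "bank": 0.3, "loan": 0.3},   # topic 0: finance-like
    {"river": 0.4, "water": 0.3, "shore": 0.3},  # topic 1: nature-like
]
SENSES: Dict[str, List[str]] = {
    "financial": ["money", "loan"],
    "river": ["river", "shore"],
}

if __name__ == "__main__":
    # A context dominated by topic 0 favours the financial paraphrase.
    print(disambiguate(SENSES, [0.9, 0.1], TOPICS))  # → financial
```

Under the same assumptions, token-based idiom detection could reuse this argmax with just two candidates: a literal and a non-literal paraphrase of the expression.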