Similarity-based methods for word sense disambiguation

Authors: Ido Dagan; Lillian Lee; Fernando Pereira
Affiliations: Bar Ilan University, Ramat Gan, Israel; Harvard University, Cambridge, MA; AT&T Labs-Research, Murray Hill, NJ
Venue: ACL '97 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Year: 1997

References (7)

Self-organized language modeling for speech recognition. Readings in speech recognition.
Elements of information theory.
Class-based n-gram models of natural language. Computational Linguistics.
A stochastic parts program and noun phrase parser for unrestricted text. ANLC '88 Proceedings of the second conference on Applied natural language processing.
Distributional clustering of English words. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics.
Similarity-based estimation of word cooccurrence probabilities. ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics.
A class-based approach to lexical discovery. ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics.

Cited by (32)

Forgetting Exceptions is Harmful in Language Learning. Machine Learning - Special issue on natural language learning.
Similarity-Based Models of Word Cooccurrence Probabilities. Machine Learning - Special issue on natural language learning.
Corpus-based learning of semantic relations by the ILP system, Asium. Learning language in logic.
Technique for eliminating irrelevant terms in term rewriting for annotated media retrieval. MULTIMEDIA '01 Proceedings of the ninth ACM international conference on Multimedia.
Collocation Dictionary Optimization Using WordNet and k-Nearest Neighbor Learning. Machine Translation.
Verb sense disambiguation based on dual distributional similarity. Natural Language Engineering.
Automatic verb classification using distributions of grammatical features. EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics.
Automatic retrieval and clustering of similar words. COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2.
Improving query translation in English-Korean cross-language information retrieval. Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval.
LaTaT: language and text analysis tools. HLT '01 Proceedings of the first international conference on Human language technology research.
Combining optimal clustering and Hidden Markov models for extractive summarization. MultiSumQA '03 Proceedings of the ACL 2003 workshop on Multilingual summarization and question answering - Volume 12.
A computer-aided environment for generating multiple-choice test items. Natural Language Engineering.
One story, one flow: Hidden Markov Story Models for multilingual multidocument summarization. ACM Transactions on Speech and Language Processing (TSLP).
Exploring distributional similarity based models for query spelling correction. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Feature weighting for co-occurrence-based classification of words. COLING '04 Proceedings of the 20th international conference on Computational Linguistics.
Automatic acquisition for sensibility knowledge using co-occurrence relation. International Journal of Computer Applications in Technology.
Combining Language Modeling and Discriminative Classification for Word Segmentation. CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing.
Abstraction is harmful in language learning. NeMLaP3/CoNLL '98 Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning.
Comparison of similarity models for the relation discovery task. LD '06 Proceedings of the Workshop on Linguistic Distances.
Semantic similarity of distractors in multiple-choice tests: extrinsic evaluation. GEMS '09 Proceedings of the Workshop on Geometrical Models of Natural Language Semantics.
Automatic selection of heterogeneous syntactic features in semantic similarity of Polish nouns. TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue.
Correction of medical handwriting OCR based on semantic similarity. IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning.
A Bayesian method for robust estimation of distributional similarities. ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
SemEval-2010 task: Japanese WSD. SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation.
A semantic similarity approach to predicting Library of Congress subject headings for social tags. Journal of the American Society for Information Science and Technology.
Weakly supervised morphology learning for agglutinating languages using small training sets. COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics.
Unsupervised morpheme discovery with Ungrade. CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments.
Unsupervised extraction of keywords from news archives. LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics.
Characterizing web content, user interests, and search behavior by reading level and topic. Proceedings of the fifth ACM international conference on Web search and data mining.
Semantics-based event-driven web news classification. ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking.
A language modeling approach for extracting translation knowledge from comparable corpora. ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval.
Experiments with semantic similarity measures based on LDA and LSA. SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing.

Abstract

We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in the training set have a major impact on similarity-based estimates.
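To make the task concrete, here is a minimal sketch, not the authors' implementation, of a pseudo-word decision made by a maximum-likelihood estimator and by one generic similarity-based estimator. The sketch assumes that the similarity-based estimate of P(w2 | w1) is a normalized, similarity-weighted average of P(w2 | w1') over other conditioning words w1'. It also assumes that the weight is 10^(-beta * JS), where JS is the Jensen-Shannon divergence; this is only one possible choice, and the paper compares four methods. All function names, the `beta` value, and the toy training pairs are hypothetical and were invented for illustration.

```python
import math
from collections import Counter, defaultdict


def conditional_dists(pairs):
    """Maximum-likelihood estimates of P(w2 | w1) from (w1, w2) training pairs."""
    counts = defaultdict(Counter)
    for w1, w2 in pairs:
        counts[w1][w2] += 1
    return {
        w1: {w2: c / sum(ctr.values()) for w2, c in ctr.items()}
        for w1, ctr in counts.items()
    }


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl_to_mean(a):
        # m[x] > 0 wherever a[x] > 0, so the log is always defined.
        return sum(a[x] * math.log2(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl_to_mean(p) + 0.5 * kl_to_mean(q)


def p_mle(w2, w1, dists):
    """MLE estimate: zero for any pair never seen in training."""
    return dists.get(w1, {}).get(w2, 0.0)


def p_sim(w2, w1, dists, beta=2.0):
    """Similarity-weighted average of P(w2 | w1') over other words w1'.

    Assumption (hypothetical): weight(w1, w1') = 10 ** (-beta * JS(w1, w1')),
    and every other seen word is used as a neighbor (no k-nearest cutoff).
    """
    if w1 not in dists:
        return 0.0
    weights = {
        v: 10 ** (-beta * js_divergence(dists[w1], dists[v]))
        for v in dists
        if v != w1
    }
    norm = sum(weights.values())
    if norm == 0.0:
        return 0.0
    return sum(wt * dists[v].get(w2, 0.0) for v, wt in weights.items()) / norm


def disambiguate(w1, a, b, estimate):
    """Pseudo-word decision: choose whichever of a, b is more probable after w1.

    Returns None on a tie, meaning the estimator cannot decide.
    """
    pa, pb = estimate(a, w1), estimate(b, w1)
    if pa == pb:
        return None
    return a if pa > pb else b


# Invented toy training pairs: "eat" never occurs with "apple",
# but its distributional neighbor "devour" does.
TRAIN = [
    ("eat", "bread"), ("eat", "soup"),
    ("devour", "bread"), ("devour", "soup"), ("devour", "apple"),
    ("drive", "car"), ("drive", "truck"), ("park", "car"),
]

if __name__ == "__main__":
    dists = conditional_dists(TRAIN)
    # Pseudo-word "apple/car" after "eat": both pairs are unseen.
    print(disambiguate("eat", "apple", "car", lambda w2, w1: p_mle(w2, w1, dists)))  # prints None
    print(disambiguate("eat", "apple", "car", lambda w2, w1: p_sim(w2, w1, dists)))  # prints apple
```

Because both test pairs are unseen, the maximum-likelihood estimator assigns them zero probability and cannot decide between them. The similarity-based estimate borrows evidence from "devour", whose distribution is closest to that of "eat", and so it prefers "apple".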