Unsupervised acquisition of predominant word senses

Authors:
Diana McCarthy;Rob Koeling;Julie Weeds;John Carroll
Venue:
Computational Linguistics
Year:
2007

Citing 0
Cited 23

Word sense disambiguation: A survey

ACM Computing Surveys (CSUR)
Web-derived resources for web information retrieval: from conceptual hierarchies to attribute hierarchies

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Lexical patterns or dependency patterns: which is better for hypernym extraction?

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Measuring topic homogeneity and its application to dictionary-based word sense disambiguation

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Using web-search results to measure word-group similarity

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Disambiguating Tags in Blogs

TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue
Refining the most frequent sense baseline

DEW '09 Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions
From predicting predominant senses to local context for word sense disambiguation

STEP '08 Proceedings of the 2008 Conference on Semantics in Text Processing
Relieving Polysemy Problem for Synonymy Detection

EPIA '09 Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
A Reexamination of MRD-Based Word Sense Disambiguation

ACM Transactions on Asian Language Information Processing (TALIP)
All words domain adapted WSD: finding a middle ground between supervision and unsupervision

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
SemEval-2010 task 17: All-words word sense disambiguation on a specific domain

SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
IIITH: Domain specific word sense disambiguation

SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
HIT-CIR: An unsupervised WSD system based on domain most frequent sense estimation

SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
Kyoto: An integrated system for specific domain WSD

SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
CFILT: Resource conscious approaches for all-words domain specific WSD

SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
Identification of domain-specific senses in a machine-readable dictionary

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Two birds with one stone: learning semantic models for text categorization and word sense disambiguation

Proceedings of the 20th ACM international conference on Information and knowledge management
A quick tour of word sense disambiguation, induction and related approaches

SOFSEM'12 Proceedings of the 38th international conference on Current Trends in Theory and Practice of Computer Science
Lexical acquisition for clinical text mining using distributional similarity

CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Natural language technology and query expansion: issues, state-of-the-art and perspectives

Journal of Intelligent Information Systems
A new minimally-supervised framework for domain word sense disambiguation

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Measuring website similarity using an entity-aware click graph

Proceedings of the 21st ACM international conference on Information and knowledge management

Abstract

There has been a great deal of recent research into word sense disambiguation, particularly since the inception of the Senseval evaluation exercises. Because a word often has more than one meaning, resolving word sense ambiguity could benefit applications that need some level of semantic interpretation of language input. A major problem is that the accuracy of word sense disambiguation systems is strongly dependent on the quantity of manually sense-tagged data available, and even the best systems, when tagging every word token in a document, perform little better than a simple heuristic that guesses the first, or predominant, sense of a word in all contexts. The success of this heuristic is due to the skewed nature of word sense distributions. Data for the heuristic can come from either dictionaries or a sample of sense-tagged data. However, there is a limited supply of the latter, and the sense distributions and predominant sense of a word can depend on the domain or source of a document. (The first sense of “star”, for example, would be different in the popular press and in scientific journals.) In this article, we expand on a previously proposed method for determining the predominant sense of a word automatically from raw text. We look at a number of different data sources and parameterizations of the method, using evaluation results and error analyses to identify where the method performs well and also where it does not. In particular, we find that the method does not work as well for verbs and adverbs as it does for nouns and adjectives, but that it produces more accurate predominant sense information than the widely used SemCor corpus for nouns with low coverage in that corpus. We further show that the method is able to adapt successfully to domains when given domain-specific corpora as input, whether that input is hand-labeled for domain or automatically classified.
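
The first-sense heuristic mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the article's unsupervised method, which works from raw text; it only shows the supervised baseline that picks each word's most frequent sense from a sense-tagged sample, and how that choice can change with the domain. The function name `predominant_senses`, the sense labels, and the two toy samples are hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict


def predominant_senses(tagged_tokens):
    """Return a dict mapping each word to its most frequent sense.

    tagged_tokens is an iterable of (word, sense) pairs, standing in
    for a sample of sense-tagged data.
    """
    counts = defaultdict(Counter)
    for word, sense in tagged_tokens:
        counts[word][sense] += 1
    # most_common(1) returns a one-element list [(sense, count)]
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


# Hypothetical toy samples (not real corpus data): the sense
# distribution of "star" is skewed differently in each domain.
popular_press = [
    ("star", "celebrity"),
    ("star", "celebrity"),
    ("star", "celestial_body"),
]
scientific_journals = [
    ("star", "celestial_body"),
    ("star", "celestial_body"),
    ("star", "celebrity"),
]

press_model = predominant_senses(popular_press)
science_model = predominant_senses(scientific_journals)
print(press_model["star"], science_model["star"])
```

A tagger built on this baseline would assign the stored sense to every occurrence of the word, which is why it depends on having tagged data from the right domain.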