Word sense disambiguation: A survey
ACM Computing Surveys (CSUR)
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Lexical patterns or dependency patterns: which is better for hypernym extraction?
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Measuring topic homogeneity and its application to dictionary-based word sense disambiguation
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Using web-search results to measure word-group similarity
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
TSD '09 Proceedings of the 12th International Conference on Text, Speech and Dialogue
Refining the most frequent sense baseline
SEW '09 Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions
From predicting predominant senses to local context for word sense disambiguation
STEP '08 Proceedings of the 2008 Conference on Semantics in Text Processing
Relieving Polysemy Problem for Synonymy Detection
EPIA '09 Proceedings of the 14th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence
A Reexamination of MRD-Based Word Sense Disambiguation
ACM Transactions on Asian Language Information Processing (TALIP)
All words domain adapted WSD: finding a middle ground between supervision and unsupervision
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
SemEval-2010 task 17: All-words word sense disambiguation on a specific domain
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
IIITH: Domain specific word sense disambiguation
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
HIT-CIR: An unsupervised WSD system based on domain most frequent sense estimation
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
Kyoto: An integrated system for specific domain WSD
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
CFILT: Resource conscious approaches for all-words domain specific WSD
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
Identification of domain-specific senses in a machine-readable dictionary
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Proceedings of the 20th ACM international conference on Information and knowledge management
A quick tour of word sense disambiguation, induction and related approaches
SOFSEM'12 Proceedings of the 38th international conference on Current Trends in Theory and Practice of Computer Science
Lexical acquisition for clinical text mining using distributional similarity
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Natural language technology and query expansion: issues, state-of-the-art and perspectives
Journal of Intelligent Information Systems
A new minimally-supervised framework for domain word sense disambiguation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Measuring website similarity using an entity-aware click graph
Proceedings of the 21st ACM international conference on Information and knowledge management
There has been a great deal of recent research into word sense disambiguation, particularly since the inception of the Senseval evaluation exercises. Because a word often has more than one meaning, resolving word sense ambiguity could benefit applications that need some level of semantic interpretation of language input. A major problem is that the accuracy of word sense disambiguation systems depends strongly on the quantity of manually sense-tagged data available, and even the best systems, when tagging every word token in a document, perform little better than a simple heuristic that guesses the first, or predominant, sense of a word in all contexts. The success of this heuristic is due to the skewed nature of word sense distributions.

Data for the heuristic can come from either dictionaries or a sample of sense-tagged data. However, the latter is in limited supply, and the sense distribution and predominant sense of a word can depend on the domain or source of a document: the first sense of “star”, for example, would differ between the popular press and scientific journals.

In this article, we expand on a previously proposed method for determining the predominant sense of a word automatically from raw text. We examine a number of different data sources and parameterizations of the method, using evaluation results and error analyses to identify where the method performs well and where it does not. In particular, we find that the method does not work as well for verbs and adverbs as it does for nouns and adjectives, but that it produces more accurate predominant-sense information than the widely used SemCor corpus for nouns with low coverage in that corpus. We further show that the method adapts successfully to new domains when given domain-specific corpora as input, whether those corpora are hand-labeled for domain or automatically classified.
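The first-sense baseline the abstract refers to can be sketched in a few lines: estimate each word's predominant sense from whatever sense-tagged sample is available, then assign that sense to every occurrence of the word regardless of context. The sketch below is illustrative only; the word/sense pairs and sense labels are hypothetical, not drawn from SemCor or any real inventory.

```python
from collections import Counter, defaultdict

def predominant_senses(tagged_sample):
    """Map each word to its most frequent sense in a (word, sense) sample."""
    counts = defaultdict(Counter)
    for word, sense in tagged_sample:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def first_sense_tag(tokens, predominant):
    """Tag every token with its predominant sense, ignoring context entirely."""
    return [(tok, predominant.get(tok)) for tok in tokens]

# Toy sense-tagged sample (hypothetical sense labels, for illustration).
sample = [
    ("star", "celebrity"),
    ("star", "celebrity"),
    ("star", "celestial_body"),
    ("bank", "financial_institution"),
]
pred = predominant_senses(sample)
print(first_sense_tag(["star", "bank"], pred))
```

Because sense distributions are skewed, this context-blind tagger is a surprisingly strong baseline; the domain dependence noted in the abstract shows up here as the sample itself: a sample drawn from astronomy text would flip the predominant sense of “star”.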