Word Sense Disambiguation (WSD) often relies on a context model, or vector, constructed from the words that co-occur with the target word within the same text window. In most cases, a fixed-size window is used, determined by trial and error. In addition, words within the window are weighted uniformly regardless of their distance from the target word. Intuitively, it seems more reasonable to assign stronger weights to context words closer to the target word, but it is difficult to define the optimal distance-based weighting function manually. In this paper, we propose an unsupervised method for determining the optimal weights of context words according to their distance. The general idea is that the optimal weights should maximize the similarity of two context models of the target word generated from two random samples. This principle is applied to both English and Japanese. Context models built with the resulting weights are used in WSD tasks on SemEval data. Our experimental results show that substantial improvements in WSD accuracy can be obtained with the automatically learned weighting scheme.
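The selection principle above can be sketched in a few lines: build distance-weighted context vectors for a target word from two random halves of its occurrences, and pick the weighting that makes the two halves most similar. Everything below is an illustrative assumption, not the paper's actual implementation — the window size, the exponential-decay weight family, the cosine similarity measure, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

def context_vector(occurrences, weight, window=10):
    """Weighted bag-of-words context model for a target word.
    occurrences: list of (left_tokens, right_tokens) windows around the target;
    weight(d): weight of a context word at distance d from the target."""
    vec = Counter()
    for left, right in occurrences:
        # Distance 1 is the token adjacent to the target on either side.
        for d, w in enumerate(reversed(left[-window:]), start=1):
            vec[w] += weight(d)
        for d, w in enumerate(right[:window], start=1):
            vec[w] += weight(d)
    return vec

def cosine(a, b):
    num = sum(a[k] * b[k] for k in a if k in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def sample_similarity(occurrences, weight, rng):
    """Split the occurrences into two random halves and compare the
    resulting context models (the unsupervised selection criterion)."""
    occ = occurrences[:]
    rng.shuffle(occ)
    half = len(occ) // 2
    return cosine(context_vector(occ[:half], weight),
                  context_vector(occ[half:], weight))

def best_decay(occurrences, decays=(0.0, 0.1, 0.3, 0.5, 1.0), seed=0):
    """Grid-search a hypothetical exponential-decay family exp(-a*d),
    keeping the decay rate that maximizes cross-sample similarity."""
    rng = random.Random(seed)
    return max(decays,
               key=lambda a: sample_similarity(
                   occurrences, lambda d: math.exp(-a * d), rng))
```

The decay parameter chosen this way requires no sense-labeled data: it is fit per language (or per corpus) purely from the stability of the context model under resampling, which is why the abstract can apply the same procedure to both English and Japanese.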