Word Sense Disambiguation (WSD) often relies on a context model, or vector, constructed from the words that co-occur with the target word within the same text window. In most cases, a fixed-size window is used, determined by trial and error. In addition, words within the window are weighted uniformly regardless of their distance from the target word. Intuitively, it seems more reasonable to assign stronger weights to context words closer to the target word, but it is difficult to define the optimal distance-based weighting function manually. In this paper, we propose an unsupervised method for determining the optimal weights of context words according to their distance. The general idea is that the optimal weights should maximize the similarity of two context models of the target word generated from two random samples. This principle is applied to both English and Japanese. Context models built with the resulting weights are used in WSD tasks on SemEval data. Our experimental results show that substantial improvements in WSD accuracy can be obtained with the automatically learned weighting scheme.
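The selection principle above can be sketched in a few lines: build distance-weighted context vectors for a target word from two random halves of its occurrences, and pick the weighting that makes the two halves most similar. Everything below is an illustrative assumption, not the paper's actual implementation — the window size, the exponential-decay weight family, the cosine similarity measure, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

def context_vector(occurrences, weight, window=10):
    """Weighted bag-of-words context model for a target word.
    occurrences: list of (left_tokens, right_tokens) windows around the target;
    weight(d): weight of a context word at distance d from the target."""
    vec = Counter()
    for left, right in occurrences:
        # Distance 1 is the token adjacent to the target on either side.
        for d, w in enumerate(reversed(left[-window:]), start=1):
            vec[w] += weight(d)
        for d, w in enumerate(right[:window], start=1):
            vec[w] += weight(d)
    return vec

def cosine(a, b):
    num = sum(a[k] * b[k] for k in a if k in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def sample_similarity(occurrences, weight, rng):
    """Split the occurrences into two random halves and compare the
    resulting context models (the unsupervised selection criterion)."""
    occ = occurrences[:]
    rng.shuffle(occ)
    half = len(occ) // 2
    return cosine(context_vector(occ[:half], weight),
                  context_vector(occ[half:], weight))

def best_decay(occurrences, decays=(0.0, 0.1, 0.3, 0.5, 1.0), seed=0):
    """Grid-search a hypothetical exponential-decay family exp(-a*d),
    keeping the decay rate that maximizes cross-sample similarity."""
    rng = random.Random(seed)
    return max(decays,
               key=lambda a: sample_similarity(
                   occurrences, lambda d: math.exp(-a * d), rng))
```

The decay parameter chosen this way requires no sense-labeled data: it is fit per language (or per corpus) purely from the stability of the context model under resampling, which is why the abstract can apply the same procedure to both English and Japanese.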