NUS-ML: improving word sense disambiguation using topic features

  • Authors:
  • Jun Fu Cai;Wee Sun Lee;Yee Whye Teh

  • Affiliations:
  • National University of Singapore, Singapore;National University of Singapore, Singapore;University College London, London, UK

  • Venue:
  • SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
  • Year:
  • 2007

Abstract

We participated in the SemEval-1 English coarse-grained all-words task (task 7), the English fine-grained all-words task (task 17, subtask 3), and the English coarse-grained lexical sample task (task 17, subtask 1). The same method is used for all three tasks, with different labeled data: SemCor is the labeled corpus used to train our system for the all-words tasks, while the provided labeled corpus is used for the lexical sample task. The knowledge sources include the parts of speech of neighboring words, single words in the surrounding context, local collocations, and syntactic patterns. In addition, we constructed a topic feature, intended to capture global context information, using the latent Dirichlet allocation (LDA) algorithm with an unlabeled corpus. A modified naïve Bayes classifier is constructed to incorporate all the features. We achieved 81.6%, 57.6%, and 88.7% on the coarse-grained all-words task, the fine-grained all-words task, and the coarse-grained lexical sample task, respectively.
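As a rough illustration of how a topic feature can sit alongside local features in a naïve Bayes sense classifier, the sketch below may help. It is not the system described in the abstract: it uses a standard naïve Bayes with add-alpha smoothing rather than the modified classifier, and it assumes the topic label of each context has already been inferred (for example, by an LDA model trained separately on unlabeled text). The class name `NaiveBayesWSD`, the feature names, the sense labels, and the toy training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Standard naive Bayes over categorical features (an assumption;
    the abstract's "modified" variant is not specified here)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # add-alpha smoothing constant
        self.sense_counts = Counter()             # sense -> number of examples
        self.feat_counts = defaultdict(Counter)   # (sense, feature) -> value counts
        self.values = defaultdict(set)            # feature -> observed values

    def fit(self, examples):
        for feats, sense in examples:
            self.sense_counts[sense] += 1
            for name, value in feats.items():
                self.feat_counts[(sense, name)][value] += 1
                self.values[name].add(value)
        return self

    def predict(self, feats):
        total = sum(self.sense_counts.values())
        best, best_score = None, -math.inf
        for sense, n in sorted(self.sense_counts.items()):
            score = math.log(n / total)           # log prior of the sense
            for name, value in feats.items():
                count = self.feat_counts[(sense, name)][value]
                vocab = len(self.values[name]) + 1  # +1 slot for unseen values
                score += math.log((count + self.alpha) / (n + self.alpha * vocab))
            if score > best_score:
                best, best_score = sense, score
        return best


# Hypothetical toy data for the target word "bank". The "topic" value stands
# in for a topic assigned to the surrounding document by a separate LDA model.
train = [
    ({"pos_prev": "DT", "colloc": "the_bank", "topic": "t_finance"}, "bank%finance"),
    ({"pos_prev": "DT", "colloc": "a_bank", "topic": "t_finance"}, "bank%finance"),
    ({"pos_prev": "DT", "colloc": "the_bank", "topic": "t_nature"}, "bank%river"),
    ({"pos_prev": "NN", "colloc": "river_bank", "topic": "t_nature"}, "bank%river"),
]
model = NaiveBayesWSD().fit(train)

# Identical local context; only the topic feature differs.
local = {"pos_prev": "DT", "colloc": "the_bank"}
pred_nature = model.predict({**local, "topic": "t_nature"})
pred_finance = model.predict({**local, "topic": "t_finance"})
print(pred_nature, pred_finance)
```

In this toy setup the two test contexts share the same local features, so any difference between the two predictions comes from the topic feature alone, which is the kind of global-context signal the abstract attributes to it.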