NUS-ML: improving word sense disambiguation using topic features

Authors:
Jun Fu Cai; Wee Sun Lee; Yee Whye Teh
Affiliations:
National University of Singapore, Singapore; National University of Singapore, Singapore; University College London, London, UK
Venue:
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
Year:
2007

Abstract

We participated in the SemEval-1 English coarse-grained all-words task (task 7), the English fine-grained all-words task (task 17, subtask 3), and the English coarse-grained lexical sample task (task 17, subtask 1). The same method is used for all three tasks, with different labeled data: SemCor is the labeled corpus used to train our system for the all-words tasks, while the provided labeled corpus is used for the lexical sample task. The knowledge sources include the parts of speech of neighboring words, single words in the surrounding context, local collocations, and syntactic patterns. In addition, we constructed a topic feature, intended to capture global context information, using the latent Dirichlet allocation (LDA) algorithm on an unlabeled corpus. A modified naïve Bayes classifier is constructed to incorporate all the features. We achieved 81.6%, 57.6%, and 88.7% on the coarse-grained all-words task, the fine-grained all-words task, and the coarse-grained lexical sample task, respectively.
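
As a rough illustration of how a topic feature can sit alongside local features in a naive Bayes sense classifier, the sketch below may help. It is not the authors' system: the abstract does not describe their modification to naive Bayes, so this uses an ordinary Laplace-smoothed classifier. The class name `NaiveBayesWSD`, the feature names, the sense labels, and the toy data are all hypothetical, and the `topic` value is assumed to be a topic id already inferred by an LDA model trained separately on unlabeled text.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Standard Laplace-smoothed naive Bayes over named categorical features.

    Illustrative only: the paper's classifier is a *modified* naive Bayes
    whose details are not given in the abstract.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # smoothing constant
        self.sense_counts = Counter()               # sense -> number of examples
        self.feature_counts = defaultdict(Counter)  # (sense, name) -> value counts
        self.values = defaultdict(set)              # name -> values seen in training

    def fit(self, examples):
        for features, sense in examples:
            self.sense_counts[sense] += 1
            for name, value in features.items():
                self.feature_counts[(sense, name)][value] += 1
                self.values[name].add(value)
        return self

    def log_score(self, features, sense):
        total = sum(self.sense_counts.values())
        score = math.log(self.sense_counts[sense] / total)  # log prior
        for name, value in features.items():
            counts = self.feature_counts[(sense, name)]
            n_values = len(self.values[name]) + 1  # +1 slot for unseen values
            numerator = counts[value] + self.alpha
            denominator = sum(counts.values()) + self.alpha * n_values
            score += math.log(numerator / denominator)
        return score

    def predict(self, features):
        return max(self.sense_counts, key=lambda s: self.log_score(features, s))


# Toy training data for the word "bank". The "topic" value stands in for the
# most probable LDA topic of the surrounding document (assumed precomputed).
train = [
    ({"pos_prev": "DT", "collocation": "the_bank_of", "topic": 3}, "bank/river"),
    ({"pos_prev": "DT", "collocation": "river_bank", "topic": 3}, "bank/river"),
    ({"pos_prev": "NN", "collocation": "bank_account", "topic": 7}, "bank/finance"),
    ({"pos_prev": "JJ", "collocation": "central_bank", "topic": 7}, "bank/finance"),
]

model = NaiveBayesWSD().fit(train)

# The collocation is unseen, so the POS and topic features decide the sense.
query = {"pos_prev": "JJ", "collocation": "bank_loan", "topic": 7}
prediction = model.predict(query)
print(prediction)
```

In this toy query the collocation never occurred in training, so the decision rests on the remaining features. That is the kind of situation in which a global topic feature could plausibly help.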