PengYuan@PKU: Extracting infrequent sense instance with the same N-gram pattern for the SemEval-2010 task 15

Authors:
Peng-Yuan Liu;Shui Liu;Shi-Wen Yu;Tie-Jun Zhao
Affiliations:
Peking University, Beijing, China;Harbin Institute of Technology, Harbin, China;Peking University, Beijing, China;Harbin Institute of Technology, Harbin, China
Venue:
SemEval '10 Proceedings of the 5th International Workshop on Semantic Evaluation
Year:
2010

Abstract

This paper describes our infrequent sense identification system, which participated in SemEval-2010 task 15 on Infrequent Sense Identification for Mandarin Text to Speech Systems. The core of the system is a supervised classifier based on ensembles of Naïve Bayesian classifiers. To address the unbalanced sense distribution, we intentionally extract only infrequent-sense instances with the same N-gram pattern from an untagged Chinese corpus, the People's Daily of 2001, to serve as complementary training data. In addition, we adjusted the prior probability to fit the distribution of the test data and tuned the smoothing coefficient to account for data sparseness. The official results show that our system ranked first, with the best macro accuracy of 0.952. We briefly describe the system, its configuration options, and the features used for this task, and discuss the results.
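
To make the roles of the prior probability and the smoothing coefficient concrete, the following is a minimal sketch of a single multinomial Naïve Bayes classifier in Python. It is not the authors' implementation and it does not build the ensemble described above. The class name `TinyNaiveBayes`, the parameters `alpha` (additive smoothing coefficient, which must be greater than 0) and `priors` (a manually supplied sense distribution), the feature strings, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Illustrative multinomial Naive Bayes (hypothetical, not the paper's code).

    alpha  -- additive smoothing coefficient (> 0) applied to feature counts
    priors -- optional dict mapping sense -> prior probability; when omitted,
              priors are estimated from the training data
    """

    def __init__(self, alpha=1.0, priors=None):
        self.alpha = alpha
        self.priors = priors

    def fit(self, instances):
        # instances: iterable of (list_of_context_features, sense_label)
        self.feature_counts = defaultdict(Counter)
        self.sense_counts = Counter()
        self.vocab = set()
        for features, sense in instances:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(features)
            self.vocab.update(features)
        if self.priors is None:
            total = sum(self.sense_counts.values())
            self.priors = {s: c / total for s, c in self.sense_counts.items()}
        return self

    def log_scores(self, features):
        vocab_size = len(self.vocab)
        scores = {}
        for sense in self.sense_counts:
            counts = self.feature_counts[sense]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(self.priors[sense])
            for f in features:
                # Smoothing keeps unseen features from zeroing out a sense.
                score += math.log((counts[f] + self.alpha) / denom)
            scores[sense] = score
        return scores

    def predict(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


# Toy, made-up training data with an unbalanced sense distribution.
train = [
    (["w_a", "w_b"], "frequent"),
    (["w_a", "w_c"], "frequent"),
    (["w_a", "w_b"], "frequent"),
    (["w_d", "w_b"], "infrequent"),
]

# Priors estimated from the skewed training data.
default_model = TinyNaiveBayes(alpha=1.0).fit(train)

# Priors overridden by hand, standing in for an assumed test distribution.
adjusted_model = TinyNaiveBayes(
    alpha=1.0, priors={"frequent": 0.5, "infrequent": 0.5}
).fit(train)

print(default_model.predict(["w_b"]))
print(adjusted_model.predict(["w_b"]))
```

On this toy data the context feature `w_b` is assigned to the frequent sense when the priors come from the skewed training counts, and to the infrequent sense once the priors are set to 0.5 each. This shows how adjusting the prior alone can shift decisions toward infrequent senses.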