This paper proposes using linguistic knowledge from Wiktionary to improve lexical disambiguation in multiple languages. It focuses on part-of-speech tagging in three languages with differing characteristics: English, Vietnamese, and Korean. Dictionaries and subsumption networks are first extracted automatically from Wiktionary. These resources are then used to enrich the feature set of the training examples, and a first-order discriminative model is learned from the training data using Hidden Markov Support Vector Machines. The proposed method is competitive with contemporary related work in all three languages. In English, our tagger achieves 96.37% token accuracy on the Brown corpus, an error reduction of 2.74% over the baseline.
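
The feature-enrichment step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy lexicon, the tag names, the feature names, and the functions `token_features` and `sentence_features` are all hypothetical, chosen only to show how the tags a dictionary lists for a word could be added to a token's feature set before a sequence model is trained.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon standing in for a dictionary extracted from
# Wiktionary: lowercase word form -> set of POS tags listed for it.
TOY_LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "can": {"NOUN", "VERB", "AUX"},
    "rusts": {"VERB"},
}


def token_features(tokens: List[str], i: int,
                   lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Build a sparse feature map for tokens[i], enriched with dictionary tags."""
    word = tokens[i]
    lower = word.lower()
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"

    # Baseline surface features (illustrative, not the paper's feature set).
    feats: Dict[str, float] = {
        "bias": 1.0,
        f"word={lower}": 1.0,
        f"suffix3={lower[-3:]}": 1.0,
        f"prev_word={prev_word}": 1.0,
        "is_capitalized": float(word[:1].isupper()),
    }

    # Dictionary enrichment: one feature per tag the lexicon allows.
    tags = lexicon.get(lower)
    if tags is None:
        feats["dict_unknown"] = 1.0
    else:
        for tag in sorted(tags):
            feats[f"dict_tag={tag}"] = 1.0
        if len(tags) == 1:
            # The dictionary lists a single tag, a strong hint for the tagger.
            feats["dict_unambiguous"] = 1.0
    return feats


def sentence_features(tokens: List[str],
                      lexicon: Dict[str, Set[str]]) -> List[Dict[str, float]]:
    """Feature maps for every token in a sentence."""
    return [token_features(tokens, i, lexicon) for i in range(len(tokens))]


if __name__ == "__main__":
    for tok, feats in zip(["The", "can", "rusts"],
                          sentence_features(["The", "can", "rusts"], TOY_LEXICON)):
        print(tok, sorted(k for k in feats if k.startswith("dict_")))
```

In this sketch an ambiguous word such as "can" receives one `dict_tag=` feature per candidate tag, so the learner can weigh the dictionary's suggestions against context instead of treating them as hard constraints. Feature maps of this kind could then be passed to any first-order sequence learner.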