Using Wiktionary to improve lexical disambiguation in multiple languages

Authors:
Kiem-Hieu Nguyen;Cheol-Young Ock
Affiliations:
School of Electrical Engineering, University of Ulsan, Ulsan, Korea;School of Electrical Engineering, University of Ulsan, Ulsan, Korea
Venue:
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Year:
2012

Abstract

This paper proposes using linguistic knowledge from Wiktionary to improve lexical disambiguation in multiple languages, focusing on part-of-speech tagging in three languages with differing characteristics: English, Vietnamese, and Korean. Dictionaries and subsumption networks are first extracted automatically from Wiktionary. These resources are then used to enrich the feature set of the training examples. A first-order discriminative model is trained on this data using Hidden Markov Support Vector Machines. The proposed method is competitive with contemporary work in all three languages. In English, our tagger achieves 96.37% token accuracy on the Brown corpus, an error reduction of 2.74% over the baseline.
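
The feature-enrichment step described in the abstract might look roughly like the sketch below. It is an illustration only, not the authors' implementation: the toy resources `WIKT_POS` and `WIKT_HYPERNYMS`, the function `token_features`, and the feature names are all hypothetical, and the abstract does not specify the actual feature templates or the extraction procedure.

```python
from typing import Dict, List, Set

# Hypothetical toy resources. In the paper, dictionaries and subsumption
# networks are extracted automatically from Wiktionary; these few entries
# are made up for illustration.
WIKT_POS: Dict[str, Set[str]] = {
    "the": {"DET"},
    "dog": {"NOUN", "VERB"},
    "run": {"NOUN", "VERB"},
}
WIKT_HYPERNYMS: Dict[str, List[str]] = {
    "dog": ["animal"],  # assumed subsumption link: dog is-a animal
}


def token_features(words: List[str], i: int) -> List[str]:
    """Return string-valued features for the token at position i."""
    word = words[i].lower()
    prev_word = words[i - 1].lower() if i > 0 else "<s>"

    # Baseline lexical features available without any external resource.
    feats = [f"w={word}", f"prev_w={prev_word}", f"suffix2={word[-2:]}"]

    # Dictionary features: every part of speech the dictionary lists for the word.
    if word in WIKT_POS:
        feats.extend(f"wikt_pos={tag}" for tag in sorted(WIKT_POS[word]))
    else:
        feats.append("wikt_pos=UNKNOWN")

    # Subsumption features: more general terms linked to the word.
    feats.extend(f"wikt_hyper={parent}" for parent in WIKT_HYPERNYMS.get(word, []))
    return feats


if __name__ == "__main__":
    sentence = ["The", "dog", "runs"]
    for position in range(len(sentence)):
        print(sentence[position], token_features(sentence, position))
```

Feature lists of this kind would then be passed to a sequence learner; the abstract names Hidden Markov Support Vector Machines for that step, which this sketch does not cover.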