Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Coping with ambiguity and unknown words through probabilistic models
Computational Linguistics - Special issue on using large corpora: II
Combining stochastic and rule-based methods for disambiguation in agglutinative languages
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Tagging inflective languages: prediction of morphological categories for a rich, structured tagset
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Lexicalized hidden Markov models for part-of-speech tagging
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Serial combination of rules and statistics: a case study in Czech tagging
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Building a Korean web corpus for analyzing learner language
WAC-6 '10 Proceedings of the NAACL HLT 2010 Sixth Web as Corpus Workshop
Factors affecting the accuracy of Korean parsing
SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
Developing methodology for Korean particle error detection
IUNLPBEA '11 Proceedings of the 6th Workshop on Innovative Use of NLP for Building Educational Applications
Hi-index | 0.00 |
This paper describes a novel approach to morphological tagging for Korean, an agglutinative language with a very productive inflectional system. The tagger takes raw text as input and returns a lemmatized and morphologically disambiguated output for each word: the lemma is labeled with a part-of-speech (POS) tag and the inflections are labeled with inflectional tags. Unlike the standard approach to tagging for morphologically complex languages, in our proposed approach the tagging phase precedes the analysis phase. It comprises a trigram-based tagging component followed by a morphological rule application component, obtaining 95% precision and recall on unseen test data.