Statistical dependency parsing in Korean: from corpus generation to automatic parsing

Authors:
Jinho D. Choi;Martha Palmer
Affiliations:
University of Colorado at Boulder;University of Colorado at Boulder
Venue:
SPMRL '11 Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages
Year:
2011

Citing 12
Cited 0

Handling Stuctural Divergences and Recovering Dropped Arguments in a Korean/English Machine Translation System

AMTA '00 Proceedings of the 4th Conference of the Association for Machine Translation in the Americas on Envisioning Machine Translation in the Information Future
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
MACH: a supersonic Korean morphological analyzer

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Online large-margin training of dependency parsers

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Pseudo-projective dependency parsing

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A dual coordinate descent method for large-scale linear SVM

Proceedings of the 25th international conference on Machine learning
Algorithms for deterministic incremental dependency parsing

Computational Linguistics
The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies

CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task
Reranking the Berkeley and brown parsers

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Factors affecting the accuracy of Korean parsing

SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
Getting the most out of transition-based dependency parsing

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper gives two contributions to dependency parsing in Korean. First, we build a Korean dependency Treebank from an existing constituent Treebank. For a morphologically rich language like Korean, dependency parsing shows some advantages over constituent parsing. Since there is not much training data available, we automatically generate dependency trees by applying head-percolation rules and heuristics to the constituent trees. Second, we show how to extract useful features for dependency parsing from rich morphology in Korean. Once we build the dependency Treebank, any statistical parsing approach can be applied. The challenging part is how to extract features from tokens consisting of multiple morphemes. We suggest a way of selecting important morphemes and use only these as features to avoid sparsity. Our parsing approach is evaluated on three different genres using both gold-standard and automatic morphological analysis. We also test the impact of fine vs. coarse-grained morphologies on dependency parsing. With automatic morphological analysis, we achieve labeled attachment scores of 80%+. To the best of our knowledge, this is the first time that Korean dependency parsing has been evaluated on labeled edges with such a large variety of data.