Statistical dependency parsing in Korean: from corpus generation to automatic parsing

  • Authors: Jinho D. Choi; Martha Palmer
  • Affiliations: University of Colorado at Boulder (both authors)
  • Venue: SPMRL '11: Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages
  • Year: 2011

Abstract

This paper makes two contributions to dependency parsing in Korean. First, we build a Korean dependency Treebank from an existing constituent Treebank. For a morphologically rich language like Korean, dependency parsing has some advantages over constituent parsing. Because little training data is available, we generate dependency trees automatically by applying head-percolation rules and heuristics to the constituent trees. Second, we show how to extract useful features for dependency parsing from the rich morphology of Korean. Once the dependency Treebank is built, any statistical parsing approach can be applied; the challenge is extracting features from tokens that consist of multiple morphemes. We propose a way of selecting the important morphemes and use only these as features to avoid sparsity. We evaluate our parsing approach on three different genres, using both gold-standard and automatic morphological analysis, and we test the impact of fine- vs. coarse-grained morphologies on dependency parsing. With automatic morphological analysis, we achieve labeled attachment scores of 80%+. To the best of our knowledge, this is the first time that Korean dependency parsing has been evaluated on labeled edges with such a large variety of data.
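
The conversion step mentioned in the abstract, applying head-percolation rules to constituent trees, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Node` class, the `HEAD_RULES` table, the phrase and tag labels, and the rightmost-child fallback are hypothetical and are not the rules or heuristics used in the paper. The sketch also produces unlabeled arcs only, whereas the paper evaluates labeled edges.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A constituent-tree node; leaves carry a 1-based token index."""
    label: str
    children: List["Node"] = field(default_factory=list)
    index: Optional[int] = None


# Hypothetical head-percolation table (NOT the paper's rules):
# phrase label -> (search direction, child labels in priority order).
HEAD_RULES: Dict[str, Tuple[str, List[str]]] = {
    "S": ("r", ["VP", "S"]),
    "VP": ("r", ["VV", "VP"]),
    "NP": ("r", ["NN", "NP"]),
}


def find_head_child(node: Node) -> Node:
    """Pick the head child of a phrase using the rule table."""
    direction, priorities = HEAD_RULES.get(node.label, ("r", []))
    ordered = node.children[::-1] if direction == "r" else node.children
    for wanted in priorities:
        for child in ordered:
            if child.label == wanted:
                return child
    # Assumed fallback: first child in the search direction.
    return ordered[0]


def to_dependencies(node: Node, arcs: List[Tuple[int, int]]) -> int:
    """Return the head token index of `node`, appending (dependent, head) arcs."""
    if not node.children:
        return node.index
    head_child = find_head_child(node)
    head_index = to_dependencies(head_child, arcs)
    for child in node.children:
        if child is not head_child:
            # The head token of each non-head child depends on the phrase head.
            arcs.append((to_dependencies(child, arcs), head_index))
    return head_index


def convert(root: Node) -> List[Tuple[int, int]]:
    """Convert a constituent tree to sorted (dependent, head) arcs; 0 is the root."""
    arcs: List[Tuple[int, int]] = []
    arcs.append((to_dependencies(root, arcs), 0))
    return sorted(arcs)


# Toy tree: (S (NP (NN 1)) (VP (NP (NN 2)) (VV 3)))
tree = Node("S", [
    Node("NP", [Node("NN", index=1)]),
    Node("VP", [Node("NP", [Node("NN", index=2)]), Node("VV", index=3)]),
])
arcs = convert(tree)
print(arcs)
```

In this toy tree the verb (token 3) percolates up as the head of both `VP` and `S`, so both noun tokens attach to it and the verb attaches to the artificial root. A real conversion would need a rule table tuned to the Treebank's tag set, plus heuristics for cases the rules do not cover.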