AMTA '00 Proceedings of the 4th Conference of the Association for Machine Translation in the Americas on Envisioning Machine Translation in the Information Future
Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
MACH: a supersonic Korean morphological analyzer
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Online large-margin training of dependency parsers
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Pseudo-projective dependency parsing
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A dual coordinate descent method for large-scale linear SVM
Proceedings of the 25th international conference on Machine learning
Algorithms for deterministic incremental dependency parsing
Computational Linguistics
The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies
CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
The CoNLL-2009 shared task: syntactic and semantic dependencies in multiple languages
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task
Reranking the Berkeley and brown parsers
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Factors affecting the accuracy of Korean parsing
SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
Getting the most out of transition-based dependency parsing
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Hi-index | 0.00 |
This paper gives two contributions to dependency parsing in Korean. First, we build a Korean dependency Treebank from an existing constituent Treebank. For a morphologically rich language like Korean, dependency parsing shows some advantages over constituent parsing. Since there is not much training data available, we automatically generate dependency trees by applying head-percolation rules and heuristics to the constituent trees. Second, we show how to extract useful features for dependency parsing from rich morphology in Korean. Once we build the dependency Treebank, any statistical parsing approach can be applied. The challenging part is how to extract features from tokens consisting of multiple morphemes. We suggest a way of selecting important morphemes and use only these as features to avoid sparsity. Our parsing approach is evaluated on three different genres using both gold-standard and automatic morphological analysis. We also test the impact of fine vs. coarse-grained morphologies on dependency parsing. With automatic morphological analysis, we achieve labeled attachment scores of 80%+. To the best of our knowledge, this is the first time that Korean dependency parsing has been evaluated on labeled edges with such a large variety of data.