Factors affecting the accuracy of Korean parsing

Authors:
Tagyoung Chung; Matt Post; Daniel Gildea
Affiliations:
University of Rochester, Rochester, NY; University of Rochester, Rochester, NY; University of Rochester, Rochester, NY
Venue:
SPMRL '10 Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages
Year:
2010

Abstract

We investigate parsing accuracy on the Korean Treebank 2.0 with a number of different grammars. Comparisons among these grammars, and with their English counterparts, point to the aspects of Korean that contribute to parsing difficulty. Our results indicate that the coarseness of the Treebank's nonterminal set is an even greater problem than it is in the English Treebank. We also find that Korean's relatively free word order does not affect parsing results as much as one might expect; instead, the prevalence of zero pronouns accounts for a large portion of the difference between Korean and English parsing scores.
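The abstract does not name the metric behind "parsing accuracy" and "parsing scores". Treebank constituency parsers are conventionally compared with labeled-bracket precision, recall, and F1 (PARSEVAL-style), so the following is a minimal sketch of that computation, assuming that metric; it is not taken from the paper. The nested-tuple tree encoding, the function names `brackets` and `bracket_f1`, and the toy trees are hypothetical, and the sketch omits details that standard scorers such as evalb handle (for example, punctuation exclusion).

```python
from collections import Counter

# Hypothetical tree encoding: (label, children), where children is either a
# word string (preterminal) or a list of subtrees.


def brackets(tree, start=0):
    """Return (Counter of (label, start, end) spans, end index).

    Preterminals (a tag directly over one word) contribute no bracket,
    following the usual PARSEVAL-style convention.
    """
    label, children = tree
    if isinstance(children, str):  # preterminal covers exactly one word
        return Counter(), start + 1
    spans = Counter()
    pos = start
    for child in children:
        child_spans, pos = brackets(child, pos)
        spans += child_spans
    spans[(label, start, pos)] += 1  # phrasal constituent over [start, pos)
    return spans, pos


def bracket_f1(gold, test):
    """Labeled-bracket precision, recall, and F1 of a test tree vs. gold."""
    g, _ = brackets(gold)
    t, _ = brackets(test)
    matched = sum((g & t).values())  # multiset intersection of brackets
    precision = matched / sum(t.values()) if t else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy trees (hypothetical): the test parse flattens the gold VP.
gold = ("S", [("NP", [("NNC", "w1")]),
              ("VP", [("NP", [("NNC", "w2")]), ("VV", "w3")])])
test = ("S", [("NP", [("NNC", "w1")]),
              ("NP", [("NNC", "w2")]),
              ("VV", "w3")])

p, r, f = bracket_f1(gold, test)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=1.000 R=0.750 F1=0.857
```

Here every bracket in the test parse also appears in the gold tree (precision 1.0), but the missing VP costs one of four gold brackets (recall 0.75). Counting spans with a `Counter` rather than a set keeps duplicate brackets, such as unary chains with repeated labels, from being over-credited.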