Is it harder to parse Chinese, or the Chinese Treebank?

Authors: Roger Levy; Christopher Manning
Affiliations: Stanford University; Stanford University
Venue: ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Year: 2003

Abstract

We present a detailed investigation of the challenges posed when applying parsing models developed on English corpora to Chinese. We develop a factored-model statistical parser for the Penn Chinese Treebank and show the implications of gross statistical differences between the WSJ and Chinese Treebanks for the most general methods of parser adaptation. We then analyze in detail the major sources of statistical parse errors for this corpus, giving their causes and relative frequencies, and show that while some types of errors are due to difficult ambiguities inherent in Chinese grammar, others arise from treebank annotation practices. We show how each type of error can be addressed with simple, targeted changes to the independence assumptions of the maximum-likelihood-estimated PCFG factor of the parsing model. These changes raise our F1 from 80.7% to 82.6% on our development set and achieve parse accuracy close to the best published figures for Chinese parsing.
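To make "changing the independence assumptions of a maximum-likelihood PCFG" concrete, here is a toy sketch. It reads rules off a tiny treebank, estimates rule probabilities by relative frequency, and then weakens the context-freeness assumption with generic parent annotation, so that an NP under S and an NP under VP get separate distributions. This is an illustration under stated assumptions, not the paper's actual annotation changes: trees are nested Python tuples, the two-sentence "treebank" is made up, and the function names (`parent_annotate`, `rules`, `mle_pcfg`) are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Assumed tree encoding (hypothetical): (label, child1, child2, ...); a word is a str.

def parent_annotate(tree, parent="ROOT"):
    """Generic parent annotation: rename each phrasal node LABEL^PARENT.
    Preterminals (a tag whose only child is a word) are left unsplit."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        return tree  # preterminal: keep the POS tag as-is
    return (f"{label}^{parent}", *(parent_annotate(k, label) for k in kids))

def rules(tree):
    """Yield every (lhs, rhs) production used in a tree, lexical rules included."""
    label, *kids = tree
    yield label, tuple(k if isinstance(k, str) else k[0] for k in kids)
    for k in kids:
        if not isinstance(k, str):
            yield from rules(k)

def mle_pcfg(trees):
    """Maximum-likelihood (relative-frequency) estimate of P(rhs | lhs)."""
    rule_counts, lhs_counts = Counter(), Counter()
    for t in trees:
        for lhs, rhs in rules(t):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {r: Fraction(c, lhs_counts[r[0]]) for r, c in rule_counts.items()}

# Made-up toy treebank using Chinese Treebank tags (PN pronoun, VV verb, NN noun):
# subject NPs are pronouns, object NPs are common nouns.
t1 = ("S", ("NP", ("PN", "他")), ("VP", ("VV", "看"), ("NP", ("NN", "书"))))
t2 = ("S", ("NP", ("PN", "我们")), ("VP", ("VV", "看"), ("NP", ("NN", "报"))))
treebank = [t1, t2]

plain = mle_pcfg(treebank)
split = mle_pcfg([parent_annotate(t) for t in treebank])
print(plain[("NP", ("PN",))])    # 1/2: one NP category must cover both positions
print(split[("NP^S", ("PN",))])  # 1: subject NPs now have their own distribution
```

The plain grammar assigns NP → PN probability 1/2 regardless of position, while the parent-annotated grammar learns that subject and object NPs expand differently. The trade-off is that each split category is estimated from fewer counts, which is one reason to prefer a few targeted splits over blanket annotation.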