The syntactic process.
Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, special issue on using large corpora: II.
The Penn Chinese TreeBank: phrase structure annotation of a large corpus. Natural Language Engineering.
Investigating GIS and smoothing for maximum entropy taggers. EACL '03: Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics, Volume 1.
Recovering latent information in treebanks. COLING '02: Proceedings of the 19th International Conference on Computational Linguistics, Volume 1.
Is it harder to parse Chinese, or the Chinese Treebank? ACL '03: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Volume 1.
On the parameter space of generative lexicalized statistical parsing models.
Two statistical parsing models applied to the Chinese Treebank. CLPW '00: Proceedings of the Second Workshop on Chinese Language Processing, held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, Volume 12.
Comparing lexicalized treebank grammars extracted from Chinese, Korean, and English corpora. CLPW '00: Proceedings of the Second Workshop on Chinese Language Processing, held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, Volume 12.
A fast, accurate deterministic parser for Chinese. ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics.
The importance of supertagging for wide-coverage CCG parsing. COLING '04: Proceedings of the 20th International Conference on Computational Linguistics.
Effective self-training for parsing. HLT-NAACL '06: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics.
CCGbank: a corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics.
Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics.
EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Transition-based parsing of the Chinese treebank using a global discriminative model. IWPT '09: Proceedings of the 11th International Conference on Parsing Technologies.
Unbounded dependency recovery for parser evaluation. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 2.
Bilingually-constrained (monolingual) shift-reduce parsing. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 3.
Accurate context-free parsing with combinatory categorial grammar. ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.
Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank. COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics.
HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1.
IJCNLP '04: Proceedings of the First International Joint Conference on Natural Language Processing.
Analysis of the difficulties in Chinese deep parsing. IWPT '11: Proceedings of the 12th International Conference on Parsing Technologies.
We apply Combinatory Categorial Grammar to wide-coverage parsing of Chinese with the new Chinese CCGbank, bringing a formalism capable of transparently recovering non-local dependencies to a language in which they are particularly frequent. We train two state-of-the-art English CCG parsers, the Petrov and Klein (P&K) parser and the Clark and Curran (C&C) parser, uncovering a surprising performance gap between them that is not observed in English: 72.73 F-score for P&K versus 67.09 for C&C on PCTB 6. We explore the challenges of Chinese CCG parsing through three novel ideas: developing corpus variants rather than treating the corpus as fixed; controlling noun/verb and other POS ambiguities; and quantifying the impact of constructions such as pro-drop.
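The F-scores quoted above are the standard harmonic mean of precision and recall over recovered dependencies. A minimal sketch of that computation (the function name and the set-of-tuples representation of dependencies are illustrative assumptions, not from the paper):

```python
def f_score(gold, predicted):
    """F1 over dependency sets: harmonic mean of precision and recall.

    gold and predicted are collections of dependencies, each dependency
    represented as a hashable tuple (e.g. head, label, dependent).
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # dependencies recovered exactly
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)  # fraction of output that is right
    recall = correct / len(gold)          # fraction of gold that is found
    return 2 * precision * recall / (precision + recall)
```

With one of two predicted dependencies correct against two gold dependencies, precision and recall are both 0.5, giving an F-score of 0.5 (or 50.0 when scaled to the percentage convention used above).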