Exploiting heterogeneous treebanks for parsing

Authors:
Zheng-Yu Niu;Haifeng Wang;Hua Wu
Affiliations:
Toshiba (China) Reseach and Development Center, Beijing, China;Toshiba (China) Reseach and Development Center, Beijing, China;Toshiba (China) Reseach and Development Center, Beijing, China
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Year:
2009

Citing 18
Cited 7

Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
An automatic treebank conversion algorithm for corpus sharing

ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
A statistical parser for Czech

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
The Penn Chinese TreeBank: Phrase structure annotation of a large corpus

Natural Language Engineering
Converting dependency structures to phrase structures

HLT '01 Proceedings of the first international conference on Human language technology research
Recovering latent information in treebanks

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Supervised and unsupervised PCFG adaptation to novel domains

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Is it harder to parse Chinese, or the Chinese Treebank?

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
On the parameter space of generative lexicalized statistical parsing models

On the parameter space of generative lexicalized statistical parsing models
Two statistical parsing models applied to the Chinese Treebank

CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
Translating treebank annotation for evaluation

ELDS '01 Proceedings of the workshop on Evaluation for Language and Dialogue Systems - Volume 9
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Reranking and self-training for parser adaptation

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A fast, accurate deterministic parser for Chinese

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Effective self-training for parsing

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Two languages are better than one (for syntactic parsing)

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Parsing the penn chinese treebank with semantic knowledge

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Heterogeneous parsing via collaborative decoding

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Automatic treebank conversion via informed decoding

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Better automatic treebank conversion using a feature-based approach

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Automatic Treebank Conversion via Informed Decoding - A Case Study on Chinese Treebanks

ACM Transactions on Asian Language Information Processing (TALIP)
Reducing approximation and estimation errors for Chinese lexical processing with heterogeneous annotations

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Exploiting multiple treebanks for parsing with quasi-synchronous grammars

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
A feature-based approach to better automatic treebank conversion

Language Resources and Evaluation

Quantified Score

Hi-index	0.00

Visualization

Abstract

We address the issue of using heterogeneous treebanks for parsing by breaking it down into two sub-problems, converting grammar formalisms of the treebanks to the same one, and parsing on these homogeneous treebanks. First we propose to employ an iteratively trained target grammar parser to perform grammar formalism conversion, eliminating predefined heuristic rules as required in previous methods. Then we provide two strategies to refine conversion results, and adopt a corpus weighting technique for parsing on homogeneous treebanks. Results on the Penn Treebank show that our conversion method achieves 42% error reduction over the previous best result. Evaluation on the Penn Chinese Treebank indicates that a converted dependency treebank helps constituency parsing and the use of unlabeled data by self-training further increases parsing f-score to 85.2%, resulting in 6% error reduction over the previous best result.