Treebanks are valuable resources for syntactic parsing. For some languages, such as Chinese, multiple constituency treebanks are available, developed by different organizations. However, because their underlying annotation standards differ, such treebanks generally cannot be combined directly. To enlarge the training data for syntactic parsing, this article addresses the challenge of unifying the standards of disparate treebanks by automatically converting one treebank (the source treebank) to fit the standard of another (the target treebank). We propose to convert a treebank in two sequential steps, one at the part-of-speech (POS) level and one at the syntactic structure level (covering tree structures and grammar labels). The approaches at both levels can be unified as an informed decoding procedure, in which information derived from the original annotation in the source treebank guides the conversion performed by a POS tagger (or, at the syntactic structure level, a parser) trained on the target treebank. We take two Chinese treebanks as a case study; experiments on them show significant improvements in conversion accuracy over baseline systems, especially when the target treebank is small.
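To make the idea of informed decoding at the POS level more concrete, the following is a minimal sketch, not the article's actual model. It assumes a simple first-order Viterbi tagger whose log-scores come from the target treebank, and it adds a weighted compatibility score between each word's source tag and each candidate target tag. The function name `informed_viterbi`, the score tables, the tag names (`n`, `v`, `NN`, `VV`), and the `weight` parameter are all hypothetical and chosen only for illustration.

```python
import math

FLOOR = -20.0  # log-score assigned to unseen events (an assumption of this sketch)


def informed_viterbi(words, source_tags, tags, emit, trans, compat, weight=1.0):
    """Viterbi decoding over target tags, guided by source-treebank tags.

    emit[(tag, word)] and trans[(prev_tag, tag)] are log-scores standing in for
    a tagger trained on the target treebank (hypothetical tables).
    compat[(source_tag, tag)] is a log-score for how well a source tag matches
    a target tag (hypothetical); weight controls how strongly it guides decoding.
    """
    def local(i, tag):
        # Target-tagger evidence plus guidance from the original source annotation.
        return (emit.get((tag, words[i]), FLOOR)
                + weight * compat.get((source_tags[i], tag), FLOOR))

    best = [{t: local(0, t) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + trans.get((p, t), FLOOR)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + local(i, t)
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy data: the second word is ambiguous for the target tagger alone.
TAGS = ["NN", "VV"]
EMIT = {
    ("NN", "jingji"): math.log(0.9), ("VV", "jingji"): math.log(0.1),
    ("NN", "fazhan"): math.log(0.5), ("VV", "fazhan"): math.log(0.5),
}
TRANS = {(p, t): math.log(0.5) for p in TAGS for t in TAGS}
COMPAT = {
    ("n", "NN"): math.log(0.9), ("n", "VV"): math.log(0.1),
    ("v", "VV"): math.log(0.9), ("v", "NN"): math.log(0.1),
}

print(informed_viterbi(["jingji", "fazhan"], ["n", "v"], TAGS, EMIT, TRANS, COMPAT))
```

In this toy setup the target-side scores cannot decide between the two tags for the second word, so the source tag settles the choice. Setting `weight` to 0 falls back to an uninformed tagger, which is one way to picture the kind of baseline that informed decoding is compared against.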