Automatic Treebank Conversion via Informed Decoding - A Case Study on Chinese Treebanks
ACM Transactions on Asian Language Information Processing (TALIP)
Constituency parsing has multiple human-labeled treebanks that are built on non-overlapping text samples and follow different annotation standards. Because annotating parse trees by hand is extremely costly, it is desirable to automatically convert one treebank (the source treebank) to the standard of another treebank of interest (the target treebank). Conversion results can be manually corrected to obtain higher-quality annotations or used directly as additional training data for building syntactic parsers. To perform automatic treebank conversion, we divide constituency parses into two levels, part-of-speech (POS) tags and syntactic structure (bracketing structures and constituent labels), and convert each level separately with a feature-based approach. The basic idea of the approach is to encode the original annotations in the source treebank as guide features during the conversion process. Experiments on two Chinese treebanks show that our approach converts POS tags and syntactic structures with accuracies of 96.6% and 84.8%, respectively, which are the best reported results on this task.
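As a rough illustration of the guide-feature idea at the POS level, the sketch below builds a feature list for one token in which ordinary lexical context features are joined by features derived from the source-treebank tag. The function name `pos_features`, the feature templates, and the toy tokens and tags are all assumptions made for illustration; they are not the feature set or tag inventory used in the paper.

```python
def pos_features(words, source_tags, i):
    """Build features for predicting the target-standard POS tag at position i.

    Hypothetical templates: the first three are ordinary lexical context
    features; the last two are guide features that expose the source
    treebank's tag for the same token to the target-standard tagger.
    """
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"

    # Baseline features: current word and its immediate neighbours.
    feats = [
        f"w={word}",
        f"w-1={prev_word}",
        f"w+1={next_word}",
    ]

    # Guide features: the source annotation alone, and conjoined with the word.
    src = source_tags[i]
    feats.append(f"src={src}")
    feats.append(f"src={src}|w={word}")
    return feats


# Toy sentence with made-up source-standard tags (illustrative only).
words = ["wo", "xihuan", "yinyue"]
source_tags = ["r", "v", "n"]
features = pos_features(words, source_tags, 1)
```

Any feature-based classifier or sequence model could consume such a list. The point of the sketch is only that the source annotation enters as extra evidence rather than through a fixed mapping between tag sets, so the model can learn when the two standards agree and when they diverge.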