A feature-based approach to better automatic treebank conversion

Authors:
Muhua Zhu; Jingbo Zhu; Huizhen Wang
Affiliations:
Natural Language Processing Laboratory, Northeastern University, Shenyang, China (all three authors)
Venue:
Language Resources and Evaluation
Year:
2013

Abstract

In the field of constituency parsing, there exist multiple human-labeled treebanks that are built on non-overlapping text samples and follow different annotation standards. Because annotating parse trees by hand is extremely costly, it is desirable to automatically convert one treebank (the source treebank) to the annotation standard of another treebank of interest (the target treebank). Conversion results can either be manually corrected to obtain higher-quality annotations or be used directly as additional training data for building syntactic parsers. To perform automatic treebank conversion, we divide constituency parses into two separate levels, the part-of-speech (POS) level and the syntactic-structure level (bracketing structures and constituent labels), and conduct conversion on each level separately with a feature-based approach. The basic idea of the approach is to encode the original annotations in the source treebank as guide features during the conversion process. Experiments on two Chinese treebanks show that our approach converts POS tags and syntactic structures with accuracies of 96.6% and 84.8%, respectively, which are the best reported results on this task.
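As a rough illustration of the guide-feature idea at the POS level, the Python sketch below trains a greedy, per-token multiclass perceptron. Its feature set includes each token's source-treebank tag (`src=...`) and that tag conjoined with the word (`src+w=...`), alongside ordinary lexical and context features. This is a minimal sketch under stated assumptions, not the authors' implementation. The tag sets, toy sentences, feature templates, and the names `extract_features`, `span_guide_features`, and `GuidedTagger` are all hypothetical, and a real converter would use far richer features and proper sequence decoding. The small `span_guide_features` helper suggests how the same idea might carry over to the structure level: it checks whether a candidate span is also a bracket in the source tree and, if so, what label it has there.

```python
from collections import defaultdict

# Hypothetical toy setting (not the paper's tag sets): the source standard uses a
# coarse tag "V" for every verb, while the target standard separates the copula
# ("VC") from other verbs ("VV").


def extract_features(words, src_tags, i, prev_tag):
    """Features for token i; the 'src' features are the guide features."""
    w = words[i]
    return [
        "bias",
        f"w={w}",
        f"prev={prev_tag}",
        f"src={src_tags[i]}",           # guide feature: source-treebank tag
        f"src+w={src_tags[i]}|{w}",     # guide feature conjoined with the word
    ]


def span_guide_features(src_spans, i, j):
    """Hypothetical structure-level guide features for a candidate span words[i:j].

    src_spans maps (start, end) -> label for every bracket in the source tree.
    """
    label = src_spans.get((i, j))
    if label is None:
        return ["src_bracket=no"]
    return ["src_bracket=yes", f"src_label={label}"]


class GuidedTagger:
    """Greedy per-token multiclass perceptron (no Viterbi decoding)."""

    def __init__(self, tagset):
        self.tagset = sorted(tagset)          # sorted so ties break deterministically
        self.weights = defaultdict(float)     # (feature, tag) -> weight

    def score(self, feats, tag):
        return sum(self.weights[(f, tag)] for f in feats)

    def predict_one(self, feats):
        return max(self.tagset, key=lambda t: self.score(feats, t))

    def tag(self, words, src_tags):
        prev, out = "<s>", []
        for i in range(len(words)):
            t = self.predict_one(extract_features(words, src_tags, i, prev))
            out.append(t)
            prev = t                          # condition on our own prediction
        return out

    def train(self, data, epochs=5):
        for _ in range(epochs):
            for words, src_tags, gold in data:
                prev = "<s>"
                for i, g in enumerate(gold):
                    feats = extract_features(words, src_tags, i, prev)
                    p = self.predict_one(feats)
                    if p != g:                # standard perceptron update
                        for f in feats:
                            self.weights[(f, g)] += 1.0
                            self.weights[(f, p)] -= 1.0
                    prev = g                  # condition on the gold tag in training


# Each item: (words, source-standard tags, target-standard gold tags).
train_data = [
    (["he", "is", "teacher"], ["PN", "V", "N"], ["PN", "VC", "NN"]),
    (["he", "eats", "rice"], ["PN", "V", "N"], ["PN", "VV", "NN"]),
]

tagger = GuidedTagger({"PN", "VC", "VV", "NN"})
tagger.train(train_data)
print(tagger.tag(["she", "eats", "teacher"], ["PN", "V", "N"]))  # ['PN', 'VV', 'NN']
```

In the toy data the same source tag `V` maps to different target tags depending on the word, so conversion cannot be reduced to a fixed tag-to-tag table; the learner combines the guide feature with lexical and contextual evidence instead. The unseen word `she` is still tagged `PN` because the guide feature `src=PN` (together with the sentence-initial context) carries learned weight.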