Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
The Penn Chinese TreeBank: Phrase structure annotation of a large corpus
Natural Language Engineering
The LinGO Redwoods treebank motivation and preliminary applications
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 2
Applying co-training methods to statistical parsing
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Two statistical parsing models applied to the Chinese Treebank
CLPW '00 Proceedings of the second workshop on Chinese language processing: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 12
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Chinese word segmentation as LMR tagging
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Incremental parsing with the perceptron algorithm
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Adaptive Chinese word segmentation
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Online large-margin training of dependency parsers
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Hierarchical Phrase-Based Translation
Computational Linguistics
CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank
Computational Linguistics
CoNLL-X shared task on multilingual dependency parsing
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Domain adaptation with structural correspondence learning
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Domain adaptation for statistical classifiers
Journal of Artificial Intelligence Research
Parsing the penn chinese treebank with semantic knowledge
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Label correspondence learning for part-of-speech annotation transformation
Proceedings of the 18th ACM conference on Information and knowledge management
IWPT '09 Proceedings of the 11th International Conference on Parsing Technologies
Dependency parsing and projection based on word-pair classification
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Improved unsupervised POS induction through prototype discovery
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Word-based and character-based word segmentation models: comparison and combination
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Parsing the internal structure of words: a new paradigm for Chinese word segmentation
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Better automatic treebank conversion using a feature-based approach
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Automatic Treebank Conversion via Informed Decoding - A Case Study on Chinese Treebanks
ACM Transactions on Asian Language Information Processing (TALIP)
Semi-supervised Learning Framework for Cross-Lingual Projection
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
Enhancing Chinese word segmentation using unlabeled data
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Exploiting multiple treebanks for parsing with quasi-synchronous grammars
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Iterative annotation transformation with predict-self reestimation for Chinese word segmentation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Joint Chinese word segmentation, POS tagging and parsing
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Word segmentation, unknown-word resolution, and morphological agreement in a hebrew parsing system
Computational Linguistics
A feature-based approach to better automatic treebank conversion
Language Resources and Evaluation
Hi-index | 0.00 |
Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling there exist multiple corpora with different and incompatible annotation guidelines or standards. This seems to be a great waste of human efforts, and it would be nice to automatically adapt one annotation standard to another. We present a simple yet effective strategy that transfers knowledge from a differently annotated corpus to the corpus with desired annotation. We test the efficacy of this method in the context of Chinese word segmentation and part-of-speech tagging, where no segmentation and POS tagging standards are widely accepted due to the lack of morphology in Chinese. Experiments show that adaptation from the much larger People's Daily corpus to the smaller but more popular Penn Chinese Treebank results in significant improvements in both segmentation and tagging accuracies (with error reductions of 30.2% and 14%, respectively), which in turn helps improve Chinese parsing accuracy.