In this paper, we address the relation between domain differences and domain adaptation (DA) for dependency parsing. Our quantitative analyses show that the inconsistent behavior of the same features across domains, rather than word or feature coverage, is the major cause of the performance drop of an out-domain model. We further study these ambiguous features in depth and find that the set of ambiguous features is small and has concentric distributions. Based on these analyses, we propose a DA method that automatically learns which features are ambiguous across domains from the errors an out-domain model makes on in-domain training data. The method is also extended to use multiple out-domain models. Results for dependency parser adaptation from WSJ to Genia and QuestionBank show that our method achieves significant improvements on small in-domain datasets, where DA is most needed. Additionally, we improve on the best published results of the CoNLL07 shared task on domain adaptation, which confirms the significance of our analyses and our method.