Learning domain differences automatically for dependency parsing adaptation

Authors:
Mo Yu;Tiejun Zhao;Yalong Bai
Affiliations:
Harbin Institute of Technology, China;Harbin Institute of Technology, China;Harbin Institute of Technology, China
Venue:
IJCAI'13 Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Year:
2013

Abstract

In this paper, we address the relation between domain differences and domain adaptation (DA) for dependency parsing. Our quantitative analyses show that the major cause of the performance drop of an out-domain model is the inconsistent behavior of the same features across domains, rather than word or feature coverage. We further study these ambiguous features in depth and find that the set of ambiguous features is small and has concentric distributions. Based on these analyses, we propose a DA method that automatically learns which features are ambiguous across domains from the errors made by the out-domain model on in-domain training data. The method is also extended to use multiple out-domain models. Results for dependency parser adaptation from WSJ to Genia and QuestionBank show that our method achieves significant improvements on small in-domain datasets, where DA is most needed. Additionally, we improve on the best published results of the CoNLL 2007 shared task on domain adaptation, which confirms the significance of our analyses and our method.
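
The idea of learning ambiguous features from out-domain errors can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes each in-domain training token comes with its fired feature strings, the head predicted by the out-domain model, and the gold head, and it flags a feature as ambiguous when it fires mostly on wrongly attached tokens. The function name, the feature strings, and the `min_count` and `error_ratio` thresholds are all hypothetical.

```python
from collections import Counter


def find_ambiguous_features(examples, min_count=2, error_ratio=0.5):
    """Flag features that mostly fire on tokens the out-domain model got wrong.

    examples: iterable of (features, predicted_head, gold_head), one per token.
    This is a counting heuristic assumed for illustration only.
    """
    fired = Counter()  # how often each feature fires at all
    wrong = Counter()  # how often it fires on a wrongly attached token
    for features, predicted_head, gold_head in examples:
        for feature in features:
            fired[feature] += 1
            if predicted_head != gold_head:
                wrong[feature] += 1
    return {
        feature
        for feature, count in fired.items()
        if count >= min_count and wrong[feature] / count >= error_ratio
    }


# Toy in-domain data: (features, head predicted by out-domain model, gold head).
examples = [
    (["pos=NN|hpos=VB", "dist=1"], 2, 2),
    (["pos=NN|hpos=NN", "dist=1"], 3, 1),
    (["pos=NN|hpos=NN", "dist=2"], 4, 1),
    (["pos=JJ|hpos=NN", "dist=1"], 1, 1),
]
ambiguous = find_ambiguous_features(examples)
print(ambiguous)
```

In an adaptation setting, features flagged this way could be given separate in-domain weights while the remaining features keep their out-domain weights. That use is likewise an assumption about how such a set might be applied, not a description of the paper's procedure.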