Foundations of statistical natural language processing
The Journal of Machine Learning Research
The domain dependence of parsing
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
More accurate tests for the statistical significance of result differences
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Sample Selection for Statistical Parsing
Computational Linguistics
Non-projective dependency parsing using spanning tree algorithms
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Effective self-training for parsing
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
CoNLL-X shared task on multilingual dependency parsing
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Domain adaptation with structural correspondence learning
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Automatic prediction of parser accuracy
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Genre distinctions for discourse in the Penn TreeBank
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Automatic domain adaptation for parsing
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Ensemble models for dependency parsing: cheap and good?
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Fine-grained genre classification using structural learning algorithms
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Intelligent selection of language model training data
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Grammar-driven versus data-driven: which parsing system is more affected by domain shifts?
NLPLING '10 Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground
Using domain similarity for performance estimation
DANLP 2010 Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing
Uptraining for accurate deterministic question parsing
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Exploring variations across biomedical subdomains
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Adapting a probabilistic disambiguation model of an HPSG parser to a new domain
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Sentence-level instance-weighting for graph-based and transition-based dependency parsing
IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies
Biographies or blenders: which resource is best for cross-domain sentiment analysis?
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
It is well known that parsing accuracy suffers when a model is applied to out-of-domain data. It is also known that the most beneficial training data for parsing a given domain is data that matches that domain (Sekine, 1997; Gildea, 2001). Selecting appropriate training data is therefore an important task. However, most previous work on domain adaptation has relied on the implicit assumption that domains are somehow given. As more and more data becomes available, automatic ways to select data that is beneficial for a new (unknown) target domain are becoming attractive. This paper evaluates various ways to automatically acquire related training data for a given test set. The results show that an unsupervised technique based on topic models is effective: it outperforms random data selection on both languages examined, English and Dutch. Moreover, the technique works better than manually assigned labels gathered from meta-data that is available for English.
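The abstract does not spell out the selection procedure, but the general idea of similarity-based training-data selection can be sketched. The following is a minimal illustration only, not the paper's topic-model method: it ranks candidate training documents by unigram cosine similarity to the target test set and keeps the closest ones (all function names are hypothetical).

```python
from collections import Counter
import math

def term_distribution(text):
    # Normalized unigram distribution over lowercase tokens
    # (a crude stand-in for a learned topic distribution).
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def cosine_similarity(p, q):
    # Cosine similarity between two sparse distributions.
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def select_training_data(target_text, candidates, k):
    # Rank candidate training documents by similarity to the
    # target domain and keep the k closest ones.
    target = term_distribution(target_text)
    ranked = sorted(
        candidates,
        key=lambda doc: cosine_similarity(term_distribution(doc), target),
        reverse=True,
    )
    return ranked[:k]
```

In the paper's setting, the term distributions would be replaced by topic distributions inferred by an unsupervised topic model, and the selected subset would serve as in-domain training data for the parser.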