Many natural language processing (NLP) tools exhibit a decrease in performance when applied to data that is linguistically different from the corpus used during development. This makes it hard to develop NLP tools for domains for which no annotated corpora are available. This paper explores a number of metrics that attempt to predict the cross-domain performance of an NLP tool through statistical inference. We apply different similarity metrics to compare domains and investigate the correlation between similarity and the accuracy loss of an NLP tool. We find that the correlation between the tool's performance and the similarity metric is linear, and that the latter can therefore be used to predict the performance of an NLP tool on out-of-domain data. The approach also provides a way to quantify the difference between domains.
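The abstract does not name a specific similarity metric, so the following is only an illustrative sketch of the general approach: compute a lexical divergence between a source and a target corpus (here, Jensen-Shannon divergence over word unigram distributions, a common choice for comparing domains), and fit a linear model relating divergence to observed accuracy loss so that out-of-domain performance can be extrapolated. All function names and the choice of metric are assumptions, not the paper's actual method.

```python
import math
from collections import Counter

def unigram_dist(tokens, vocab):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocab]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def domain_divergence(source_tokens, target_tokens):
    """Lexical divergence between two tokenized corpora (0 = identical)."""
    vocab = sorted(set(source_tokens) | set(target_tokens))
    p = unigram_dist(source_tokens, vocab)
    q = unigram_dist(target_tokens, vocab)
    return js_divergence(p, q)

def fit_linear(xs, ys):
    """Ordinary least squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

# Sketch of the prediction step: given divergence/accuracy-loss pairs
# measured on domains with annotated data, fit the linear relation and
# predict the loss on a new, unannotated domain from its divergence alone.
divergences = [0.10, 0.25, 0.40]   # hypothetical measured values
acc_losses = [0.02, 0.06, 0.11]    # hypothetical accuracy losses
slope, intercept = fit_linear(divergences, acc_losses)
predicted_loss = slope * 0.30 + intercept
```

Because the claimed relation is linear, a simple least-squares fit on a handful of annotated domains is enough to extrapolate; any monotone divergence measure over corpus statistics could stand in for the unigram Jensen-Shannon divergence used here.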