Using domain similarity for performance estimation

  • Authors: Vincent Van Asch; Walter Daelemans
  • Affiliations: University of Antwerp, Antwerp, Belgium (both authors)
  • Venue: DANLP 2010: Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing
  • Year: 2010

Abstract

Many natural language processing (NLP) tools perform worse when they are applied to data that is linguistically different from the corpus used during development. This makes it hard to develop NLP tools for domains for which annotated corpora are not available. This paper explores a number of metrics that attempt to predict the cross-domain performance of an NLP tool through statistical inference. We apply several similarity metrics to compare domains and investigate the correlation between domain similarity and the accuracy loss of an NLP tool. We find that the relationship between the performance of the tool and the similarity metric is linear, and that the metric can therefore be used to predict the performance of an NLP tool on out-of-domain data. The approach also provides a way to quantify the difference between domains.
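
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity metrics or the fitting procedure, so everything below is an assumption made for illustration: Jensen-Shannon divergence over unigram distributions stands in for "a similarity metric", and an ordinary least-squares line stands in for the linear relation. The function names (`unigram_distribution`, `js_divergence`, `fit_line`, `predict_accuracy`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_distribution(tokens):
    """Relative frequency of each token in a corpus (list of strings)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Assumed stand-in for a domain similarity metric: 0.0 means the
    distributions are identical, 1.0 means they share no vocabulary.
    """
    divergence = 0.0
    for word in set(p) | set(q):
        pw = p.get(word, 0.0)
        qw = q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return divergence


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_accuracy(divergence, slope, intercept):
    """Estimate accuracy on an unseen domain from its divergence."""
    return slope * divergence + intercept
```

Under these assumptions, one would compute `js_divergence` between the training corpus and each domain that has annotated test data, fit a line through the resulting (divergence, accuracy) pairs with `fit_line`, and then call `predict_accuracy` with the divergence of an unannotated target domain. Only raw text is needed from the target domain, which is the practical appeal of the approach.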