Sequential testing in classifier evaluation yields biased estimates of effectiveness

  • Authors:
  • William Webber (University of Maryland, College Park, MD, USA)
  • Mossaab Bagdouri (University of Maryland, College Park, MD, USA)
  • David D. Lewis (David D. Lewis Consulting, Chicago, IL, USA)
  • Douglas W. Oard (University of Maryland, College Park, MD, USA)

  • Venue:
  • Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2013

Abstract

It is common to develop and validate classifiers through a process of repeated testing, with nested training and/or test sets of increasing size. We demonstrate in this paper that such repeated testing leads to biased estimates of classifier effectiveness. Experiments on a range of text classification tasks under three sequential testing frameworks show all three lead to optimistic estimates of effectiveness. We calculate empirical adjustments to unbias estimates on our data set, and identify directions for research that could lead to general techniques for avoiding bias while reducing labeling costs.
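The abstract describes the phenomenon rather than a specific algorithm, but the source of the bias can be illustrated with a small simulation. The Python sketch below is a hypothetical example (not the paper's experimental setup): it compares estimating a classifier's accuracy on a test set that is grown in batches and stopped as soon as the estimate reaches a target, against a single fixed-size test set. The parameters `TRUE_ACCURACY`, `BATCH`, `MAX_ROUNDS`, and `TARGET` are made up for the illustration.

```python
import random

random.seed(0)

TRUE_ACCURACY = 0.70   # assumed true accuracy of the classifier (hypothetical)
BATCH = 50             # test labels added per sequential round
MAX_ROUNDS = 10        # maximum number of rounds
TARGET = 0.72          # stop early once the estimate reaches this value

def sequential_estimate():
    """Grow the test set in batches; stop as soon as the estimate looks good enough."""
    correct = total = 0
    for _ in range(MAX_ROUNDS):
        correct += sum(random.random() < TRUE_ACCURACY for _ in range(BATCH))
        total += BATCH
        if correct / total >= TARGET:  # data-dependent stopping rule
            break
    return correct / total

def fixed_estimate():
    """Label the full test set once, with no early stopping."""
    n = BATCH * MAX_ROUNDS
    correct = sum(random.random() < TRUE_ACCURACY for _ in range(n))
    return correct / n

runs = 10_000
seq = sum(sequential_estimate() for _ in range(runs)) / runs
fix = sum(fixed_estimate() for _ in range(runs)) / runs
print(f"true accuracy          : {TRUE_ACCURACY:.3f}")
print(f"sequential (early stop): {seq:.3f}")   # systematically above the true value
print(f"fixed-size test set    : {fix:.3f}")   # close to the true value
```

Because the stopping decision depends on the same labels used to compute the estimate, runs that happen to look good by chance are frozen early, pulling the average reported effectiveness above the true value; the fixed-size evaluation has no such selection effect.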