The score-distributional threshold optimization for adaptive binary classification tasks

  • Authors:
  • Avi Arampatzis; André van Hameren

  • Affiliations:
  • Univ. of Nijmegen, Nijmegen, The Netherlands; Univ. of Nijmegen, Nijmegen, The Netherlands

  • Venue:
  • Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2001

Abstract

The thresholding of document scores has proved critical for the effectiveness of classification tasks. We review the most important approaches to thresholding, and introduce the score-distributional (S-D) threshold optimization method. The method is based on score distributions and is capable of optimizing any effectiveness measure defined in terms of the traditional contingency table.

As a byproduct, we provide a model for score distributions, and demonstrate its high accuracy in describing empirical data. The estimation method can be performed incrementally, a highly desirable feature for adaptive environments. Our work in modeling score distributions is useful beyond threshold optimization problems. It directly applies to other retrieval environments that make use of score distributions, e.g., distributed retrieval, or topic detection and tracking.

The most accurate version of S-D thresholding, although incremental, can be computationally heavy. Therefore, we also investigate more practical solutions. We suggest practical approximations and discuss adaptivity, threshold initialization, and incrementality issues. The practical version of S-D thresholding has been tested in the context of the TREC-9 Filtering Track and found to be very effective [2].
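The general idea of optimizing a contingency-table measure from score distributions can be sketched as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not state which distributions are used, so the Gaussian model for relevant scores, the exponential model for non-relevant scores, the grid search, and every function name and parameter value below are assumptions made for the example.

```python
import math

def gaussian_tail(theta, mu, sigma):
    """P(score > theta) under a normal distribution N(mu, sigma^2)."""
    return 0.5 * math.erfc((theta - mu) / (sigma * math.sqrt(2.0)))

def exponential_tail(theta, rate, origin=0.0):
    """P(score > theta) under an exponential with the given rate, starting at origin."""
    return 1.0 if theta <= origin else math.exp(-rate * (theta - origin))

def expected_table(theta, n_rel, n_non, rel_params, non_params):
    """Expected contingency table if every document scoring above theta is accepted.

    ASSUMPTION: relevant scores are Gaussian, non-relevant scores exponential.
    Returns (true positives, false positives, false negatives, true negatives).
    """
    tp = n_rel * gaussian_tail(theta, *rel_params)
    fp = n_non * exponential_tail(theta, *non_params)
    return tp, fp, n_rel - tp, n_non - fp

def linear_utility(tp, fp, fn, tn):
    """A hypothetical linear utility: reward 2 per hit, penalty 1 per false alarm."""
    return 2.0 * tp - fp

def f1_measure(tp, fp, fn, tn):
    """F1 expressed directly in contingency-table terms."""
    denom = 2.0 * tp + fp + fn
    return 2.0 * tp / denom if denom > 0 else 0.0

def optimize_threshold(measure, candidates, n_rel, n_non, rel_params, non_params):
    """Pick the candidate threshold with the highest expected value of the measure."""
    return max(
        candidates,
        key=lambda t: measure(*expected_table(t, n_rel, n_non, rel_params, non_params)),
    )

# Hypothetical setting: 50 relevant and 1000 non-relevant documents seen so far.
candidates = [i / 100 for i in range(101)]
rel_params = (0.6, 0.1)   # assumed mean and standard deviation of relevant scores
non_params = (10.0,)      # assumed decay rate of non-relevant scores
best_utility_theta = optimize_threshold(
    linear_utility, candidates, 50, 1000, rel_params, non_params
)
best_f1_theta = optimize_threshold(
    f1_measure, candidates, 50, 1000, rel_params, non_params
)
```

Because the measure is passed in as a function of the four table cells, swapping the utility for F1 or any other table-based measure requires no change to the optimization step. In an adaptive setting, the distribution parameters and document counts would be re-estimated as new judged documents arrive and the threshold recomputed.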