Learner's Self-Assessment: A Case Study of SVM for Information Retrieval

  • Authors:
  • Adam Kowalczyk; Bhavani Raskutti

  • Venue:
  • AI '01 Proceedings of the 14th Australian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
  • Year:
  • 2001

Abstract

The paper demonstrates that the predictive capabilities of a typical kernel machine on the training set can be a reliable indicator of its performance on the independent test set in the region where scores are larger than 1 in magnitude. We present initial results of a number of experiments on the popular Reuters newswire benchmark and the NIST handwritten digit recognition data set. In particular, we demonstrate that the values of recall and precision estimated from the training and independent test sets are within a few percent of each other for the evaluated benchmarks. Interestingly, this holds for both separable and non-separable data cases, and for training sample sizes an order of magnitude smaller than the dimensionality of the feature space used (e.g. using ≈ 2000 samples versus ≈ 20000 features for Reuters data). A theoretical explanation of the observed phenomena is also presented.
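
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes labels are encoded as +1/-1, that an example is predicted positive when its score is above zero, and that "the region where scores are larger than 1 in magnitude" means keeping only examples with |score| > 1. The function name `precision_recall_outside_margin` and the toy scores are hypothetical and serve only as illustration.

```python
from typing import Sequence, Tuple


def precision_recall_outside_margin(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float = 1.0,
) -> Tuple[float, float, int]:
    """Precision and recall over examples with |score| > threshold.

    Assumptions (not taken from the paper): labels are +1 or -1, and an
    example counts as predicted positive when its score is above zero.
    Returns (precision, recall, number_of_examples_kept).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Keep only the examples that fall outside the margin band.
    kept = [(s, y) for s, y in zip(scores, labels) if abs(s) > threshold]

    tp = sum(1 for s, y in kept if s > 0 and y == 1)   # true positives
    fp = sum(1 for s, y in kept if s > 0 and y == -1)  # false positives
    fn = sum(1 for s, y in kept if s < 0 and y == 1)   # false negatives

    # Undefined ratios are reported as NaN rather than guessed.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall, len(kept)


# Toy, made-up scores and labels (not data from the paper).
toy_scores = [1.5, 2.0, -1.2, 0.4, -3.0, 1.1]
toy_labels = [1, -1, 1, 1, -1, 1]
toy_precision, toy_recall, toy_kept = precision_recall_outside_margin(
    toy_scores, toy_labels
)
```

Calling the function once on training-set scores and once on test-set scores, then comparing the two (precision, recall) pairs, gives the kind of comparison the abstract reports; the number of kept examples shows how much of each set lies outside the margin band.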