On learning multicategory classification with sample queries

  • Authors: Joel Ratsaby
  • Affiliations: Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
  • Venue: Information and Computation
  • Year: 2003

Abstract

Consider the pattern recognition problem of learning multicategory classification from a labeled sample; for instance, the problem of learning character recognition, where a category corresponds to an alphanumeric character. The classical theory of pattern recognition assumes that labeled examples are drawn according to the unknown underlying class-conditional probability distributions, with the pattern classes picked randomly according to their a priori probabilities. In this paper we pose the following question: Can the learning accuracy be improved if labeled examples are drawn independently at random according to the underlying class-conditional probability distributions, but the pattern classes are not necessarily chosen according to their a priori probabilities? We answer this in the affirmative by showing that there exists a tuning of the sub-sample proportions that minimizes a loss criterion. The tuning is relative to the intrinsic complexity of the Bayes classifier. As this complexity depends on the underlying probability distributions, which are assumed to be unknown, we provide an algorithm that uses sample querying to learn the proportions in an on-line manner and that asymptotically minimizes the criterion. In practice, this algorithm may be used to boost the performance of existing learning classification algorithms by choosing better sub-sample proportions.
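
The idea of tuning sub-sample proportions to class complexity can be illustrated with a toy calculation. The sketch below is not the paper's criterion or algorithm: it assumes a hypothetical surrogate loss of the form sum_i p_i * sqrt(k_i / m_i), where p_i is the a priori probability of class i, k_i is an assumed known complexity value for that class, and m_i is the number of examples drawn from it. Under that assumption, a Lagrange-multiplier argument with a fixed total sample size gives proportions m_i / m proportional to p_i^(2/3) * k_i^(1/3). The function names and the loss form are illustrative assumptions only.

```python
import math


def tuned_proportions(priors, complexities):
    """Proportions minimizing the ASSUMED surrogate loss
    sum_i p_i * sqrt(k_i / m_i) subject to sum_i m_i = m.

    Closed form (hypothetical model, not taken from the paper):
    m_i / m is proportional to p_i**(2/3) * k_i**(1/3).
    """
    weights = [
        (p ** (2.0 / 3.0)) * (k ** (1.0 / 3.0))
        for p, k in zip(priors, complexities)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def surrogate_loss(priors, complexities, proportions, m):
    """Evaluate the assumed surrogate loss for a total sample size m."""
    return sum(
        p * math.sqrt(k / (q * m))
        for p, k, q in zip(priors, complexities, proportions)
    )


# Two equally likely classes; the second is assumed to be more complex.
priors = [0.5, 0.5]
complexities = [1.0, 8.0]

tuned = tuned_proportions(priors, complexities)
loss_tuned = surrogate_loss(priors, complexities, tuned, m=1000)
loss_prior = surrogate_loss(priors, complexities, priors, m=1000)
```

In this toy setting the tuned allocation draws more examples from the class assumed to be more complex, and its surrogate loss is lower than the loss obtained by sampling classes at their a priori rates. In the setting of the abstract the complexities are unknown, which is why the proportions have to be learned on-line rather than computed in closed form as here.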