Classifier Adaptation with Non-representative Training Data

  • Authors:
  • Sriharsha Veeramachaneni; George Nagy

  • Venue:
  • DAS '02: Proceedings of the 5th International Workshop on Document Analysis Systems V
  • Year:
  • 2002

Abstract

We propose an adaptive methodology to tune the decision boundaries of a classifier trained on non-representative data to the statistics of the test data in order to improve accuracy. Specifically, for machine-printed and handprinted digit recognition, we demonstrate that adapting the class means alone can provide considerable gains in recognition. On machine-printed digits we adapt to the typeface, on handprinted digits to the writer. We recognize the digits with a Gaussian quadratic classifier when the style of the test set is represented by a subset of the training set, and also when it is not represented in the training set. We compare unsupervised adaptation and style-constrained classification on isogenous test sets from five machine-printed and two handprinted NIST data sets. Both estimating the class means and imposing style constraints reduce the error rate in almost every case, and neither ever results in a significant loss. They are comparable under the first scenario (specialization), but adaptation is better under the second (new style). Adaptation is beneficial when the test set is large enough (even if it contains only ten samples of each class by one writer in a 100-dimensional feature space), but style-conscious classification is the only option with fields of only two or three digits.
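The mean-only adaptation described in the abstract can be illustrated with a minimal sketch: starting from training-set means, covariances, and priors, a Gaussian quadratic classifier assigns soft class posteriors to the unlabeled test samples, and the class means are then re-estimated from those assignments in an EM-style loop. The function names, the soft-assignment scheme, and the fixed iteration count below are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy.stats import multivariate_normal

def quadratic_discriminant_scores(X, means, covs, priors):
    """Per-class log-posterior (up to an additive constant) for each sample."""
    return np.column_stack([
        multivariate_normal.logpdf(X, mean=m, cov=c, allow_singular=True) + np.log(p)
        for m, c, p in zip(means, covs, priors)
    ])

def adapt_class_means(X_test, means, covs, priors, n_iter=5):
    """Unsupervised, EM-style adaptation of the class means to the test set.

    Covariances and priors stay fixed at their training-set estimates;
    only the means are re-estimated from soft class assignments on the
    unlabeled test data (a sketch of mean-only adaptation).
    """
    means = [np.asarray(m, dtype=float).copy() for m in means]
    for _ in range(n_iter):
        # E-step: soft class assignments under the current means.
        log_post = quadratic_discriminant_scores(X_test, means, covs, priors)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate each class mean as a responsibility-weighted average.
        for k in range(len(means)):
            w = resp[:, k]
            if w.sum() > 0:
                means[k] = (w[:, None] * X_test).sum(axis=0) / w.sum()
    return means
```

After adaptation, the test samples would be classified with the same quadratic discriminant using the adapted means, e.g. np.argmax(quadratic_discriminant_scores(X_test, adapted_means, covs, priors), axis=1).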