Passive Sampling for Regression

  • Authors:
  • Hwanjo Yu; Sungchul Kim

  • Venue:
  • ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
  • Year:
  • 2010

Abstract

Active sampling (also called active learning or selective sampling) has been extensively researched for classification and rank learning; its goal is to select the most informative samples from unlabeled data so that, once those samples are labeled, the accuracy of the function learned from them is maximized. Whereas active sampling methods must learn a function at each iteration to find the most informative samples, this paper proposes passive sampling techniques for regression, which find informative samples based not on the learned function but on the samples' geometric characteristics in the feature space. Passive sampling is more efficient than active sampling because it does not require, at each iteration, learning and validating regression functions and evaluating the unlabeled data with them. For regression, passive sampling is also more effective: active sampling for regression suffers from serious performance fluctuations in practice because it selects the samples with the highest regression errors, and such samples are likely to be noisy. Passive sampling, on the other hand, shows more stable performance. We observe from our extensive experiments that our passive sampling methods perform even better than "omniscient" active sampling that knows the labels of the unlabeled data.
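The abstract does not spell out the geometric selection criterion the authors use, but the core idea of label-free, geometry-driven sample selection can be illustrated with a greedy farthest-first (k-center) sketch. The function name and the Euclidean-distance criterion below are illustrative assumptions for this sketch, not the paper's algorithm.

```python
import numpy as np

def passive_sample(X, n_select, rng=None):
    """Greedy farthest-first selection of n_select rows from feature matrix X.

    Each new sample is the point farthest (in Euclidean distance) from the
    points already chosen, so the selection spreads over the feature space
    without ever consulting labels or a learned regression function.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    selected = [int(rng.integers(n))]           # seed with a random point
    dist = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(n_select - 1):
        idx = int(np.argmax(dist))              # farthest from current set
        selected.append(idx)
        dist = np.minimum(dist, np.linalg.norm(X - X[idx], axis=1))
    return selected

# Usage (hypothetical data): pick 20 geometrically spread points,
# label only those, then fit a regression model on the labeled subset.
# X_pool = np.random.rand(1000, 5)
# chosen = passive_sample(X_pool, 20, rng=0)
```

Because the selection depends only on pairwise distances in the feature space, it is computed once up front, which is where the efficiency advantage over iterative active sampling comes from.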