Practical Algorithms for On-line Sampling

  • Authors:
  • Carlos Domingo; Ricard Gavaldà; Osamu Watanabe

  • Venue:
  • DS '98 Proceedings of the First International Conference on Discovery Science
  • Year:
  • 1998

Abstract

One of the core applications of machine learning to knowledge discovery is building a hypothesis (such as a decision tree or a neural network) from a given set of data, so that it can later be used to predict new instances of the data. In this paper, we focus on the particular situation where the hypothesis we want to use for prediction is very simple, so that the hypothesis class is of manageable size. We study how to determine which hypothesis in the class is nearly the best one. We present two on-line sampling algorithms for selecting a hypothesis, give theoretical bounds on the number of examples they need, and analyze them experimentally. We compare them with the commonly used simple batch sampling approach and show that in most situations our algorithms use far fewer examples.
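The abstract does not spell out either algorithm, so the following is only a minimal sketch under our own assumptions, contrasting the two styles of sampling it mentions. The batch baseline fixes the sample size in advance using a Hoeffding bound plus a union bound over a finite class of `num_hyps` hypotheses: with n ≥ (2/ε²) ln(2|H|/δ) examples, every accuracy estimate is within ε/2 of its true value with probability at least 1 − δ, so the empirically best hypothesis is within ε of the best. The on-line variant is a generic successive-elimination sampler, not the authors' method: it draws one example at a time, keeps anytime confidence intervals, and stops as soon as only one hypothesis survives. The names `batch_sample_size`, `online_select`, and the `draw` callback are hypothetical, and the simulated hypotheses with fixed accuracies are an illustrative assumption.

```python
import math
import random
from typing import Callable, List, Tuple


def batch_sample_size(num_hyps: int, eps: float, delta: float) -> int:
    """Batch baseline (illustrative assumption): Hoeffding + union bound so that,
    w.p. >= 1 - delta, every accuracy estimate is within eps/2 of the truth."""
    return math.ceil((2.0 / eps ** 2) * math.log(2 * num_hyps / delta))


def online_select(draw: Callable[[], List[bool]], num_hyps: int,
                  eps: float, delta: float) -> Tuple[int, int]:
    """Generic successive-elimination sampler (a sketch, not the paper's algorithm).

    draw() returns, for one fresh example, whether each hypothesis is correct on it.
    Returns (index of the selected hypothesis, number of examples used).
    """
    alive = list(range(num_hyps))
    correct = [0] * num_hyps
    t = 0
    while True:
        t += 1
        outcome = draw()
        for h in alive:
            correct[h] += int(outcome[h])
        # Anytime Hoeffding radius: the union over hypotheses and rounds t
        # (sum of 1/t^2 < 2) keeps the total failure probability below delta.
        r = math.sqrt(math.log(4 * num_hyps * t * t / delta) / (2 * t))
        est = {h: correct[h] / t for h in alive}
        best = max(alive, key=lambda h: est[h])
        if r <= eps / 2:
            # All survivors are estimated to within eps/2, so the leader is eps-close.
            return best, t
        # Drop hypotheses whose upper bound falls below the leader's lower bound.
        alive = [h for h in alive if est[h] + r >= est[best] - r]
        if len(alive) == 1:
            return alive[0], t


if __name__ == "__main__":
    rng = random.Random(0)
    accs = [0.9, 0.6, 0.55, 0.5, 0.5]  # hypothetical true accuracies
    print("batch examples:", batch_sample_size(len(accs), 0.05, 0.05))
    chosen, used = online_select(lambda: [rng.random() < a for a in accs],
                                 len(accs), 0.05, 0.05)
    print("on-line choice:", chosen, "examples used:", used)
```

The design choice illustrated here is that the batch size depends only on ε, δ, and the class size, whereas the adaptive sampler can stop early when the best hypothesis is clearly separated from the rest. When the gap between the top hypotheses is tiny, it may instead need as many examples as the batch approach or more.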