An effective method for correlated selection problems

  • Authors:
  • Jonathan Gratch

  • Affiliations:
  • -

  • Venue:
  • -
  • Year:
  • 1994

Abstract

Many problems in machine learning are actually statistical problems of a class called correlated selection problems. Selection problems arise when one must select the best hypothesis (such as a hypothesized concept description or a hypothesized problem-solving heuristic) from a set, given its performance over some training data. In a correlated selection problem, the hypotheses are positively correlated, meaning they perform similarly on the same data. Correlated selection problems generally arise in machine learning whenever elements of the hypothesis space share some common structure. For example, in speed-up learning a learning algorithm must repeatedly select one of a set of small variations to an existing search control strategy [Gratch92, Greiner92]. In inductive learning, two issues are naturally cast as correlated selection problems: the attribute selection problem consists of selecting one of a set of attributes to add to an existing concept description [Fayyad91, Musick93], and the feature selection problem consists of selecting one of a set of feature vectors to learn from, where there is considerable overlap between the vectors [Moore94]. In all of these problems the hypotheses have considerable common structure, and therefore their performance on data will tend to be highly positively correlated. This article discusses the standard statistical approaches to this problem and their limitations. It then introduces a new, efficient method for such problems and discusses its average- and worst-case complexity. Finally, an even more efficient, but heuristic, approach is discussed that incorporates decision-theoretic analysis to minimize the cost of selecting a good hypothesis.
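The statistical intuition behind exploiting correlation can be sketched with a small simulation. This is not the paper's algorithm; it is an illustrative example (hypothetical score model, assumed Gaussian noise) showing why a paired (blocked) comparison of two positively correlated hypotheses on the same data has far lower variance than comparing their scores as if they were independent:

```python
import random
import statistics

random.seed(0)

# Hypothetical model: two hypotheses whose per-example scores share a
# common component (the difficulty of the example), so the scores are
# positively correlated -- as when candidate heuristics differ only
# slightly in structure.
def sample_scores(n=2000):
    a, b = [], []
    for _ in range(n):
        common = random.gauss(0.0, 1.0)            # shared example difficulty
        a.append(common + random.gauss(0.10, 0.3)) # hypothesis A's score
        b.append(common + random.gauss(0.00, 0.3)) # hypothesis B's score
    return a, b

a, b = sample_scores()

# Paired analysis: evaluate both hypotheses on the SAME examples and
# look at the per-example differences. The shared component cancels,
# so the difference estimator is much less noisy than either score.
diffs = [x - y for x, y in zip(a, b)]

print("stdev of A's scores:      ", round(statistics.stdev(a), 3))
print("stdev of paired diffs:    ", round(statistics.stdev(diffs), 3))
print("estimated advantage of A: ", round(statistics.mean(diffs), 3))
```

With fewer training examples, the paired estimator can therefore distinguish the hypotheses with high confidence, which is the leverage that correlated selection methods aim to exploit.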