Adaptive value function approximations in classifier systems

  • Authors:
  • Lashon B. Booker

  • Affiliations:
  • The MITRE Corporation, McLean, VA

  • Venue:
  • GECCO '05: Proceedings of the 7th Annual Workshop on Genetic and Evolutionary Computation
  • Year:
  • 2005

Abstract

Considerable attention has been paid to the issue of value function approximation in the reinforcement learning literature [3]. One of the fundamental assumptions underlying algorithms for solving reinforcement learning problems is that states and state-action pairs have well-defined values that can be approximated and used to help determine an optimal policy. The quality of those approximations is a critical factor in determining the success of many algorithms in solving reinforcement learning problems.

In most classifier systems, the information about the value function is stored and computed by individual rules. Each rule maintains an independent estimate of the value of taking its designated action in the states that match its condition. From this standpoint, each rule is treated as a separate function approximator. The quality of the approximations that can be achieved by simple estimates like this is not very good. Even when those estimates are pooled together to compute a more reliable collective estimate, it is still questionable how good the overall approximation will be. It is also not clear what the best way is to improve the quality of those approximations.

One approach to improving approximation quality is to increase the computational abilities of individual rules so that they become more capable function approximators [4]. Another idea is to look back to the original concepts underlying the classifier system framework and seek to take advantage of the properties of distributed representations in classifier systems [2]. This paper follows in the spirit of the latter approach, looking for ways to tap the distributed representational power present in a collection of rules to improve the quality of value function approximations.

Previous work [1] introduced a new approach to value function approximation in classifier systems called hyperplane coding. Hyperplane coding is a closely related variation of tile coding [3] in which classifier rule conditions fill the role of tiles, and there are few restrictions on the way those "tiles" are organised. The basic idea is to treat rules as features that collectively specify a linear gradient-descent function approximator. The hypothesis behind this idea is that classifier rules can be more effective as function approximators if they work together to implement a distributed, coarse-coded representation of the value function.

Experiments with hyperplane coding have shown that by carefully using the resources available in a random population of classifiers, continuous value functions can be approximated with a high degree of accuracy. This approach computes much better approximations than more conventional classifier system methods in which individual rules compute approximations independently. The results to date also demonstrate that hyperplane coding can achieve levels of performance comparable to those achieved by better-known approaches to function approximation such as tile coding. High-quality value function approximations that provide both data recovery and generalisation are a critically important component of most approaches to solving reinforcement learning problems. Because hyperplane coding substantially improves the quality of the approximations that can be computed by a classifier system using relatively small populations of classifiers, it may provide the foundation for significant improvements in classifier system performance. A minimal sketch of this coarse-coded scheme appears below.
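The following sketch illustrates the general idea of hyperplane coding under simple assumptions: binary input states, ternary rule conditions over {0, 1, #} (with # matching either bit), and each matching condition acting as one binary feature of a linear gradient-descent approximator. The class and variable names are illustrative, not taken from the paper.

```python
import random

def matches(condition, state):
    """A condition matches a state if every non-'#' position agrees."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

class HyperplaneApproximator:
    """Rule conditions act as coarse-coded 'tiles'; the prediction is a
    linear combination of the weights of all matching rules."""

    def __init__(self, conditions, alpha=0.1):
        self.conditions = conditions           # the rule population / tiles
        self.weights = [0.0] * len(conditions)
        self.alpha = alpha                     # gradient-descent step size

    def active(self, state):
        return [i for i, c in enumerate(self.conditions) if matches(c, state)]

    def predict(self, state):
        return sum(self.weights[i] for i in self.active(state))

    def update(self, state, target):
        # Standard linear gradient-descent (LMS) update; the step is
        # normalised by the number of active features, as in tile coding.
        idx = self.active(state)
        if not idx:
            return
        error = target - self.predict(state)
        step = self.alpha * error / len(idx)
        for i in idx:
            self.weights[i] += step

# Usage: approximate a toy function over 6-bit states with a random
# population of 200 ternary conditions.
random.seed(0)
conds = [''.join(random.choice('01#') for _ in range(6)) for _ in range(200)]
approx = HyperplaneApproximator(conds)
f = lambda s: int(s, 2) / 63.0                 # toy target function
for _ in range(5000):
    s = format(random.randrange(64), '06b')
    approx.update(s, f(s))
print(abs(approx.predict('101010') - f('101010')))  # small after training
```

Because many conditions overlap each state, the value function is represented collectively across the population rather than by any single rule, which is the source of the accuracy gains over independent per-rule estimates.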
One open question about hyperplane coding is how the quality of the approximation is affected by the set of classifiers in the population. A random population of classifiers is sufficient to obtain good results; would a more carefully chosen population do even better? The obvious next step in this research is to use the approximation resources available in a random population as a starting point for a more refined approach that reallocates resources adaptively to gain greater precision in those regions of the input space where it is needed. This paper shows how to compute such an adaptive function approximation. The goal is to learn a population of classifiers that reflects the structure of the input space (Dean & Wellman, 1991). This means more rules (i.e., more tiles) should be used to approximate regions that are sampled often and in which the function values vary a great deal; fewer rules should be used in regions that are rarely sampled and in which the function is nearly constant. We discuss how to adaptively manage the space in the population, as well as how to structure the search for tiles that reduce the approximation error. One plausible reallocation step is sketched below.
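As a rough illustration of this adaptive idea, the sketch below builds on the HyperplaneApproximator above. It assumes we track a running absolute error and a visit count per rule, and it retires rarely useful rules in favour of specialisations of the conditions carrying the most visit-weighted error. Both the tracking scheme and the specialisation move are illustrative assumptions, not the paper's exact algorithm.

```python
import random

def reallocate(approx, errors, visits, n_replace=5):
    """Retire the least useful rules and respawn them as specialisations
    of the conditions that carry the most visit-weighted error."""
    scores = [e * v for e, v in zip(errors, visits)]
    order = sorted(range(len(scores)), key=scores.__getitem__)
    losers, winners = order[:n_replace], order[-n_replace:]
    for lo, hi in zip(losers, winners):
        cond = list(approx.conditions[hi])
        sharp = [j for j, c in enumerate(cond) if c == '#']
        if sharp:
            # Fixing one '#' halves the region the new tile covers,
            # concentrating resolution where the error is largest.
            cond[random.choice(sharp)] = random.choice('01')
        approx.conditions[lo] = ''.join(cond)
        approx.weights[lo] = 0.0   # the new tile starts untrained
```

Interleaving such reallocation steps with gradient-descent updates shifts tiles toward frequently sampled, high-variation regions while freeing capacity in regions where the function is nearly constant.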