Learning While Optimizing an Unknown Fitness Surface

  • Authors:
  • Roberto Battiti; Mauro Brunato; Paolo Campigotto

  • Affiliations:
  • DISI - Dipartimento di Ingegneria e Scienza dell'Informazione, Università di Trento, Italy (all three authors)

  • Venue:
  • Learning and Intelligent Optimization
  • Year:
  • 2008

Abstract

This paper is about Reinforcement Learning (RL) applied to online parameter tuning in Stochastic Local Search (SLS) methods. In particular, a novel application of RL is considered in the Reactive Tabu Search (RTS) method, where the appropriate amount of diversification in prohibition-based (Tabu) local search is adapted quickly and online to the characteristics of a task and of the local configuration. We model the parameter-tuning policy as a Markov Decision Process (MDP) whose states summarize relevant information about the recent history of the search, and we determine a near-optimal policy by using the Least Squares Policy Iteration (LSPI) method. Preliminary experiments on Maximum Satisfiability (MAX-SAT) instances show very promising results, indicating that the learnt policy is competitive with previously proposed reactive strategies.