Temporal difference learning and TD-Gammon
Communications of the ACM
Embedding a Priori Knowledge in Reinforcement Learning
Journal of Intelligent and Robotic Systems
Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms
Fast concurrent reinforcement learners
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Heuristic selection of actions in multiagent reinforcement learning
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
This paper investigates the use of experience generalization in concurrent, on-line policy learning in multi-agent scenarios using reinforcement learning (RL) algorithms. When agents learn concurrently, the scenario becomes non-stationary, since the reward one agent receives for applying an action in a state depends on the behavior of the other agents. A non-stationary scenario can be viewed as a two-player game in which an agent and the other player (representing the other agents and the environment) select actions from those available in the current state; these actions determine the possible next state. An RL algorithm that can be applied to such a scenario is Minimax-Q, which is known to converge to equilibrium in the limit. However, finding optimal control policies with any RL algorithm (Minimax-Q included) can be very time consuming. We investigate the use of experience generalization to increase the rate of convergence of RL algorithms, and contribute a new learning algorithm, Minimax-QS, which incorporates experience generalization into the Minimax-Q algorithm. We also prove its convergence to Minimax-Q values under suitable conditions.
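The update described above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the similarity function `sigma`, the toy state/action names, and the pure-strategy approximation of the minimax value are all assumptions. In particular, Littman's Minimax-Q computes the next-state value by solving a linear program over mixed strategies; here that step is replaced by a simple max-min over pure actions to keep the sketch self-contained. With `sigma=None` the update touches only the experienced pair (plain Minimax-Q style); with a similarity function it spreads the temporal-difference target to similar state-action pairs, which is the experience-generalization idea behind Minimax-QS.

```python
from collections import defaultdict

def minimax_qs_update(Q, states, actions, opp_actions,
                      s, a, o, r, s_next,
                      alpha=0.2, gamma=0.9, sigma=None):
    """One Minimax-QS-style update (sketch).

    Q maps (state, action, opponent_action) -> value.
    sigma((s, a), (s2, a2)) -> [0, 1] is a hypothetical similarity
    function; sigma=None recovers a plain Minimax-Q-style update
    that touches only the experienced pair.
    """
    # Pure-strategy approximation of V(s') = max_pi min_o E_pi[Q].
    # The full Minimax-Q algorithm solves a linear program over
    # mixed strategies at this step instead.
    v_next = max(min(Q[(s_next, a2, o2)] for o2 in opp_actions)
                 for a2 in actions)
    target = r + gamma * v_next

    if sigma is None:
        # Plain update: only the visited (s, a, o) triple changes.
        Q[(s, a, o)] += alpha * (target - Q[(s, a, o)])
        return Q

    # Experience generalization: spread the same target to every
    # state-action pair, weighted by its similarity to (s, a).
    for s2 in states:
        for a2 in actions:
            w = sigma((s, a), (s2, a2))
            Q[(s2, a2, o)] += alpha * w * (target - Q[(s2, a2, o)])
    return Q

# Example: one update on a toy two-state game (all names hypothetical).
Q = defaultdict(float)
minimax_qs_update(Q, states=["s0", "s1"], actions=["a0", "a1"],
                  opp_actions=["o0"],
                  s="s0", a="a0", o="o0", r=1.0, s_next="s1")
```

After this single update with zero-initialized values, only `Q[("s0", "a0", "o0")]` has moved toward the reward; passing a `sigma` that assigns nonzero similarity to other pairs would move them as well, which is what accelerates convergence in the spreading variant.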