Reinforcement Learning in Complex Environments Through Multiple Adaptive Partitions

  • Authors:
  • Andrea Bonarini; Alessandro Lazaric; Marcello Restelli

  • Affiliations:
  • Artificial Intelligence and Robotics Laboratory, Department of Electronics and Information, Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy (all authors)

  • Venue:
  • AI*IA '07: Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence (AI*IA 2007: Artificial Intelligence and Human-Oriented Computing)
  • Year:
  • 2007

Abstract

The application of Reinforcement Learning (RL) algorithms to robot learning tasks is often limited by the large dimension of the state space, which may make a tabular representation prohibitive. In this paper, we describe LEAP (Learning Entities Adaptive Partitioning), a model-free learning algorithm that uses overlapping partitions, dynamically modified, to learn near-optimal policies with a small number of parameters. Starting from a coarse aggregation of the state space, LEAP generates refined partitions whenever it detects an incoherence between the current action values and the actual rewards from the environment. Since in highly stochastic problems the adaptive process can lead to over-refinement, we introduce a mechanism that prunes the macrostates without affecting the learned policy. Through refinement and pruning, LEAP builds a multi-resolution state representation specialized only where it is actually needed. In the last section, we present an experimental evaluation on a grid world and on a complex simulated robotic soccer task.
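
To make the refinement idea concrete, here is a minimal Python sketch, assuming a toy 1-D state space and Q-learning over macrostates. All names (AdaptivePartition, refine, the incoherence threshold) are illustrative assumptions, not LEAP's actual interface, and the sketch uses a single partition, omitting the paper's overlapping partitions and pruning mechanism.

```python
import random
from collections import defaultdict

# Hedged sketch: refine a coarse state aggregation where TD errors stay large.

class AdaptivePartition:
    """A 1-D state space split into contiguous macrostates of varying width."""

    def __init__(self, low, high, n_cells):
        step = (high - low) / n_cells
        self.bounds = [low + i * step for i in range(n_cells + 1)]

    def macrostate(self, s):
        """Return the lower bound of the cell containing s (a stable key)."""
        for i in range(len(self.bounds) - 1):
            if self.bounds[i] <= s < self.bounds[i + 1]:
                return self.bounds[i]
        return self.bounds[-2]  # s == high falls into the last cell

    def refine(self, key):
        """Split the cell whose lower bound is `key`, doubling local resolution."""
        i = self.bounds.index(key)
        mid = 0.5 * (self.bounds[i] + self.bounds[i + 1])
        self.bounds.insert(i + 1, mid)


def run(episodes=500, alpha=0.1, gamma=0.95, threshold=0.4):
    part = AdaptivePartition(0.0, 1.0, n_cells=2)  # start from a coarse aggregation
    q = defaultdict(float)       # (macrostate key, action) -> action value
    td_avg = defaultdict(float)  # running |TD error| per macrostate

    for _ in range(episodes):
        s = random.random()
        for _ in range(50):
            m = part.macrostate(s)
            a = random.choice((-1, +1))            # pure exploration, for brevity
            s2 = min(max(s + 0.05 * a, 0.0), 1.0)  # simple 1-D walk
            r = 1.0 if s2 > 0.9 else 0.0           # reward near the right end
            m2 = part.macrostate(s2)
            td = r + gamma * max(q[(m2, b)] for b in (-1, +1)) - q[(m, a)]
            q[(m, a)] += alpha * td
            # Incoherence proxy: persistently large TD errors in a macrostate
            # suggest its value estimates cannot match the observed rewards.
            td_avg[m] = 0.9 * td_avg[m] + 0.1 * abs(td)
            if td_avg[m] > threshold:
                part.refine(m)   # specialize the representation where needed
                td_avg[m] = 0.0
            s = s2
    return part, q


if __name__ == "__main__":
    part, _ = run()
    print("macrostate boundaries:", [round(b, 3) for b in part.bounds])
```

Keying action values by each cell's lower bound lets the lower half of a split inherit its estimates while the upper half starts fresh; how LEAP actually detects incoherences, combines overlapping partitions, and prunes over-refined macrostates is detailed in the paper itself.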