Automatic shaping and decomposition of reward functions

  • Authors: Bhaskara Marthi
  • Affiliation: Massachusetts Institute of Technology, Cambridge, MA
  • Venue: Proceedings of the 24th International Conference on Machine Learning (ICML)
  • Year: 2007

Abstract

This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case, significantly faster learning results.
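The shaping described above builds on the standard potential-based formulation, in which a shaping term derived from a potential function is added to the environment reward without changing the optimal policy. The sketch below illustrates that mechanism only; the potential function `phi` is a hypothetical example, and the paper's contribution is learning such shaped rewards automatically rather than hand-specifying them.

```python
# Illustrative sketch of potential-based reward shaping.
# The potential phi is a hypothetical example (negative Manhattan
# distance to a goal at the origin), not the paper's learned potential.

GAMMA = 0.9  # discount factor of the underlying MDP

def phi(state):
    """Hypothetical potential: higher (less negative) nearer the goal."""
    x, y = state
    return -(abs(x) + abs(y))

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Add the shaping term F(s, s') = gamma * phi(s') - phi(s).

    Adding F to the per-timestep reward preserves the optimal policy
    while giving the learner denser feedback.
    """
    return reward + gamma * phi(next_state) - phi(state)

# A step toward the goal yields a higher shaped reward than a step away,
# even when the raw environment reward is zero for both transitions.
r_closer = shaped_reward(0.0, (2, 0), (1, 0))
r_farther = shaped_reward(0.0, (1, 0), (2, 0))
```

Here the shaping term rewards progress measured by the potential; a standard reinforcement learner can consume `shaped_reward` in place of the raw reward with no other changes.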