Impediments to universal preference-based default theories
Artificial Intelligence - Special issue on knowledge representation
Learning to solve multiple goals
A Behavior Language for Story-Based Believable Agents
IEEE Intelligent Systems
The MAXQ Method for Hierarchical Reinforcement Learning
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Multiple-goal reinforcement learning with modular Sarsa(0)
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
An interactive game-design assistant
Proceedings of the 13th international conference on Intelligent user interfaces
Towards adaptive programming: integrating reinforcement learning into a programming language
Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications
Faster program adaptation through reward attribution inference
Proceedings of the 11th International Conference on Generative Programming and Component Engineering
A distributed Q-learning approach for variable attention to multiple critics
ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part III
Engineering Applications of Artificial Intelligence
In recent years there has been a great deal of interest in "modular reinforcement learning" (MRL). Typically, problems are decomposed into concurrent subgoals, allowing increased scalability and state abstraction. An arbitrator combines the subagents' preferences to select an action. In this work, we contrast treating an MRL agent as a set of subagents sharing a single goal with treating it as a set of subagents that may have different, possibly conflicting goals. We argue that the latter is a more realistic description of real-world problems, especially when building partial programs. We survey a range of algorithms for single-goal MRL and, leveraging social choice theory, present an impossibility result for applications of such algorithms to multigoal MRL. We suggest an alternative formulation of arbitration as scheduling that avoids the assumptions of preference comparability implicit in single-goal MRL. A notable feature of this formulation is the explicit codification of the tradeoffs between the subproblems. Finally, we introduce A2BL, a language that encapsulates many of these ideas.
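To make the arbitration step concrete, below is a minimal sketch of the kind of single-goal arbitrator the abstract refers to: "greatest-mass" arbitration, where each subagent reports Q-values and the arbitrator picks the action with the largest sum. All names (`arbitrate`, `q_eat`, `q_flee`) are hypothetical; the point is that summing Q-values silently assumes the subagents' utilities are comparable on a common scale, which is exactly the assumption the impossibility argument targets.

```python
def arbitrate(modules, state, actions):
    """Greatest-mass arbitration: choose the action maximizing the
    summed Q-values reported by all subagent modules.
    `modules` maps a module name to a Q-table: dict[(state, action)] -> float."""
    def total_q(action):
        # Summing across modules assumes their Q-values share a common scale.
        return sum(q.get((state, action), 0.0) for q in modules.values())
    return max(actions, key=total_q)

# Toy example: two subagents with conflicting preferences over two actions.
q_eat = {("s", "left"): 1.0, ("s", "right"): 0.2}
q_flee = {("s", "left"): 0.1, ("s", "right"): 0.5}
modules = {"eat": q_eat, "flee": q_flee}

print(arbitrate(modules, "s", ["left", "right"]))  # "left" (1.1 > 0.7)
```

Note that rescaling one module's rewards (e.g. multiplying `q_flee` by 10) flips the arbitrator's choice even though each subagent's preference ordering is unchanged, illustrating why comparability of preferences matters.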