Practical reinforcement learning in continuous domains

  • Authors: Jeffrey Forbes; David Andre

  • Venue: Practical reinforcement learning in continuous domains
  • Year: 2000

Abstract

Many real-world domains have continuous features and actions, whereas the majority of results in the reinforcement learning community are for finite Markov decision processes. Much of the work that addresses continuous domains uses either discretization or simple parametric function approximators. A drawback of commonly used parametric techniques such as neural networks is that they can "forget": representational power is shifted toward recent examples at the expense of earlier ones. In this paper, we propose a practical architecture for model-based reinforcement learning in continuous state and action spaces that avoids these difficulties by using an instance-based modeling technique. We present a method for learning and maintaining a value function estimate with instance-based learners, and show that it compares favorably to other function approximation methods such as neural networks. Furthermore, our reinforcement learning algorithm learns an explicit model of the environment simultaneously with the value function and policy. The model is beneficial for two reasons: it allows the agent to make better use of its experience through simulated planning steps, and it makes it straightforward to give the system prior information in the form of the structure of the environmental model. We extend the technique of generalized prioritized sweeping to the continuous case in order to focus the agent's planning steps on the states where the current value estimate is most likely to be incorrect. We illustrate the algorithm's effectiveness with results on several control domains.
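The sketch below is not the authors' implementation; it is a minimal illustration of the two ideas the abstract combines, under assumed simplifications: an instance-based (k-nearest-neighbour, distance-weighted) value-function approximator that never overwrites earlier experience, and a prioritized-sweeping-style planning loop over a learned model that backs up the states with the largest Bellman error first. The class and function names and the toy 1-D deterministic domain are illustrative assumptions, not part of the paper.

```python
# Hedged sketch: instance-based value estimation + prioritized-sweeping-style
# planning over a learned (here: toy, deterministic) model. Names and the
# toy domain are assumptions made for illustration only.

import heapq
import random


class InstanceValueFunction:
    """Stores (state, value) instances; estimates V(s) by distance-weighted
    averaging over the k nearest stored states."""

    def __init__(self, k=5):
        self.k = k
        self.instances = []  # list of [state, value] pairs

    def estimate(self, state):
        if not self.instances:
            return 0.0
        nearest = sorted(self.instances, key=lambda sv: abs(sv[0] - state))[: self.k]
        weights = [1.0 / (1e-6 + abs(s - state)) for s, _ in nearest]
        return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

    def update(self, state, value):
        # Unlike a parametric approximator, adding an instance never
        # overwrites what was learned elsewhere in the state space.
        self.instances.append([state, value])


def backup(model, value_fn, state, gamma):
    """One-step Bellman backup using the learned transition/reward model."""
    reward, next_state = model[state]
    return reward + gamma * value_fn.estimate(next_state)


def prioritized_sweeping(model, value_fn, gamma=0.95, n_sweeps=50, theta=1e-3):
    """Repeatedly back up the sampled states whose current value estimates
    look most wrong (largest Bellman error), largest error first."""
    queue = []  # max-priority queue via negated error
    for s in model:
        error = abs(backup(model, value_fn, s, gamma) - value_fn.estimate(s))
        if error > theta:
            heapq.heappush(queue, (-error, s))
    for _ in range(n_sweeps):
        if not queue:
            break
        _, s = heapq.heappop(queue)
        value_fn.update(s, backup(model, value_fn, s, gamma))


if __name__ == "__main__":
    # Toy deterministic model over sampled continuous states: states decay
    # toward 0, and reward is higher the closer the state is to 0.
    model = {}
    for _ in range(200):
        s = random.uniform(-1.0, 1.0)
        model[s] = (-abs(s), 0.9 * s)

    vf = InstanceValueFunction(k=3)
    prioritized_sweeping(model, vf)
    print("V(0.1) ~", round(vf.estimate(0.1), 3))
    print("V(0.9) ~", round(vf.estimate(0.9), 3))
```

In this toy run the planner concentrates its early backups on the states farthest from the goal, where the initial value estimate of zero is most in error; the paper's contribution is doing this kind of focused, model-based planning with instance-based approximators in genuinely continuous state and action spaces.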