Capturing a Qualitative Model of Network Performance and Predicting Behavior
Journal of Network and Systems Management
Practical solution techniques for first-order MDPs
Artificial Intelligence
Learning to coordinate controllers - reinforcement learning on a control basis
IJCAI'97 Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence - Volume 2
Bayesian real-time dynamic programming
IJCAI'09 Proceedings of the 21st International Joint Conference on Artificial Intelligence
Q-learning with linear function approximation
COLT'07 Proceedings of the 20th Annual Conference on Learning Theory
Symbolic bounded real-time dynamic programming
SBIA'10 Proceedings of the 20th Brazilian Conference on Advances in Artificial Intelligence
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning results into reactive strategies for real-time control, as well as for learning such strategies when the system being controlled is incompletely known. We introduce an algorithm based on DP, which we call Real-Time DP (RTDP), by which an embedded system can improve its performance with experience. RTDP generalizes Korf's Learning-Real-Time-A* algorithm to problems involving uncertainty. We invoke results from the theory of asynchronous DP to prove that RTDP achieves optimal behavior in several different classes of problems. We also use the theory of asynchronous DP to illuminate aspects of other DP-based reinforcement learning methods such as Watkins' Q-Learning algorithm. A secondary aim of this article is to provide a bridge between AI research on real-time planning and learning and relevant concepts and algorithms from control theory.

This research was supported by grants to A. G. Barto from the National Science Foundation (ECS-8912623 and ECS-9214866) and the Air Force Office of Scientific Research, Bolling AFB (AFOSR-89-0526).
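To make the abstract's description concrete, the following is a minimal Python sketch of an RTDP trial on a toy stochastic shortest-path problem. The domain (a chain of states with noisy "left"/"right" moves and unit step cost) is an illustrative assumption, not a setup from the paper; only the control loop (act greedily with respect to the current value function and perform a Bellman backup on each state as it is visited, an instance of asynchronous DP interleaved with control) reflects RTDP itself.

import random

# Toy stochastic shortest-path MDP (illustrative only, not from the paper):
# states 0..N on a chain, goal at N; a move succeeds with probability 0.9
# and slips the other way with probability 0.1; every step costs 1.
N = 10
GOAL = N

def actions(s):
    return ["right", "left"]

def transitions(s, a):
    # Returns a list of (next_state, probability) pairs.
    if a == "right":
        return [(min(s + 1, GOAL), 0.9), (max(s - 1, 0), 0.1)]
    return [(max(s - 1, 0), 0.9), (min(s + 1, GOAL), 0.1)]

def rtdp_trial(V, start, max_steps=200):
    # One RTDP trial: follow the greedy policy from the start state,
    # performing a Bellman backup only on the states actually visited
    # (asynchronous DP interleaved with real-time control).
    s = start
    for _ in range(max_steps):
        if s == GOAL:
            break
        q = {a: 1.0 + sum(p * V[s2] for s2, p in transitions(s, a))
             for a in actions(s)}
        a_greedy = min(q, key=q.get)   # minimize expected cost-to-goal
        V[s] = q[a_greedy]             # backup of the current state only
        succs, probs = zip(*transitions(s, a_greedy))
        s = random.choices(succs, weights=probs)[0]  # simulate the outcome

V = [0.0] * (N + 1)   # optimistic (admissible) initial value function
for _ in range(500):
    rtdp_trial(V, start=0)
print([round(v, 1) for v in V])   # approximate min expected steps to goal

By contrast, Watkins' Q-Learning, mentioned in the abstract, replaces the model-based backup above with a sample-based update driven only by observed transitions, so no explicit transition probabilities are required; the paper uses the same asynchronous-DP theory to analyze both styles of method.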