Decision theoretic generalizations of the PAC model for neural net and other learning applications
Information and Computation
Bounding the Vapnik-Chervonenkis Dimension of Concept Classes Parameterized by Real Numbers
Machine Learning - Special issue on COLT '93
Gradient descent for general reinforcement learning
Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems 11
Dynamic Programming and Optimal Control
Introduction to Reinforcement Learning
Learning to Drive a Bicycle Using Reinforcement Learning and Shaping
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Learning and value function approximation in complex decision processes
Estimation of Dependences Based on Empirical Data (Springer Series in Statistics)
Learning finite-state controllers for partially observable environments
UAI'99 Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence
Exploiting probabilistic knowledge under uncertain sensing for efficient robot behaviour
IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three
Lagrangian relaxation for large-scale multi-agent planning
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Why long words take longer to read: the role of uncertainty about word length
CMCL '12 Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics
Adaptive reservoir computing through evolution and learning
Neurocomputing
Lagrangian Relaxation for Large-Scale Multi-agent Planning
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 02
A partially observable hybrid system model for bipedal locomotion for adapting to terrain variations
Proceedings of the 16th International Conference on Hybrid Systems: Computation and Control
Efficient sample reuse in policy gradients with parameter-based exploration
Neural Computation
Adaptive collective routing using Gaussian process dynamic congestion models
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Probabilistic model-based imitation learning
Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems
Scheduling sensors for monitoring sentient spaces using an approximate POMDP policy
Pervasive and Mobile Computing
Automatica (Journal of IFAC)
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a model. Our approach is based on the following observation: Any (PO)MDP can be transformed into an "equivalent" POMDP in which all state transitions (given the current state and action) are deterministic. This reduces the general problem of policy search to one in which we need only consider POMDPs with deterministic transitions. We give a natural way of estimating the value of all policies in these transformed POMDPs. Policy search is then simply performed by searching for a policy with high estimated value. We also establish conditions under which our value estimates will be good, recovering theoretical results similar to those of Kearns, Mansour and Ng [7], but with "sample complexity" bounds that have only a polynomial rather than exponential dependence on the horizon time. Our method applies to arbitrary POMDPs, including ones with infinite state and action spaces. We also present empirical results for our approach on a small discrete problem, and on a complex continuous state/continuous action problem involving learning to ride a bicycle.
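The core idea of the abstract — fixing all transition randomness in advance so that every rollout becomes a deterministic function of the policy — can be illustrated with a short sketch. The following is a minimal, hypothetical Python example on a toy chain problem (the problem, names, and constants are illustrative, not from the paper): m fixed "scenarios" of pre-sampled uniform random numbers play the role of the deterministic transformed POMDP, and policy search reduces to deterministic optimization of the estimated value.

```python
import random

# Minimal sketch of scenario-based value estimation (illustrative toy
# problem; all names and constants below are assumptions, not the
# paper's). The uniform draw u supplies ALL transition randomness, so
# with the scenarios fixed, each rollout is deterministic in the policy.

GAMMA = 0.9      # discount factor
HORIZON = 20     # truncated horizon
M = 100          # number of fixed scenarios (pre-sampled randomness)

def step(state, action, u):
    """Toy stochastic transition driven entirely by the uniform draw u."""
    # With probability 0.8 the action moves the state as intended,
    # otherwise it is reversed.
    state = state + action if u < 0.8 else state - action
    reward = 1.0 if state == 0 else 0.0   # reward for reaching the origin
    return state, reward

def estimate_value(policy, scenarios):
    """Average discounted return over the fixed scenarios (deterministic)."""
    total = 0.0
    for us in scenarios:
        state, ret, discount = 1, 0.0, 1.0
        for t in range(HORIZON):
            state, r = step(state, policy(state), us[t])
            ret += discount * r
            discount *= GAMMA
        total += ret
    return total / len(scenarios)

# Pre-sample the scenarios once; every later evaluation reuses them.
rng = random.Random(0)
scenarios = [[rng.random() for _ in range(HORIZON)] for _ in range(M)]

# Policy search is now ordinary deterministic optimization over policies:
toward_origin = lambda s: -1 if s > 0 else (1 if s < 0 else 0)
away_from_origin = lambda s: 1
print(estimate_value(toward_origin, scenarios)
      > estimate_value(away_from_origin, scenarios))
```

Because the same scenarios are reused for every policy, repeated evaluations of one policy return identical values, and comparisons between policies are not corrupted by fresh sampling noise — which is what makes searching for a policy with high estimated value well posed.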