Computationally feasible bounds for partially observed Markov decision processes
Operations Research
A survey of algorithmic methods for partially observed Markov decision processes
Annals of Operations Research
Inference of finite automata using homing sequences
Information and Computation
Near-Optimal Reinforcement Learning in Polynomial Time
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Learning to Cooperate via Policy Search
UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence
Exact and approximate algorithms for partially observable Markov decision processes
Nonapproximability results for partially observable Markov decision processes
Journal of Artificial Intelligence Research
Approximating optimal policies for partially observable stochastic domains
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
A heuristic variable grid solution method for POMDPs
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Approximate planning for factored POMDPs using belief state simplification
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Tractable inference for complex stochastic processes
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs
Proceedings of the 25th international conference on Machine learning
On the possibility of learning in reactive environments with arbitrary dependence
Theoretical Computer Science
Spoken language interaction with model uncertainty: an adaptive human-robot interaction system
Connection Science - Language and Robots
Representing systems with hidden state
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Learning partially observable action schemas
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Learning partially observable deterministic action models
Journal of Artificial Intelligence Research
Greedy algorithms for sequential sensing decisions
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Universal reinforcement learning
IEEE Transactions on Information Theory
Asymptotic learnability of reinforcement problems with arbitrary dependence
ALT'06 Proceedings of the 17th international conference on Algorithmic Learning Theory
Optimistic agents are asymptotically optimal
AI'12 Proceedings of the 25th Australasian joint conference on Advances in Artificial Intelligence
We consider the most realistic reinforcement learning setting, in which an agent starts in an unknown environment (the POMDP) and must follow one continuous, uninterrupted chain of experience with no access to "resets" or "offline" simulation. We provide algorithms for general connected POMDPs that obtain near-optimal average reward. One algorithm we present has a convergence rate that depends exponentially on a certain horizon time of an optimal policy, but has no dependence on the number of (unobservable) states. The main building block of our algorithms is an implementation of an approximate reset strategy, which we show always exists in every POMDP. An interesting aspect of our algorithms is how they use this strategy when balancing exploration and exploitation.
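The structure described in the abstract — interleaving an approximate reset (homing) strategy with exploration and exploitation phases — can be illustrated with a minimal sketch. This is not the paper's algorithm: the `ToyPOMDP` environment, the choice of action 0 as a known homing action, and the crude phase-based explore/exploit schedule are all invented for illustration.

```python
import random

class ToyPOMDP:
    """A tiny two-state environment, invented only to exercise the loop below.

    Action 0 acts as a homing action that (approximately) resets the hidden
    state; action 1 toggles the hidden state and pays off in state 1.
    """
    actions = (0, 1)

    def reset(self):
        self.state = 0
        return self.state  # observation (here the toy leaks the state)

    def step(self, action):
        reward = 1.0 if (self.state == 1 and action == 1) else 0.0
        if action == 0:
            self.state = 0          # homing: drive belief back to a reference
        else:
            self.state = 1 - self.state
        return self.state, reward

def run_agent(env, phases=50, horizon=4, seed=0):
    """Alternate approximate-reset phases with act phases.

    Each phase first runs the homing action for `horizon` steps, so the
    unknown belief state contracts toward a fixed reference distribution,
    then either explores (random actions, early phases) or exploits
    (a fixed policy standing in for the learned near-optimal one).
    """
    rng = random.Random(seed)
    env.reset()
    total, steps = 0.0, 0
    for phase in range(phases):
        for _ in range(horizon):    # approximate reset phase
            _, r = env.step(0)
            total += r
            steps += 1
        explore = phase < phases // 2
        for _ in range(horizon):    # act phase
            a = rng.choice(env.actions) if explore else 1
            _, r = env.step(a)
            total += r
            steps += 1
    return total / steps            # empirical average reward
```

The homing phases play the role of the paper's approximate reset strategy: they let the agent compare policies from a (near-)common belief state, which is what makes average-reward guarantees possible without true resets.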