We present new algorithms for reinforcement learning and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes. After observing that the number of actions required to approach the optimal return is lower bounded by the mixing time T of the optimal policy (in the undiscounted case) or by the horizon time T (in the discounted case), we then give algorithms requiring a number of actions and total computation time that are only polynomial in T and the number of states and actions, for both the undiscounted and discounted cases. An interesting aspect of our algorithms is their explicit handling of the Exploration-Exploitation trade-off.
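To make the discounted-case "horizon time T" concrete, here is a minimal sketch under stated assumptions. The abstract does not define T, so the code assumes the common reading: the smallest T such that rewards collected after step T can change the discounted return by at most epsilon. With per-step rewards bounded by r_max and discount factor gamma, that tail is at most gamma^T * r_max / (1 - gamma). The function name `horizon_time` and the parameters `gamma`, `epsilon`, and `r_max` are illustrative and do not come from the paper.

```python
import math


def horizon_time(gamma: float, epsilon: float, r_max: float = 1.0) -> int:
    """Smallest integer T with gamma**T * r_max / (1 - gamma) <= epsilon.

    Assumption (not stated in the abstract): rewards lie in [0, r_max],
    so the discounted reward accumulated after step T is bounded by
    gamma**T * r_max / (1 - gamma).
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if epsilon <= 0.0 or r_max <= 0.0:
        raise ValueError("epsilon and r_max must be positive")

    # Upper bound on the full discounted return.
    total_bound = r_max / (1.0 - gamma)
    if total_bound <= epsilon:
        return 0

    # Solve gamma**T * total_bound <= epsilon for T.
    t = math.ceil(math.log(total_bound / epsilon) / math.log(1.0 / gamma))

    # Correct for floating-point rounding in either direction.
    while gamma ** t * total_bound > epsilon:
        t += 1
    while t > 0 and gamma ** (t - 1) * total_bound <= epsilon:
        t -= 1
    return t


if __name__ == "__main__":
    for g in (0.5, 0.9, 0.99):
        print(f"gamma={g}: horizon time {horizon_time(g, epsilon=0.1)}")
```

Under these assumptions T grows roughly like (1 / (1 - gamma)) * log(r_max / (epsilon * (1 - gamma))), which is why a bound polynomial in T is the natural target in the discounted case. For example, gamma = 0.9 and epsilon = 0.1 with unit rewards give T = 44.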