Complexity Analysis of Real-Time Reinforcement Learning Applied to Finding Shortest Paths in Deterministic Domains
This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value iteration, applied to the problem of reaching a goal state in deterministic domains. Previous work had concluded that, in many cases, tabula rasa reinforcement learning was exponential for such problems, or was tractable only if the learning algorithm was augmented. We show that, to the contrary, the algorithms are tractable with only a simple change in the task representation or initialization. We provide tight bounds on the worst-case complexity and show that the complexity is even smaller if the reinforcement learning algorithms have initial knowledge of the topology of the state space or if the domain has certain special properties. We also present a novel bidirectional Q-learning algorithm that finds optimal paths from all states to a goal state and show that it is no more complex than the other algorithms.
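To make the setting concrete, below is a minimal sketch, assuming one change of the kind the abstract alludes to: an action-penalty representation in which every action yields reward -1 and the goal is absorbing with value 0, so that zero-initialized Q-values never overestimate the (non-positive) true values. The graph encoding (successors), the state names, and the greedy tie-breaking are illustrative assumptions, not the paper's exact formulation.

    from collections import defaultdict

    def realtime_q_learning(successors, start, goal, max_steps=10000):
        # Asynchronous real-time Q-learning for deterministic shortest paths.
        # Action-penalty representation: every action yields reward -1, the
        # goal is absorbing with value 0, and Q-values start at zero (an
        # admissible initialization, since all true values are non-positive).
        # successors maps each non-goal state to a dict {action: next_state}.
        q = defaultdict(float)          # Q(s, a), zero-initialized
        s, steps = start, 0
        while s != goal and steps < max_steps:
            # Act greedily on the current Q-values (ties broken arbitrarily).
            a = max(successors[s], key=lambda act: q[(s, act)])
            s_next = successors[s][a]
            # Deterministic one-step backup; the goal state has value 0.
            v_next = 0.0 if s_next == goal else max(
                q[(s_next, act)] for act in successors[s_next])
            q[(s, a)] = -1.0 + v_next
            s = s_next
            steps += 1
        return q, steps

    # Hypothetical four-state chain 0 - 1 - 2 - 3, where 3 is the goal.
    successors = {
        0: {"right": 1},
        1: {"left": 0, "right": 2},
        2: {"left": 1, "right": 3},
    }
    q_values, steps_taken = realtime_q_learning(successors, start=0, goal=3)

Because the domain is deterministic, a learning rate of 1 and no discounting suffice for the one-step backup; the zero initialization keeps the Q-values admissible throughout, which is the kind of property the worst-case bounds in the paper rest on.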