Utility-based on-line exploration for repeated navigation in an embedded graph
Artificial Intelligence
Sequential decision tasks with incomplete information are characterized by the exploration problem: the trade-off between exploring further to learn more about the environment and exploiting the accrued information for immediate decision-making. Within artificial intelligence there has been increasing interest in planning-while-learning algorithms for such tasks. In this paper we focus on the exploration problem in reinforcement learning, and in Q-learning in particular. Existing exploration strategies for Q-learning are heuristic in nature and scale poorly to tasks with large (or infinite) state and action spaces. Efficient experimentation is needed to resolve uncertainties when candidate plans are compared (i.e. exploration), and the experimentation should suffice for selecting a locally optimal plan with statistical significance (i.e. exploitation). To this end, we develop a probabilistic hill-climbing algorithm that uses a statistical selection procedure to decide how much exploration is needed to select a plan that is, with arbitrarily high probability, arbitrarily close to a locally optimal one. Owing to its generality, the algorithm can serve as the exploration strategy of robust Q-learning. An experiment on a relatively complex control task shows that the proposed exploration strategy outperforms a typical exploration strategy.
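The statistical selection step described above can be sketched as an indifference-zone style procedure: sample noisy evaluations of each candidate plan until one plan's estimated value exceeds every other's by the indifference margin with confidence 1 − δ. This is a minimal illustrative sketch, not the paper's exact procedure; the function names, the Hoeffding-style confidence bound, and the assumption of rewards bounded in [0, 1] are all assumptions made here.

```python
import math
import random

def select_best_plan(plans, evaluate, epsilon=0.1, delta=0.05,
                     batch=10, max_samples=2000):
    """Sample noisy evaluations of each plan until the empirically best
    plan is separated from the rest, within indifference margin epsilon,
    with confidence 1 - delta (or the sampling budget runs out).
    Assumes evaluate(plan) returns a noisy reward in [0, 1]."""
    stats = {p: [] for p in plans}
    while sum(len(v) for v in stats.values()) < max_samples:
        for p in plans:  # one batch of noisy evaluations per plan (exploration)
            stats[p].extend(evaluate(p) for _ in range(batch))
        means = {p: sum(v) / len(v) for p, v in stats.items()}
        best = max(means, key=means.get)
        n = len(stats[best])
        # Hoeffding-style half-width for [0, 1]-bounded rewards,
        # union bound over the candidate plans
        hw = math.sqrt(math.log(2 * len(plans) / delta) / (2 * n))
        # Stop once the best plan's advantage plus the indifference margin
        # covers the combined estimation error: exploit the selected plan.
        if all(means[best] - means[p] + epsilon >= 2 * hw
               for p in plans if p != best):
            return best
    # Budget exhausted: fall back to the empirically best plan.
    return max(stats, key=lambda p: sum(stats[p]) / len(stats[p]))
```

The key design point, matching the abstract, is that the amount of exploration is not fixed in advance: sampling continues only as long as the confidence half-width `hw` is too wide to certify an ε-close choice, so easy comparisons terminate quickly while close ones receive more samples.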