Communications of the ACM
The complexity of Markov decision processes
Mathematics of Operations Research
Quantifying inductive bias: AI learning algorithms and Valiant's learning framework
Artificial Intelligence
Machine Learning - Special issue on genetic algorithms
Learning to Perceive and Act by Trial and Error
Machine Learning
The Convergence of TD(λ) for General λ
Machine Learning
An approach to anytime learning
ML92 Proceedings of the ninth international workshop on Machine learning
Temporal difference learning of backgammon strategy
ML92 Proceedings of the ninth international workshop on Machine learning
Learning to Predict by the Methods of Temporal Differences
Machine Learning
Inductive Inference, DFAs, and Computational Complexity
AII '89 Proceedings of the International Workshop on Analogical and Inductive Inference
Markov decision processes in large state spaces
COLT '95 Proceedings of the eighth annual conference on Computational learning theory
Learning curve bounds for a Markov decision process with undiscounted rewards
COLT '96 Proceedings of the ninth annual conference on Computational learning theory
A competitive approach to game learning
COLT '96 Proceedings of the ninth annual conference on Computational learning theory
PAC adaptive control of linear systems
COLT '97 Proceedings of the tenth annual conference on Computational learning theory
Machine Learning
Near-Optimal Reinforcement Learning in Polynomial Time
Machine Learning
Polynomial-time reinforcement learning of near-optimal policies
Eighteenth national conference on Artificial intelligence
Efficient learning of multi-step best response
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
PAC model-free reinforcement learning
ICML '06 Proceedings of the 23rd international conference on Machine learning
Efficient PAC Learning for Episodic Tasks with Acyclic State Spaces
Discrete Event Dynamic Systems
Efficient reinforcement learning with relocatable action models
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Reinforcement learning: a survey
Journal of Artificial Intelligence Research
Customized learning algorithms for episodic tasks with acyclic state spaces
CASE'09 Proceedings of the fifth annual IEEE international conference on Automation science and engineering
Reinforcement Learning in Finite MDPs: PAC Analysis
The Journal of Machine Learning Research
Near-optimal Regret Bounds for Reinforcement Learning
The Journal of Machine Learning Research
Reducing reinforcement learning to KWIK online regression
Annals of Mathematics and Artificial Intelligence
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
Multiagent reinforcement learning algorithm using temporal difference error
ISNN'05 Proceedings of the Second international conference on Advances in Neural Networks - Volume Part I
On the efficient implementation of biologic reinforcement learning using eligibility traces
ISNN'06 Proceedings of the Third international conference on Advances in Neural Networks - Volume Part I
A reinforcement learning algorithm using temporal difference error in ant model
IWANN'05 Proceedings of the 8th international conference on Artificial Neural Networks: Computational Intelligence and Bioinspired Systems
A cooperation online reinforcement learning approach in Ant-Q
ICONIP'06 Proceedings of the 13th international conference on Neural Information Processing - Volume Part I
Efficient ant reinforcement learning using replacing eligibility traces
ICAISC'06 Proceedings of the 8th international conference on Artificial Intelligence and Soft Computing
Book reviews: Self-learning control of finite Markov chains
Automatica (Journal of IFAC)
In this paper we propose a new formal model for studying reinforcement learning, based on Valiant's PAC framework.

In our model the learner does not have direct access to every state of the environment. Instead, every sequence of experiments starts in a fixed initial state, and the learner is provided with a "reset" operation that interrupts the current sequence of experiments and starts a new one from the initial state.

We do not require the agent to learn the optimal policy, but only a good approximation of it with high probability. More precisely, we require the learner to produce a policy whose expected value from the initial state is ε-close to that of the optimal policy, with probability at least 1 − δ.

For this model, we describe an algorithm that produces such an (ε, δ)-optimal policy for any environment, in time polynomial in N, K, 1/ε, 1/δ, 1/(1 − β), and r_max, where N is the number of states of the environment, K is the maximum number of actions available in a state, β is the discount factor, and r_max is the maximum reward on any transition.
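To make the interaction model concrete, the following is a minimal Python sketch of the protocol the abstract describes. Everything here is an illustrative assumption rather than the paper's own code: the names (ResetMDP, reset, step), the tabular representation, and the truncation note at the end. Restated in symbols, a policy π̂ is (ε, δ)-optimal when Pr[ V(π̂, s0) ≥ V(π*, s0) − ε ] ≥ 1 − δ, where V(π, s0) is the expected discounted return of π from the initial state s0.

import random

class ResetMDP:
    # Hypothetical interface for the learning model in the abstract.
    # The learner cannot jump to arbitrary states: it can only act from
    # its current state (step) or restart from the fixed initial state
    # (reset). The tabular representation below is an assumption made
    # for illustration, not part of the paper.

    def __init__(self, transitions, rewards, s0):
        self.transitions = transitions  # transitions[s][a] = list of (prob, next_state)
        self.rewards = rewards          # rewards[s][a] = immediate reward in [0, r_max]
        self.s0 = s0                    # fixed initial state
        self.state = s0

    def reset(self):
        # The "reset" operation: interrupt the current sequence of
        # experiments and start a new one from the initial state.
        self.state = self.s0
        return self.state

    def step(self, action):
        # One experiment: execute `action` in the current state and
        # observe the reward and the stochastically drawn next state.
        reward = self.rewards[self.state][action]
        outcomes = self.transitions[self.state][action]
        probs = [p for p, _ in outcomes]
        succs = [s for _, s in outcomes]
        self.state = random.choices(succs, weights=probs)[0]
        return self.state, reward

A learner in this model repeatedly calls reset() and runs experiments for a finite horizon H. Because rewards are discounted by β < 1, truncating each run after H = O(log(r_max / (ε(1 − β))) / (1 − β)) steps perturbs value estimates from s0 by at most ε; this standard truncation argument is supplied here for context and is not a claim taken from the abstract itself.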