Efficient reinforcement learning. COLT '94: Proceedings of the Seventh Annual Conference on Computational Learning Theory.
An Introduction to Computational Learning Theory.
Learning to act using real-time dynamic programming. Artificial Intelligence, Special Volume on Computational Research on Interaction and Agency, Part 1.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Near-Optimal Reinforcement Learning in Polynomial Time. Machine Learning.
Expected Mistake Bound Model for On-Line Reinforcement Learning. ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning.
Introduction to Probability Models, Ninth Edition.
The Journal of Machine Learning Research.
Efficient PAC Learning for Episodic Tasks with Acyclic State Spaces. Discrete Event Dynamic Systems.
The work presented in this paper provides a practical, customized learning algorithm for reinforcement learning tasks that evolve episodically over acyclic state spaces. The results are motivated by the Optimal Disassembly Planning (ODP) problem described in [14], and they complement and enhance earlier developments on this problem presented in [15]. In particular, the proposed algorithm substantially improves on the original algorithm of [15] in terms of both the required computational effort and the attained performance, where performance is measured by the accumulated reward. The new algorithm also yields a robust performance gain over typical Q-learning implementations in the considered problem context.
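The Q-learning baseline referenced above can be illustrated with a minimal tabular sketch on a toy episodic task whose state space is acyclic, so every episode terminates after finitely many steps and no discounting is needed. The toy MDP, function names, and parameter values below are illustrative assumptions, not taken from the paper:

```python
import random

def q_learning_acyclic(transitions, rewards, episodes=2000, alpha=0.1,
                       epsilon=0.1, seed=0):
    """Tabular Q-learning on a small episodic task over an acyclic state
    space. `transitions[s][a]` gives the (deterministic) successor of
    state s under action a, `rewards[s][a]` the immediate reward;
    terminal states have no entry in `transitions`."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(episodes):
        s = 0  # every episode starts from the fixed initial state 0
        while s in transitions:  # run until a terminal state is reached
            acts = list(transitions[s])
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(acts)
            else:
                a = max(acts, key=lambda x: Q[s][x])
            s2 = transitions[s][a]
            # successor value is 0 at terminal states; no discount factor,
            # since acyclicity guarantees finite episodes
            v2 = max(Q[s2].values()) if s2 in transitions else 0.0
            Q[s][a] += alpha * (rewards[s][a] + v2 - Q[s][a])
            s = s2
    return Q

# Toy acyclic task: state 0 branches to 1 or 2, both of which lead to
# the terminal state 3. Action 'b' is optimal (total reward 5 vs. 1).
transitions = {0: {'a': 1, 'b': 2}, 1: {'go': 3}, 2: {'go': 3}}
rewards = {0: {'a': 1.0, 'b': 0.0}, 1: {'go': 0.0}, 2: {'go': 5.0}}
Q = q_learning_acyclic(transitions, rewards)
best = max(Q[0], key=lambda a: Q[0][a])
```

After enough episodes the greedy action at the initial state is the one with the higher accumulated reward. The acyclic structure is what the paper's customized algorithm exploits; the sketch above is only the generic baseline it is compared against.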