Time is a crucial variable in planning and often requires special attention, since it introduces a specific structure along with additional complexity, especially in the case of decision-making under uncertainty. In this paper, after reviewing and comparing MDP frameworks designed to deal with temporal problems, we focus on Generalized Semi-Markov Decision Processes (GSMDP) with observable time. We highlight the inherent structure and complexity of these problems and contrast them with classical reinforcement learning problems. Finally, we introduce a new simulation-based reinforcement learning method for solving GSMDP, bringing together results from simulation-based policy iteration, regression techniques, and simulation theory. We illustrate our approach on a subway network control example.
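To make the combination of simulation-based policy iteration and regression concrete, here is a minimal, entirely hypothetical sketch (not the authors' implementation): a toy chain problem with random sojourn times and time-dependent discounting, where each policy is evaluated by Monte Carlo rollouts and the regression step is collapsed to per-state-action averaging because the state set is tiny. All names and parameters (`step`, `rollout`, `GAMMA`, the reward values) are illustrative assumptions.

```python
import random

# Toy chain standing in for a GSMDP with observable time (hypothetical):
# states 0..N-1, action 0 = wait (small reward), action 1 = advance
# (immediate cost, large reward near the end of the chain). Transition
# durations are random, mimicking GSMP clock samples.
N = 6
GAMMA = 0.95

def step(s, a, rng):
    """One simulated transition: returns (next_state, reward, duration)."""
    dt = rng.uniform(0.5, 1.5)           # random sojourn time (GSMP-like clock)
    if a == 1:
        ns = min(s + 1, N - 1)
        r = 10.0 if ns == N - 1 else -1.0
    else:
        ns, r = s, 0.1
    return ns, r, dt

def rollout(s, a, policy, rng, horizon=30):
    """Monte Carlo return of taking action a in state s, then following policy."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        s, r, dt = step(s, a, rng)
        total += discount * r
        discount *= GAMMA ** dt          # continuous-time (duration-aware) discounting
        a = policy(s)
    return total

def evaluate_q(policy, rng, n_rollouts=30):
    """Policy evaluation by simulation; the 'regression' here degenerates
    to tabular averaging since the state space is finite and small."""
    return {(s, a): sum(rollout(s, a, policy, rng)
                        for _ in range(n_rollouts)) / n_rollouts
            for s in range(N) for a in (0, 1)}

def policy_iteration(n_iters=3, seed=0):
    rng = random.Random(seed)
    policy = lambda s: 0                 # start from the do-nothing policy
    for _ in range(n_iters):
        q = evaluate_q(policy, rng)
        greedy = {s: max((0, 1), key=lambda a: q[(s, a)]) for s in range(N)}
        policy = lambda s, g=greedy: g[s]
    return policy

pol = policy_iteration()
print([pol(s) for s in range(N)])        # learned action per state
```

In a real GSMDP with continuous state and observed time, the averaging step above would be replaced by an actual regressor fitted on sampled (state, action, return) triples, which is the role regression techniques play in the abstract's method.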