A one-measurement form of simultaneous perturbation stochastic approximation
Automatica (Journal of IFAC)
Actor-Critic–Type Learning Algorithms for Markov Decision Processes
SIAM Journal on Control and Optimization
Optimal structured feedback policies for ABR flow control using two-timescale SPSA
IEEE/ACM Transactions on Networking (TON)
Dynamic Programming and Optimal Control
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Finite-Time Regret Bounds for the Multiarmed Bandit Problem
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Least-Squares Temporal Difference Learning
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
ACM Transactions on Modeling and Computer Simulation (TOMACS)
SIAM Journal on Control and Optimization
Cisco Frame Relay Solutions Guide
INFORMS Journal on Computing
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Introduction to Probability Models, Ninth Edition
Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes
Discrete Event Dynamic Systems
Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Average cost temporal-difference learning
Automatica (Journal of IFAC)
Dimensionality effects on the Markov property in shape memory alloy hysteretic environment
SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
Numerical analysis of continuous time Markov decision processes over finite horizons
Computers and Operations Research
We develop four simulation-based algorithms for finite-horizon Markov decision processes. Two of the algorithms are designed for finite state and compact action spaces, and the other two for finite state and finite action spaces. Of the former pair, one uses a linear parameterization of the policy, which reduces memory complexity. We briefly sketch the convergence analysis and present illustrative numerical experiments with all four algorithms on a problem of flow control in communication networks.
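The paper's algorithms are not reproduced here, but the general idea behind such simulation-based methods can be illustrated with the one-measurement SPSA form cited above: tune a parameterized policy by simulating the finite-horizon cost once per iteration under a randomly perturbed parameter. The sketch below is a toy illustration only; the queueing model, the sigmoid admission policy, the cost function, and the step-size constants `a` and `c` are all assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

H = 10   # horizon length
B = 5    # buffer size of a toy flow-control queue (states 0..B)

def simulate_cost(theta):
    """Simulate one finite-horizon trajectory under a parameterized
    admission policy and return the accumulated cost.
    Hypothetical policy: admit with probability sigmoid(theta[0] - theta[1]*x),
    a linear-in-state parameterization."""
    x, cost = 0, 0.0
    for t in range(H):
        p = 1.0 / (1.0 + np.exp(-(theta[0] - theta[1] * x)))  # admit prob.
        arrivals = rng.binomial(2, p)       # admitted packets this slot
        served = rng.binomial(1, 0.6)       # random service completion
        x = min(max(x + arrivals - served, 0), B)
        cost += (x - B / 2.0) ** 2          # penalize deviation from B/2
    return cost

def spsa_step(theta, k, a=0.05, c=0.5):
    """One iteration of one-measurement SPSA: a single perturbed cost
    sample yields a gradient estimate for every parameter coordinate."""
    a_k = a / (k + 1) ** 0.602              # decaying step size
    c_k = c / (k + 1) ** 0.101              # decaying perturbation size
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Bernoulli perturbation
    y = simulate_cost(theta + c_k * delta)  # the single cost measurement
    grad_hat = y / (c_k * delta)            # elementwise gradient estimate
    return theta - a_k * grad_hat

theta = np.zeros(2)
for k in range(2000):
    theta = spsa_step(theta, k)
```

The appeal of the one-measurement form is that each iteration needs only a single simulated trajectory regardless of the parameter dimension, which is what makes such schemes attractive for problems like ABR flow control where each cost evaluation is a network simulation.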