A one-measurement form of simultaneous perturbation stochastic approximation
Automatica (Journal of IFAC)
Actor-Critic–Type Learning Algorithms for Markov Decision Processes
SIAM Journal on Control and Optimization
Optimal structured feedback policies for ABR flow control using two-timescale SPSA
IEEE/ACM Transactions on Networking (TON)
Dynamic Programming and Optimal Control
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Finite-Time Regret Bounds for the Multiarmed Bandit Problem
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Least-Squares Temporal Difference Learning
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
ACM Transactions on Modeling and Computer Simulation (TOMACS)
SIAM Journal on Control and Optimization
Cisco Frame Relay Solutions Guide
INFORMS Journal on Computing
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Introduction to Probability Models, Ninth Edition
Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes
Discrete Event Dynamic Systems
Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Average cost temporal-difference learning
Automatica (Journal of IFAC)
Dimensionality effects on the Markov property in shape memory alloy hysteretic environment
SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
Numerical analysis of continuous time Markov decision processes over finite horizons
Computers and Operations Research
We develop four simulation-based algorithms for finite-horizon Markov decision processes. Two of the algorithms are designed for finite state and compact action spaces, and the other two for finite state and finite action spaces. Of the former pair, one uses a linear parameterization of the policy, which reduces memory complexity. We briefly sketch the convergence analysis and present illustrative numerical experiments with all four algorithms on a problem of flow control in communication networks.
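The paper's algorithms are not reproduced here, but the general idea behind such simulation-based methods can be illustrated with the one-measurement SPSA form cited above: tune a parameterized policy by simulating the finite-horizon cost once per iteration under a randomly perturbed parameter. The sketch below is a toy illustration only; the queueing model, the sigmoid admission policy, the cost function, and the step-size constants `a` and `c` are all assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

H = 10   # horizon length
B = 5    # buffer size of a toy flow-control queue (states 0..B)

def simulate_cost(theta):
    """Simulate one finite-horizon trajectory under a parameterized
    admission policy and return the accumulated cost.
    Hypothetical policy: admit with probability sigmoid(theta[0] - theta[1]*x),
    a linear-in-state parameterization."""
    x, cost = 0, 0.0
    for t in range(H):
        p = 1.0 / (1.0 + np.exp(-(theta[0] - theta[1] * x)))  # admit prob.
        arrivals = rng.binomial(2, p)       # admitted packets this slot
        served = rng.binomial(1, 0.6)       # random service completion
        x = min(max(x + arrivals - served, 0), B)
        cost += (x - B / 2.0) ** 2          # penalize deviation from B/2
    return cost

def spsa_step(theta, k, a=0.05, c=0.5):
    """One iteration of one-measurement SPSA: a single perturbed cost
    sample yields a gradient estimate for every parameter coordinate."""
    a_k = a / (k + 1) ** 0.602              # decaying step size
    c_k = c / (k + 1) ** 0.101              # decaying perturbation size
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Bernoulli perturbation
    y = simulate_cost(theta + c_k * delta)  # the single cost measurement
    grad_hat = y / (c_k * delta)            # elementwise gradient estimate
    return theta - a_k * grad_hat

theta = np.zeros(2)
for k in range(2000):
    theta = spsa_step(theta, k)
```

The appeal of the one-measurement form is that each iteration needs only a single simulated trajectory regardless of the parameter dimension, which is what makes such schemes attractive for problems like ABR flow control where each cost evaluation is a network simulation.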