From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning. Discrete Event Dynamic Systems.
Learning Time Allocation Using Neural Networks. CG '00: Revised Papers from the Second International Conference on Computers and Games.
Convergence of Simulation-Based Policy Iteration. Probability in the Engineering and Informational Sciences.
A Learning Algorithm for Discrete-Time Stochastic Control. Probability in the Engineering and Informational Sciences.
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation. The Journal of Machine Learning Research.
STEWARD: Demo of Spatio-Textual Extraction on the Web Aiding the Retrieval of Documents. dg.o '07: Proceedings of the 8th Annual International Conference on Digital Government Research: Bridging Disciplines & Domains.
Brief Paper: New Algorithms of the Q-Learning Type. Automatica (Journal of IFAC).
Simulation-Based Optimization Approach for Software Cost Model with Rejuvenation. ATC '08: Proceedings of the 5th International Conference on Autonomic and Trusted Computing.
Reinforcement Learning: A Tutorial Survey and Recent Advances. INFORMS Journal on Computing.
Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation. AAAI'05: Proceedings of the 20th National Conference on Artificial Intelligence, Volume 2.
Natural Actor-Critic Algorithms. Automatica (Journal of IFAC).
A Convergent Online Single Time Scale Actor Critic Algorithm. The Journal of Machine Learning Research.
Learning to Use the Spectrum in Self-Configuring Heterogeneous Networks: A Logit Equilibrium Approach. Proceedings of the 5th International ICST Conference on Performance Evaluation Methodologies and Tools.
Book Review: Self-Learning Control of Finite Markov Chains. Automatica (Journal of IFAC).
Actor-Critic Algorithms for Hierarchical Markov Decision Processes. Automatica (Journal of IFAC).
Algorithms for learning the optimal policy of a Markov decision process (MDP) from simulated transitions are formulated and analyzed. These are variants of the well-known "actor-critic" (or "adaptive critic") algorithm from the artificial intelligence literature. Distributed asynchronous implementations are also considered. The analysis rests on two-timescale stochastic approximation.
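The two-timescale structure described in the abstract can be illustrated with a minimal tabular actor-critic sketch: the critic's value estimates are updated with a faster (more slowly decaying) step size, while the actor's policy parameters move on a slower timescale, so the critic effectively tracks the current policy. The toy 2-state, 2-action MDP, the softmax parameterization, and the step-size schedules below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy MDP (assumed for illustration): 2 states, 2 actions.
rng = np.random.default_rng(0)
nS, nA, gamma = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])   # P[s, a, s'] transition probabilities
R = np.array([[1.0, 0.0], [0.0, 2.0]])     # R[s, a] expected rewards

V = np.zeros(nS)            # critic: state-value estimates
theta = np.zeros((nS, nA))  # actor: softmax policy preferences

s = 0
for t in range(1, 50_000):
    # Softmax policy induced by the actor parameters.
    logits = theta[s] - theta[s].max()
    pi = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(nA, p=pi)
    s2 = rng.choice(nS, p=P[s, a])

    # Temporal-difference error drives both updates.
    delta = R[s, a] + gamma * V[s2] - V[s]

    # Critic update on the faster timescale (step size 1/t^0.6) ...
    V[s] += (1.0 / t**0.6) * delta
    # ... actor update on the slower timescale (step size 1/t),
    # along the gradient of log pi(a|s) w.r.t. theta[s].
    grad = -pi
    grad[a] += 1.0
    theta[s] += (1.0 / t) * delta * grad

    s = s2
```

With these assumed schedules, both step sizes are square-summable-but-not-summable, and their ratio vanishes, which is the standard condition under which the critic is treated as having converged from the actor's point of view. After the run, the actor should prefer the higher-reward action in state 1 (action 1 yields reward 2 versus 0).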