We consider a multistage asset acquisition problem in which assets are purchased now, at a price that varies randomly over time, to satisfy a random demand at a fixed point in the future. We provide a rare convergence proof for an approximate dynamic programming algorithm using pure exploitation, where the states visited depend on the decisions produced by solving the approximate problem. The resulting algorithm requires neither knowledge of the probability distributions of prices and demands nor any assumptions about their functional forms. The algorithm and its proof rely on the fact that the true value function is a family of piecewise linear concave functions.
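The piecewise linear concave structure mentioned above is typically exploited by storing the value function as a vector of slopes, smoothing a sampled gradient into one slope, and then projecting back onto the set of concave (non-increasing-slope) functions. The sketch below is a hypothetical illustration of that update, assuming a pool-adjacent-violators projection; the function name and update scheme are illustrative assumptions, not the paper's actual algorithm.

```python
def update_concave_slopes(slopes, s, sample_slope, stepsize):
    """One SPAR-style update of a piecewise linear concave value function.

    slopes       -- current slope estimates v[0] >= v[1] >= ... (concavity)
    s            -- index of the segment whose slope was sampled
    sample_slope -- observed sample gradient at segment s
    stepsize     -- smoothing stepsize in (0, 1]
    """
    v = list(slopes)
    # Smooth the sampled gradient into the current estimate.
    v[s] = (1 - stepsize) * v[s] + stepsize * sample_slope

    # Restore concavity: project onto non-increasing slope vectors via
    # pool-adjacent-violators (merge pools whose means violate the order).
    pools = []  # each pool is [sum_of_values, count]
    for x in v:
        pools.append([x, 1])
        # Merge while the previous pool's mean is below the current one's.
        while (len(pools) > 1
               and pools[-2][0] * pools[-1][1] < pools[-1][0] * pools[-2][1]):
            sm, ct = pools.pop()
            pools[-1][0] += sm
            pools[-1][1] += ct
    out = []
    for sm, ct in pools:
        out.extend([sm / ct] * ct)  # every slope in a pool takes the pool mean
    return out
```

For example, pushing a large sample slope into the middle of a decreasing slope vector forces the projection to average the violating neighbors, so the returned slopes remain non-increasing while preserving their total mass.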