We study approaches that fit a linear combination of basis functions to the continuation value function of an optimal stopping problem and then employ a greedy policy based on the resulting approximation. We argue that computing weights to maximize the expected payoff of the greedy policy, or to minimize the expected squared error with respect to an invariant measure, is intractable. On the other hand, certain versions of approximate value iteration lead to policies competitive with those that would result from optimizing the latter objective.
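
As a rough illustration of the ingredients named above, the following sketch runs one simple form of approximate value iteration on a toy optimal stopping problem and then reads off the greedy policy. Everything concrete here is an assumption made for illustration and is not taken from the paper: the five-state cyclic random walk `P`, the payoff `g`, the discount factor `alpha`, the two basis functions in `Phi` (a constant and a linear term), and the choice to weight the least-squares fit by the chain's invariant distribution `w` (uniform for this chain). The function names are hypothetical.

```python
def bellman_continuation(P, g, q, alpha):
    """Apply (F q)(x) = alpha * sum_y P[x][y] * max(g[y], q[y])."""
    n = len(g)
    return [alpha * sum(P[x][y] * max(g[y], q[y]) for y in range(n))
            for x in range(n)]


def evaluate(Phi, r):
    """Approximate continuation values: linear combination of two basis functions."""
    return [row[0] * r[0] + row[1] * r[1] for row in Phi]


def weighted_least_squares(Phi, target, w):
    """Weights r minimizing sum_x w[x] * ((Phi r)(x) - target[x])**2 (2x2 normal equations)."""
    a11 = sum(wx * f[0] * f[0] for wx, f in zip(w, Phi))
    a12 = sum(wx * f[0] * f[1] for wx, f in zip(w, Phi))
    a22 = sum(wx * f[1] * f[1] for wx, f in zip(w, Phi))
    b1 = sum(wx * f[0] * t for wx, f, t in zip(w, Phi, target))
    b2 = sum(wx * f[1] * t for wx, f, t in zip(w, Phi, target))
    det = a11 * a22 - a12 * a12
    return [(b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det]


def approximate_value_iteration(P, g, Phi, w, alpha, iters=500):
    """Repeat: apply the Bellman operator, then project back onto the span of the basis."""
    r = [0.0, 0.0]
    for _ in range(iters):
        q = evaluate(Phi, r)
        r = weighted_least_squares(Phi, bellman_continuation(P, g, q, alpha), w)
    return r


def greedy_policy(g, Phi, r):
    """Stop in state x when the payoff is at least the approximate continuation value."""
    return [gx >= qx for gx, qx in zip(g, evaluate(Phi, r))]


# Toy problem (assumed): symmetric random walk on a 5-cycle, uniform invariant measure.
n = 5
P = [[0.5 if (y - x) % n in (1, n - 1) else 0.0 for y in range(n)] for x in range(n)]
g = [1.0, 0.0, 2.0, 0.0, 3.0]
alpha = 0.9
Phi = [[1.0, float(x)] for x in range(n)]
w = [1.0 / n] * n

r = approximate_value_iteration(P, g, Phi, w, alpha)
policy = greedy_policy(g, Phi, r)
print("weights:", r)
print("stop in state:", policy)
```

With only two basis functions the fitted continuation values are necessarily coarse, so the greedy policy here is just a policy induced by the approximation and need not be optimal for the toy problem.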