On Regression-Based Stopping Times

Authors:
Benjamin Roy
Affiliations:
Stanford University, Stanford, USA
Venue:
Discrete Event Dynamic Systems
Year:
2010

Abstract

We study approaches that fit a linear combination of basis functions to the continuation value function of an optimal stopping problem and then employ a greedy policy based on the resulting approximation. We argue that computing weights to maximize the expected payoff of the greedy policy, or to minimize the expected squared error with respect to an invariant measure, is intractable. On the other hand, certain versions of approximate value iteration lead to policies competitive with those that would result from optimizing the latter objective.
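The approach described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: a hypothetical stationary AR(1) process x' = RHO * x + SIGMA * w, a payoff g(x) = max(x, 0), a discount factor ALPHA, and basis functions (1, x, x^2). It fits weights r so that phi(x)' r approximates the continuation value by iterated regression, r_{k+1} = argmin_r sum_t (phi(x_t)' r - ALPHA * max(g(x_{t+1}), phi(x_{t+1})' r_k))^2, which is one version of approximate value iteration and not necessarily the exact variant analyzed in the paper. It then estimates the payoff of the greedy policy that stops as soon as g(x) >= phi(x)' r. All function names are hypothetical.

```python
import math
import random

# Hypothetical example problem (an assumption, not taken from the paper):
# stationary AR(1) state x' = RHO * x + SIGMA * w with w ~ N(0, 1),
# payoff g(x) = max(x, 0), discount factor ALPHA.
RHO, SIGMA, ALPHA = 0.9, 0.5, 0.95
STATIONARY_STD = SIGMA / math.sqrt(1.0 - RHO * RHO)


def payoff(x):
    return max(x, 0.0)


def features(x):
    # Basis functions phi(x) = (1, x, x^2); this choice is an assumption.
    return [1.0, x, x * x]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def solve(a, b):
    """Solve the linear system a @ r = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    r = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * r[j] for j in range(i + 1, n))
        r[i] = (m[i][n] - s) / m[i][i]
    return r


def fit_least_squares(xs, ys):
    """Weights r minimizing sum_t (phi(x_t)' r - y_t)^2, via the normal equations."""
    k = len(features(0.0))
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for x, y in zip(xs, ys):
        phi = features(x)
        for i in range(k):
            b[i] += phi[i] * y
            for j in range(k):
                a[i][j] += phi[i] * phi[j]
    return solve(a, b)


def approximate_value_iteration(n_samples=2000, n_iters=30, seed=0):
    """Iterated regression of the continuation value on one simulated trajectory.
    Sampling states along a trajectory started in the stationary distribution
    weights the squared error by an empirical approximation of the invariant measure."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, STATIONARY_STD)
    path = [x]
    for _ in range(n_samples):
        x = RHO * x + SIGMA * rng.gauss(0.0, 1.0)
        path.append(x)
    xs, nexts = path[:-1], path[1:]
    r = [0.0] * len(features(0.0))
    for _ in range(n_iters):
        targets = [ALPHA * max(payoff(x1), dot(features(x1), r)) for x1 in nexts]
        r = fit_least_squares(xs, targets)
    return r


def greedy_policy_value(r, n_paths=1000, horizon=200, seed=1):
    """Monte Carlo estimate of the discounted payoff of the greedy policy that
    stops at the first time g(x) >= phi(x)' r (paths that never stop within
    the horizon contribute zero)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x = rng.gauss(0.0, STATIONARY_STD)
        for t in range(horizon):
            if payoff(x) >= dot(features(x), r):
                total += ALPHA ** t * payoff(x)
                break
            x = RHO * x + SIGMA * rng.gauss(0.0, 1.0)
    return total / n_paths


if __name__ == "__main__":
    weights = approximate_value_iteration()
    print("weights:", weights)
    print("greedy policy value:", greedy_policy_value(weights))
```

Running the fit for a fixed number of iterations keeps the sketch simple; a practical implementation would more likely use a numerical linear algebra library for the regression and monitor how much the weights change between iterations.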