Linear least-squares algorithms for temporal difference learning
Machine Learning (special issue on reinforcement learning)
Dynamic Programming and Optimal Control
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Least-squares policy iteration
The Journal of Machine Learning Research
Option Pricing: Valuation Models and Applications (50th anniversary article)
Management Science
Regression methods for pricing complex American-style options
IEEE Transactions on Neural Networks
Learning to trade via direct reinforcement
IEEE Transactions on Neural Networks
Options are important financial instruments whose prices are usually determined by computational methods. Computational finance is a compelling application area for reinforcement learning research: hard sequential decision-making problems abound and have great practical significance. In this paper, we investigate reinforcement learning methods, in particular least-squares policy iteration (LSPI), for the problem of learning an exercise policy for American options. We also investigate a method by Tsitsiklis and Van Roy, referred to as FQI. We compare LSPI and FQI with LSM, the standard least-squares Monte Carlo method from the finance community, and evaluate their performance on both real and synthetic data. The results show that the exercise policies discovered by LSPI and FQI yield larger payoffs than those discovered by LSM, on both real and synthetic data. Our work shows that solution methods developed in reinforcement learning can advance the state of the art in an important and challenging application area, and it further demonstrates that computational finance remains an under-explored area for the deployment of reinforcement learning methods.
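To make the LSM baseline concrete, the following is a minimal sketch of least-squares Monte Carlo pricing of an American put in the style of Longstaff and Schwartz: simulate price paths, then work backwards, regressing discounted future cash flows on a polynomial basis of the current price to estimate the continuation value, and exercising where the immediate payoff exceeds it. All parameter values and the function name are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of least-squares Monte Carlo (LSM) for an American put.
# Parameters (S0, K, r, sigma, T) and the quadratic basis are illustrative.
import numpy as np

def lsm_american_put(S0=36.0, K=40.0, r=0.06, sigma=0.2, T=1.0,
                     n_steps=50, n_paths=10_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Simulate geometric Brownian motion paths of the underlying price.
    z = rng.standard_normal((n_paths, n_steps))
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    S = S0 * np.exp(np.hstack([np.zeros((n_paths, 1)),
                               np.cumsum(log_increments, axis=1)]))

    # Cash flow if held to maturity, then roll backwards in time.
    cash = np.maximum(K - S[:, -1], 0.0)
    for t in range(n_steps - 1, 0, -1):
        cash *= np.exp(-r * dt)            # discount one step back to time t
        itm = (K - S[:, t]) > 0            # regress on in-the-money paths only
        if itm.sum() < 4:
            continue
        x = S[itm, t]
        A = np.vander(x, 3)                # quadratic polynomial basis
        coeff, *_ = np.linalg.lstsq(A, cash[itm], rcond=None)
        continuation = A @ coeff           # estimated value of holding
        exercise = K - x                   # immediate exercise payoff
        ex_now = exercise > continuation
        cash[np.flatnonzero(itm)[ex_now]] = exercise[ex_now]
    return float(np.exp(-r * dt) * cash.mean())

print(round(lsm_american_put(), 2))
```

The RL methods the abstract compares against (LSPI and FQI) replace this per-step regression with value-function approximation learned over the whole decision process; the backward-induction structure above is what makes LSM the natural baseline.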