SD-Q: selective discount Q learning based on new results of intertemporal choice theory

  • Authors:
  • Fengfei Zhao;Zheng Qin

  • Affiliations:
  • Department of Computer Science and Technology, Tsinghua University, Beijing, China;Department of Computer Science and Technology, Tsinghua University, Beijing, China

  • Venue:
  • AICI'11 Proceedings of the Third international conference on Artificial intelligence and computational intelligence - Volume Part II
  • Year:
  • 2011

Abstract

We discuss reinforcement learning from an intertemporal choice perspective. Unlike previous research, this paper emphasizes the importance of a deeper understanding of the psychological mechanisms of human decision-making. We aim to improve the traditional Q learning algorithm in light of new results from intertemporal choice experiments. We start with a brief introduction to recent findings in intertemporal choice theory and to reinforcement learning. We then propose a new reinforcement learning algorithm with selective discount (SD-Q). Experiments show that SD-Q is superior to both the traditional Q learning algorithm and a reinforcement learning method that does not apply discounting.
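For orientation, the two baselines named in the abstract can be illustrated with the standard tabular Q learning update, in which a discount factor weights the estimated future value. The sketch below is not taken from the paper: the function name `q_update`, the state and action labels, and the parameter values are all hypothetical, and it shows only the textbook update. The abstract does not describe the selective discount rule of SD-Q, so that rule is not reproduced here.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one textbook tabular Q learning step and return the new value.

    gamma is the discount factor: gamma < 1 gives the traditional
    discounted update, gamma = 1 gives the update without discounting.
    (Hypothetical helper for illustration, not the paper's SD-Q rule.)
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


actions = ["left", "right"]  # illustrative action set

# Traditional Q learning with discounting (gamma = 0.9).
q_disc = defaultdict(float)
q_disc[("s1", "right")] = 2.0
discounted = q_update(q_disc, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.9)

# The same step without discounting (gamma = 1.0).
q_plain = defaultdict(float)
q_plain[("s1", "right")] = 2.0
undiscounted = q_update(q_plain, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=1.0)
```

With these illustrative numbers the discounted step yields a smaller estimate than the undiscounted one, since the future value of 2.0 is scaled by 0.9 before being added to the reward.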