A gradient-based reinforcement learning approach to dynamic pricing in partially-observable environments

  • Authors: David Vengerov
  • Affiliations: Sun Microsystems Laboratories, Menlo Park, CA
  • Year: 2007

Abstract

As more companies adopt the e-business model, it becomes easier for buyers to compare prices across multiple sellers and choose the one offering the best price for the same item or service. As a result, the demand faced by any particular seller becomes more volatile, since competing sellers regularly offer discounts that attract large fractions of buyers. It therefore becomes more important for each seller to switch from static to dynamic pricing policies that take into account observable characteristics of the current demand and the state of the seller's resources. This paper presents a Reinforcement Learning algorithm that can tune the parameters of a seller's dynamic pricing policy in a gradient direction (thus converging to the parameter values that maximize the seller's revenue) even when the seller's environment is not fully observable. The algorithm is evaluated in a simulated Grid market environment, where customers choose the Grid Service Provider (GSP) to which they submit a computing job based on the posted price and expected delay information at each GSP.
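
The paper's specific update rule is not reproduced on this page, so the sketch below only illustrates the general family of methods the abstract describes: a REINFORCE-style policy-gradient step that tunes the parameters of a stochastic pricing policy from simulated revenue, with partial observability represented by a hidden, drifting market price seen only through a noisy signal. The market model, Gaussian policy form, and all constants are hypothetical stand-ins, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_episode(theta, horizon=50):
    """Run one pricing episode under a Gaussian pricing policy.

    Returns the episode's summed grad-log-probabilities and total
    revenue -- the two quantities a REINFORCE update needs.
    """
    market_price = 10.0                    # hidden state: never observed directly
    glp_sum = np.zeros_like(theta)
    revenue = 0.0
    for _ in range(horizon):
        obs = market_price + rng.normal(0.0, 2.0)  # noisy signal (partial observability)
        mean = theta[0] + theta[1] * obs           # policy: price ~ N(mean, sigma^2)
        sigma = 1.0
        price = rng.normal(mean, sigma)
        # gradient of log N(price; mean, sigma) with respect to theta
        glp_sum += (price - mean) / sigma**2 * np.array([1.0, obs])
        # toy demand curve: sales fall off as price exceeds the market price
        demand = float(np.clip(1.0 - 0.1 * (price - market_price), 0.0, 1.0))
        revenue += price * demand
        market_price += rng.normal(0.0, 0.5)       # hidden drift in the market
    return glp_sum, revenue

def policy_gradient_step(theta, lr=1e-5, episodes=20):
    """One REINFORCE-style ascent step on expected episode revenue."""
    glps, returns = zip(*(simulate_episode(theta) for _ in range(episodes)))
    returns = np.array(returns)
    baseline = returns.mean()                      # batch-mean baseline reduces variance
    grad = np.mean([g * (r - baseline) for g, r in zip(glps, returns)], axis=0)
    return theta + lr * grad

theta = np.array([5.0, 0.1])                       # initial pricing-policy parameters
for _ in range(200):
    theta = policy_gradient_step(theta)
print("tuned policy parameters:", theta)
```

The batch-mean baseline keeps the gradient estimate unbiased while reducing its variance; the paper's actual estimator, policy parameterization, and Grid market simulator may differ from this illustrative toy.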