Dynamic packaging in e-retailing with stochastic demand over finite horizons: A Q-learning approach

Authors:
Yan Cheng
Affiliations:
Fudan University, Information Management and Information System Department, Management School, Shanghai 200433, PR China
Venue:
Expert Systems with Applications: An International Journal
Year:
2009

Abstract

This paper investigates how an intelligent agent may use Q-learning, a simulation-based stochastic technique, to make optimal dynamic packaging decisions in an e-retailing setting. When the practical application of dynamic packaging involves a large number of products, a standard Q-learning approach encounters two major problems caused by the excessively large state space. First, learning the Q-values in tabular form may be infeasible because of the amount of memory needed to store the table. Second, rewards in the state space may be so sparse that random exploration discovers them only extremely slowly. This paper first models the state-dependent and event-driven nature of the dynamic packaging problem as a Markov decision process, then proposes a state generalization approach based on a distortion measure, and finally puts forward a heuristic-based exploration/exploitation policy that improves the convergence of Q-learning. We validate our approach in a simulated test.
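
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of tabular Q-learning with epsilon-greedy exploration on a toy finite-horizon selling problem. It is not the paper's method: the distortion-based state generalization and the heuristic exploration policy are not shown, and every name and number here (HORIZON, START_STOCK, PRICES, BUY_PROB, step, train) is a hypothetical assumption chosen only for illustration.

```python
import random
from collections import defaultdict

# All constants below are illustrative assumptions, not values from the paper.
HORIZON = 5                        # number of selling periods
START_STOCK = 3                    # initial units of one package
PRICES = [1.0, 2.0]                # candidate package prices (the actions)
BUY_PROB = {1.0: 0.9, 2.0: 0.3}    # assumed purchase probability per price


def step(state, action, rng):
    """Simulate one period; return (next_state, reward, done)."""
    t, stock = state
    price = PRICES[action]
    # Stochastic demand: a customer buys with a price-dependent probability.
    sold = stock > 0 and rng.random() < BUY_PROB[price]
    reward = price if sold else 0.0
    next_state = (t + 1, stock - 1 if sold else stock)
    # The episode ends when the horizon is reached or the stock runs out.
    done = next_state[0] >= HORIZON or next_state[1] == 0
    return next_state, reward, done


def train(episodes=20000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Learn a Q-table keyed by (period, stock) with one value per action."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(PRICES))
    for _ in range(episodes):
        state, done = (0, START_STOCK), False
        while not done:
            # Epsilon-greedy: explore at random with probability epsilon.
            if rng.random() < epsilon:
                action = rng.randrange(len(PRICES))
            else:
                action = max(range(len(PRICES)), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action, rng)
            # Standard Q-learning target; terminal states contribute no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    print(q_table[(0, START_STOCK)])
```

The table holds one entry per (period, stock) pair, which is manageable here but grows quickly once the state must track many products. That growth is the memory problem the abstract describes, and purely random exploration over such a table is the slow-discovery problem.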