Exploitation vs. exploration: choosing a supplier in an environment of incomplete information

Authors:
Rina Azoulay-Schwartz; Sarit Kraus; Jonathan Wilkenfeld
Affiliations:
Department of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel; Department of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel and Institute for Advanced Computer Studies, University of Maryland, College Park, MD; Institute for Advanced Computer Studies, University of Maryland, College Park, MD and Department of Government and Politics, University of Maryland, College Park, MD
Venue:
Decision Support Systems
Year:
2004

Abstract

An agent operating in the real world must often choose between maximizing its expected utility according to its current knowledge about the world and trying to learn more about the world, which may improve its future gains. This problem is known as the trade-off between exploitation and exploration. In this research, we consider this problem in the context of electronic commerce. An agent intends to buy a particular product (goods or service). There are several potential suppliers of this product, but they differ in the quality they offer and in the price they charge. The buyer cannot observe the average quality of each supplier's product, but it has some knowledge about the quality of goods previously purchased from the suppliers. On the one hand, the buyer is motivated to buy from the supplier with the highest expected product quality net of the product price. On the other hand, by buying from a lesser-known supplier, the buyer can learn about that supplier's quality, and this knowledge can help it in future purchases of the same type of product. We show the similarity of the suppliers problem to the k-armed bandit problem, and we suggest solving the suppliers problem by evaluating Gittins indices and choosing the supplier with the highest index. We demonstrate how Gittins indices are calculated in real-world situations, where deals of different magnitudes may exist and where product prices may vary. Finally, we consider the existence of suppliers with no history and suggest how the original Gittins indices can be adapted to handle this case.
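
To make the index idea in the abstract concrete, the following is a minimal sketch under strong simplifying assumptions that are not taken from the paper: each purchase is either satisfactory or not (a Bernoulli outcome), the buyer's belief about a supplier is a Beta posterior built from counts of past good and bad deals, all deals have the same magnitude, and prices are ignored. The function names `gittins_index` and `choose_supplier`, the discount factor, and the truncation horizon are all hypothetical choices made for illustration. The index is approximated with the standard calibration ("retirement") formulation: find the fixed per-period reward at which the buyer is indifferent between taking that reward forever and trying the supplier at least once more.

```python
def gittins_index(successes, failures, discount=0.9, horizon=60, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with a
    Beta(successes, failures) posterior (both counts must be > 0).

    Assumption: lookahead is truncated after `horizon` further purchases.
    """

    def value_of_playing_once(lam):
        # Value of retiring on a safe reward `lam` per period, forever.
        retire = lam / (1.0 - discount)
        # At the truncation depth, stop learning: take the better of
        # retiring or playing the arm forever at its posterior mean.
        values = []
        for i in range(horizon + 1):
            a, b = successes + i, failures + horizon - i
            p = a / (a + b)
            values.append(max(retire, p / (1.0 - discount)))
        # Backward induction; index i counts additional successes so far.
        for depth in range(horizon - 1, -1, -1):
            nxt, values = values, []
            for i in range(depth + 1):
                a, b = successes + i, failures + depth - i
                p = a / (a + b)
                play = (p * (1.0 + discount * nxt[i + 1])
                        + (1.0 - p) * discount * nxt[i])
                # At the root, report the value of playing at least once.
                values.append(play if depth == 0 else max(retire, play))
        return values[0]

    # Bisection on the safe reward: the index is the indifference point.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value_of_playing_once(mid) > mid / (1.0 - discount):
            lo = mid   # playing still beats retiring, so the index is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def choose_supplier(histories, discount=0.9):
    """Pick the supplier with the highest approximate Gittins index.

    `histories` maps a supplier name to (good_deals, bad_deals) counts,
    including any prior pseudo-counts.
    """
    return max(histories,
               key=lambda name: gittins_index(*histories[name], discount=discount))


if __name__ == "__main__":
    # Two hypothetical suppliers with the same observed success rate (60%).
    histories = {"well_known": (12, 8), "lesser_known": (3, 2)}
    for name, (good, bad) in histories.items():
        print(name, round(gittins_index(good, bad), 3))
    print("buy from:", choose_supplier(histories))
```

In this sketch both suppliers have the same observed success rate, yet the one with the shorter history receives the higher index, because the index credits the value of what the buyer may still learn about it. The extensions mentioned in the abstract (deals of different magnitudes, varying prices, and suppliers with no history) are not covered here.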