An agent operating in the real world must often choose between maximizing its expected utility given its current knowledge of the world and learning more about the world, which may improve its future gains. This problem is known as the trade-off between exploitation and exploration. In this research, we consider the problem in the context of electronic commerce. An agent intends to buy a particular product (a good or a service). Several suppliers offer this product, but they differ in quality and in the price they charge. The buyer cannot observe the average quality of each supplier's product, but it does know the quality of goods previously purchased from the suppliers. On the one hand, the buyer is motivated to buy from the supplier with the highest expected product quality net of the product price. On the other hand, buying from a lesser-known supplier lets the buyer learn about that supplier's quality, which can help in future purchases of the same type of product. We show that the suppliers problem is similar to the k-armed bandit problem, and we suggest solving it by evaluating Gittins indices and choosing the supplier with the highest index. We demonstrate how Gittins indices are calculated in real-world situations, where deals of different magnitudes may exist and product prices may vary. Finally, we consider suppliers with no history and suggest how the original Gittins indices can be adapted to handle this extension.
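To make the index rule concrete, the following is a minimal sketch of a Gittins index computation. It is not the procedure developed in the paper. It assumes a deliberately simplified setting that the abstract does not state: each purchase is either satisfactory (reward 1) or not (reward 0), the buyer's knowledge of a supplier is a Beta(a, b) posterior over the satisfaction probability, future rewards are discounted by a factor gamma, and prices and deal magnitudes are ignored. The function name, parameters, and supplier data are hypothetical. The index is approximated with the standard calibration idea: find the constant per-period reward at which the buyer is indifferent between that sure reward forever and continuing to sample the uncertain supplier, using a truncated lookahead.

```python
def gittins_index_bernoulli(a, b, gamma=0.9, horizon=100, tol=1e-6):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b) posterior.

    Assumptions (illustrative, not from the source): 0/1 rewards, discount
    factor gamma < 1, and a lookahead truncated after `horizon` >= 1 pulls.
    """

    def continuing_beats_retiring(lam):
        # Value of switching forever to a known arm that pays lam per period.
        retire = lam / (1.0 - gamma)
        # At the truncation depth, commit to the better of the two options.
        n = a + b + horizon
        values = [max(retire, (a + s) / n / (1.0 - gamma))
                  for s in range(horizon + 1)]
        # Backward induction; a state is (depth, s) with s successes so far.
        for depth in range(horizon - 1, 0, -1):
            n = a + b + depth
            values = [
                max(retire,
                    (a + s) / n * (1.0 + gamma * values[s + 1])
                    + (b + depth - s) / n * gamma * values[s])
                for s in range(depth + 1)
            ]
        # Value of pulling the uncertain arm once more from the root state.
        p = a / (a + b)
        cont = p * (1.0 + gamma * values[1]) + (1.0 - p) * gamma * values[0]
        return cont > retire

    # Bisection on lam: the index is the point of indifference.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if continuing_beats_retiring(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical suppliers: (satisfactory, unsatisfactory) pseudo-counts.
# Both have the same observed satisfaction rate of 2/3.
suppliers = {"established": (40, 20), "newcomer": (2, 1)}
indices = {name: gittins_index_bernoulli(a, b)
           for name, (a, b) in suppliers.items()}
best = max(indices, key=indices.get)
```

Although both hypothetical suppliers have the same observed satisfaction rate, the less-known one receives the larger index, because the index also credits what a purchase would teach the buyer. That is the exploration bonus the abstract describes. Handling prices, deal sizes, and suppliers with no history would require the extensions discussed in the paper, which this sketch does not attempt.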