Solving non-stationary bandit problems by random sampling from sibling Kalman filters

  • Authors:
  • Ole-Christoffer Granmo; Stian Berg

  • Affiliations:
  • Department of ICT, University of Agder, Grimstad, Norway; Department of ICT, University of Agder, Grimstad, Norway

  • Venue:
  • IEA/AIE'10: Proceedings of the 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems - Volume Part III
  • Year:
  • 2010

Abstract

The multi-armed bandit problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, so one must balance exploiting existing knowledge about the arms against obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change of the reward distributions may progressively degrade the performance of any fixed strategy. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel solution scheme for bandit problems with non-stationary, normally distributed rewards. The scheme is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyperparameters of sibling Kalman filters, and on random sampling from these posteriors. Furthermore, it is able to track the better actions, thus supporting non-stationary bandit problems. Extensive experiments demonstrate that our scheme outperforms recently proposed bandit playing algorithms, not only in non-stationary environments but also in stationary ones. Moreover, our scheme is robust to inexact parameter settings. We thus believe that our methodology opens avenues for improved, novel solutions.
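
The abstract describes the scheme at a high level: one Kalman filter per arm, with arm selection by random sampling from the filters' posteriors. The sketch below illustrates that general idea only; the names KalmanArm and play, the parameter values, and the exact update rules are illustrative assumptions, not the paper's notation or its precise algorithm.

```python
import random

class KalmanArm:
    """Tracks one arm's reward mean with a scalar Kalman filter.

    sigma_obs: assumed observation-noise std of the rewards.
    sigma_tr:  assumed transition (drift) noise std; inflating the
               posterior variance each step is what lets the filter
               track a non-stationary mean.
    """
    def __init__(self, mu0=0.0, var0=100.0, sigma_obs=1.0, sigma_tr=0.1):
        self.mu = mu0
        self.var = var0
        self.var_obs = sigma_obs ** 2
        self.var_tr = sigma_tr ** 2

    def predict(self):
        # Time update: uncertainty grows whether or not the arm is pulled.
        self.var += self.var_tr

    def update(self, reward):
        # Measurement update: standard scalar Kalman correction.
        gain = self.var / (self.var + self.var_obs)
        self.mu += gain * (reward - self.mu)
        self.var *= (1.0 - gain)

    def sample(self):
        # Thompson-style draw from the posterior N(mu, var).
        return random.gauss(self.mu, self.var ** 0.5)

def play(bandit, n_arms, horizon, **kw):
    """Draw one value per sibling filter, pull the arm with the
    largest draw, then update only that arm's filter."""
    arms = [KalmanArm(**kw) for _ in range(n_arms)]
    total = 0.0
    for _ in range(horizon):
        for arm in arms:
            arm.predict()
        choice = max(range(n_arms), key=lambda i: arms[i].sample())
        reward = bandit(choice)  # environment returns a noisy reward
        arms[choice].update(reward)
        total += reward
    return total

# Hypothetical drifting two-armed testbed.
means = [0.0, 1.0]
def bandit(i):
    means[i] += random.gauss(0.0, 0.05)  # slow random-walk drift
    return means[i] + random.gauss(0.0, 1.0)

print(play(bandit, n_arms=2, horizon=1000))
```

With the transition noise set to zero, each filter reduces to a conjugate normal update for a stationary mean; a positive value keeps every arm's posterior variance from collapsing, which is what allows the sampler to re-explore arms whose means may have drifted.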