Tug-of-war model for multi-armed bandit problem

Authors:
Song-Ju Kim;Masashi Aono;Masahiko Hara
Affiliations:
All three authors: RIKEN-HYU Collaboration Research Center, Fusion Technology Center, Hanyang University, Seoul, Korea; Advanced Science Institute, RIKEN, Wako-shi, Saitama, Japan
Venue:
UC'10: Proceedings of the 9th International Conference on Unconventional Computation
Year:
2010

Abstract

We propose the "tug-of-war (TOW) model", which conducts unique parallel searches using many nonlocally correlated search agents. The model is based on a property of the single-celled amoeba Physarum (the true slime mold), which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. This conservation law entails a "nonlocal correlation" among the branches: a volume increment in one branch is immediately compensated by volume decrements in one or more of the other branches. This nonlocal correlation has been shown to be useful for decision making in dilemma situations. The multi-armed bandit problem is to determine the optimal strategy for maximizing the total reward sum under two incompatible demands: exploring arms to learn their reward probabilities, and exploiting the arm that currently appears best. Our model manages this "exploration-exploitation dilemma" efficiently and performs well: its average accuracy rate is higher than those of well-known algorithms such as the modified ε-greedy and modified softmax algorithms.
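The abstract does not give the model's update equations, so the following Python sketch only illustrates the conservation idea under stated assumptions. The helper names (`tow_displacements`, `run_tow`, `run_epsilon_greedy`), the per-arm score update (+1 on a reward, -omega on no reward), and the uniform noise used for tie-breaking are all hypothetical choices for illustration, not the authors' exact rule. Each arm's displacement is its own score minus the mean score of the other arms, so the displacements always sum to zero: a gain in one "branch" is compensated by losses in the others. For comparison, `run_epsilon_greedy` is the standard textbook ε-greedy, not the modified variant evaluated in the paper. Both functions report the accuracy rate as the fraction of plays spent on the best arm.

```python
import random


def tow_displacements(q):
    """Hypothetical TOW-style displacements: X_k = Q_k - mean(Q_k' for k' != k).

    The displacements always sum to zero, mimicking a conserved resource volume.
    """
    m = len(q)
    total = sum(q)
    return [q_k - (total - q_k) / (m - 1) for q_k in q]


def run_tow(probs, steps, omega=1.0, noise=0.1, seed=0):
    """Play a Bernoulli bandit with a TOW-style agent (assumed update rule).

    Returns the fraction of plays spent on the best arm (accuracy rate).
    """
    rng = random.Random(seed)
    m = len(probs)
    q = [0.0] * m  # per-arm score: +1 per reward, -omega per miss (assumption)
    best = max(range(m), key=lambda k: probs[k])
    hits = 0
    for _ in range(steps):
        x = tow_displacements(q)
        # Small uniform noise breaks ties; this stands in for, and is not, the
        # paper's own fluctuation mechanism.
        arm = max(range(m), key=lambda k: x[k] + rng.uniform(-noise, noise))
        rewarded = rng.random() < probs[arm]
        q[arm] += 1.0 if rewarded else -omega
        hits += arm == best
    return hits / steps


def run_epsilon_greedy(probs, steps, epsilon=0.1, seed=0):
    """Standard (unmodified) epsilon-greedy on the same Bernoulli bandit."""
    rng = random.Random(seed)
    m = len(probs)
    counts = [0] * m
    values = [0.0] * m  # running mean reward per arm
    best = max(range(m), key=lambda k: probs[k])
    hits = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(m)  # explore uniformly at random
        else:
            arm = max(range(m), key=lambda k: values[k])  # exploit best estimate
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        hits += arm == best
    return hits / steps


if __name__ == "__main__":
    arms = [0.2, 0.5, 0.6]
    print("TOW-style accuracy:", run_tow(arms, 1000))
    print("epsilon-greedy accuracy:", run_epsilon_greedy(arms, 1000))
```

In this sketch the TOW-style agent has no explicit exploration branch: an arm that keeps failing drives its own displacement down and, through the zero-sum coupling, pushes the other arms' displacements up, so exploration and exploitation come from the same conserved quantity. Neither accuracy printed here should be read as reproducing the paper's reported results.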