The Bayesian pursuit algorithm: a new family of estimator learning automata

  • Authors:
  • Xuan Zhang; Ole-Christoffer Granmo; B. John Oommen

  • Affiliations:
  • Dept. of ICT, University of Agder, Grimstad, Norway; Dept. of ICT, University of Agder, Grimstad, Norway; School of Computer Science, Carleton University, Ottawa, Canada and Dept. of ICT, University of Agder, Grimstad, Norway

  • Venue:
  • IEA/AIE'11: Proceedings of the 24th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: Modern Approaches in Applied Intelligence - Volume Part II
  • Year:
  • 2011


Abstract

The fastest Learning Automata (LA) algorithms currently available come from the family of estimator algorithms. The Pursuit algorithm (PST), a pioneering scheme in the estimator family, obtains its superior learning speed by using Maximum Likelihood (ML) estimates to pursue the action currently perceived as being optimal. Recently, a Bayesian LA (BLA) was introduced, and empirical results demonstrating its advantages over established top performers, including the PST scheme, were reported. The BLA scheme is inherently Bayesian in nature, but it avoids computational intractability by merely updating the hyper-parameters of sibling conjugate priors, and by randomly sampling from the resulting posteriors. In this paper, we integrate the foundational learning principles motivating the design of the BLA with the principles of the PST. By doing this, we have succeeded in obtaining a completely novel, and rather pioneering, approach to solving LA-like problems, namely, by designing the Bayesian Pursuit algorithm (BPST). As in the BLA, the estimates are truly Bayesian (as opposed to ML) in nature. However, the action selection probability vector of the PST is used for exploration purposes. Also, unlike the ML estimate, which is usually a single value, the use of a posterior distribution permits us to choose any one of a spectrum of values in the posterior as the appropriate estimate. Thus, in this paper, we have chosen the 95th percentile value of the posterior (instead of the mean) to pursue the most promising actions. Further, as advocated in [7], the pursuit has been done using both the Linear Reward-Penalty and Reward-Inaction philosophies, leading to the corresponding BPSTRP and BPSTRI schemes, respectively. It turns out that the BPST is superior to the PST, with the BPSTRI being even more robust than the BPSTRP. Moreover, by controlling the learning speed of the BPST, the BPST schemes perform either better than, or comparably to, the BLA. We thus believe that the BPST constitutes a new avenue of research, in which the performance benefits of the PST and the BLA are mutually augmented, opening the way for improved performance in a number of applications currently being tested.
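The abstract describes the BPST at a high level: conjugate-prior hyper-parameter updates in place of ML estimates, a posterior percentile as the action-value estimate, and a PST-style action-probability vector for exploration. The following is a minimal sketch of that idea, not the authors' exact formulation: it assumes a two-valued (Bernoulli) reward environment with Beta(a, b) conjugate priors, the 95th-percentile posterior estimate, and a linear Reward-Inaction pursuit update with an illustrative rate parameter `lam`; all names and parameter choices are hypothetical.

```python
import numpy as np
from scipy.stats import beta


class BayesianPursuitRI:
    """Sketch of a Bayesian Pursuit automaton, Reward-Inaction flavour.

    Assumptions (illustrative, not taken from the paper's text):
    Bernoulli rewards, Beta(a, b) conjugate priors per action, the
    95th-percentile of each posterior as the action estimate, and a
    linear Reward-Inaction pursuit update of the probability vector.
    """

    def __init__(self, n_actions, lam=0.05, percentile=0.95, rng=None):
        self.n = n_actions
        self.lam = lam                       # pursuit learning rate (assumed)
        self.q = percentile                  # posterior percentile used as estimate
        self.a = np.ones(n_actions)          # Beta hyper-parameter: rewards + 1
        self.b = np.ones(n_actions)          # Beta hyper-parameter: penalties + 1
        self.p = np.full(n_actions, 1.0 / n_actions)  # action-selection probabilities
        self.rng = rng or np.random.default_rng()

    def select_action(self):
        # Explore according to the current action-probability vector, as in the PST.
        return int(self.rng.choice(self.n, p=self.p))

    def update(self, action, reward):
        # Bayesian update: bump the conjugate Beta hyper-parameters of the chosen action.
        if reward:
            self.a[action] += 1
        else:
            self.b[action] += 1

        # Estimate each action by the chosen percentile of its Beta posterior
        # (instead of the ML estimate or the posterior mean).
        estimates = beta.ppf(self.q, self.a, self.b)
        best = int(np.argmax(estimates))

        # Reward-Inaction pursuit: move p towards the unit vector of the
        # currently best-estimated action only when the response was a reward.
        if reward:
            e_best = np.zeros(self.n)
            e_best[best] = 1.0
            self.p = (1.0 - self.lam) * self.p + self.lam * e_best
            self.p /= self.p.sum()           # guard against rounding drift
```

A Reward-Penalty variant in this sketch would differ only in performing the pursuit step on every response, reward or penalty, always moving the probability vector towards the currently best-estimated action.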