A new class of ε-optimal learning automata

Authors:
G. I. Papadimitriou;M. Sklira;A. S. Pomportsis
Affiliations:
Dept. of Informatics, Aristotle Univ., Thessaloniki, Greece;-;-
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Year:
2004

Citing 0
Cited 4

Online optimization of replacement policies using learning automata
International Journal of Systems Science

Threshold optimization for rate adaptation algorithms in IEEE 802.11 WLANs
IEEE Transactions on Wireless Communications

Learning behaviors of the hierarchical structure stochastic automata
KES'05 Proceedings of the 9th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I

A new class of ε-optimal learning automata
ICIC'11 Proceedings of the 7th international conference on Advanced Intelligent Computing

Abstract

A new class of P-model absorbing learning automata is introduced. The proposed automata use a stochastic estimator to achieve rapid and accurate convergence when operating in stationary random environments. Under the proposed stochastic estimator scheme, the estimates of the reward probabilities of actions are not strictly dependent on the environmental responses. The dependence between the stochastic estimates and the deterministic ones is more relaxed for actions that have been selected only a few times. Such actions therefore have the opportunity to be estimated as "optimal," to increase their choice probability, and consequently to be selected. As they are selected more often, their estimates become more reliable, and the automaton rapidly and accurately converges to the optimal action. The asymptotic behavior of the proposed scheme is analyzed, and it is proved to be ε-optimal in every stationary random environment. Furthermore, extensive simulation results indicate that the proposed stochastic estimator scheme converges faster than the deterministic-estimator-based DPRI and DGPA schemes when operating in stationary P-model random environments.
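
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the update rules, so the Gaussian perturbation, the `noise_scale` schedule, the `sigma_max` and `step_size` parameters, and the continuous pursuit-style probability update are all assumptions chosen for illustration. The sketch only shows the general mechanism, in which each deterministic estimate (rewards divided by selections) is perturbed by noise that shrinks as the action is selected more often, and the action with the highest perturbed estimate is pursued.

```python
import random


def noise_scale(count, sigma_max=0.5):
    # Hypothetical schedule (an assumption, not taken from the paper):
    # the perturbation is largest for rarely selected actions.
    return sigma_max / (1 + count)


def stochastic_estimates(rewards, counts, rng, sigma_max=0.5):
    # Deterministic estimate = observed reward frequency; the stochastic
    # estimate adds zero-mean Gaussian noise scaled by noise_scale().
    estimates = []
    for r, n in zip(rewards, counts):
        deterministic = r / n if n > 0 else 0.0
        estimates.append(deterministic + rng.gauss(0.0, noise_scale(n, sigma_max)))
    return estimates


def run_automaton(reward_probs, steps=5000, step_size=0.01, seed=0):
    # P-model environment: each response is a binary reward or penalty.
    rng = random.Random(seed)
    k = len(reward_probs)
    p = [1.0 / k] * k          # action (choice) probabilities
    rewards = [0] * k          # number of rewards per action
    counts = [0] * k           # number of selections per action
    for _ in range(steps):
        a = rng.choices(range(k), weights=p)[0]
        counts[a] += 1
        if rng.random() < reward_probs[a]:
            rewards[a] += 1
        u = stochastic_estimates(rewards, counts, rng)
        best = max(range(k), key=lambda i: u[i])
        # Assumed pursuit-style step: shift probability mass toward the
        # action whose stochastic estimate is currently the highest.
        p = [(1 - step_size) * pi for pi in p]
        p[best] += step_size
    return p


if __name__ == "__main__":
    print(run_automaton([0.8, 0.4, 0.2]))
```

Because the update scales every probability by `1 - step_size` and then adds `step_size` to one entry, the vector `p` remains a probability distribution at every step. The discretized, absorbing updates used by schemes such as DPRI and DGPA would replace this continuous step.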