A nonlinear reinforcement scheme for stochastic learning automata

  • Authors:
  • Florin Stoica; Emil M. Popa

  • Affiliations:
  • Computer Science Department, University "Lucian Blaga" Sibiu, Sibiu, Romania; Computer Science Department, University "Lucian Blaga" Sibiu, Sibiu, Romania

  • Venue:
  • MMACTEE'06 Proceedings of the 8th WSEAS international conference on Mathematical methods and computational techniques in electrical engineering
  • Year:
  • 2006

Abstract

A stochastic automaton can perform a finite number of actions in a random environment. When a specific action is performed, the environment responds by producing an output that is stochastically related to that action. This response may be favorable or unfavorable. The aim is to design an automaton that can determine the best action, guided by past actions and responses. The reinforcement scheme presented is shown to satisfy all necessary and sufficient conditions for absolute expediency in a stationary environment. An automaton using this scheme is guaranteed to "do better" at every time step than at the previous one: the expected value of the average penalty at each iteration step is less than that at the previous step, for all steps. Some simulation results are presented, showing that our algorithm converges to a solution faster than the one given in [7].
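The interaction loop described above — choose an action, receive a stochastic favorable/unfavorable response, and update the action probabilities — can be sketched as follows. The paper's specific nonlinear scheme is not reproduced here; as a hedged stand-in, this sketch uses the classic linear reward-inaction (L_RI) update, and the environment, penalty probabilities, and learning rate are illustrative assumptions.

```python
import random

def run_automaton(penalty_probs, steps=20000, a=0.01, seed=1):
    """Simulate a learning automaton in a stationary random environment.

    penalty_probs[i] is the (assumed) probability that the environment
    penalizes action i.  Uses a linear reward-inaction (L_RI) update as an
    illustrative stand-in for the paper's nonlinear scheme.
    """
    rng = random.Random(seed)
    n = len(penalty_probs)
    p = [1.0 / n] * n  # action probabilities, initially uniform

    for _ in range(steps):
        # sample an action according to the current probability vector
        r, chosen, acc = rng.random(), 0, 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r < acc:
                chosen = i
                break

        penalized = rng.random() < penalty_probs[chosen]
        if not penalized:
            # favorable response: shift probability mass toward the action
            p = [(1 - a) * pj for pj in p]
            p[chosen] += a
        # unfavorable response: inaction (probabilities unchanged)

    return p

probs = run_automaton([0.8, 0.2, 0.6])
best = probs.index(max(probs))  # action with the lowest penalty probability
```

With enough steps and a small learning rate, the probability vector concentrates on the action with the lowest penalty probability (index 1 in this example), which is the "best action" the abstract refers to.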