Towards a Multiple-Lookahead-Levels agent reinforcement-learning technique and its implementation in integrated circuits

Authors:
H. S. Al-Dayaa; D. B. Megherbi
Affiliations:
University of Massachusetts, Lowell, USA; University of Massachusetts, Lowell, USA
Venue:
The Journal of Supercomputing
Year:
2012

Abstract

Reinforcement learning (RL) techniques have contributed, and continue to contribute substantially, to the advancement of machine learning and its many recent applications. As is well known, two of the main limitations of existing RL techniques are, in general, their slow convergence and their computational complexity. The contributions of this paper are two-fold. (1) First, it introduces a reinforcement-learning technique that uses multiple lookahead levels to grant an autonomous agent more visibility into its environment and help it learn faster. This technique extends Watkins's Q-Learning algorithm by using the Multiple-Lookahead-Levels (MLL) model equation that we develop and present here. We analyze the convergence of the MLL equation and prove its effectiveness. We also propose and implement a method to compute the improvement rate of the agent's learning speed between different lookahead levels. Both the time and space complexities are examined. Results show that the number of steps required to reach the goal per learning path decreases exponentially with the learning path number (time). Results also show that, to some degree, the number of steps per learning path is lower at any given time when the number of lookahead levels is higher (space). Furthermore, we analyze the MLL system in the time domain and prove its temporal stability using Lyapunov theory. (2) Second, based on this Lyapunov stability analysis, we propose, for the first time, a circuit architecture for a software-configurable hardware design of the MLL technique for real-time applications.
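
For orientation, the baseline that the MLL technique extends can be sketched as follows. This is a minimal illustration of the standard one-step Watkins Q-Learning update only; the MLL model equation itself is not given in the abstract and is not reproduced here. The corridor environment, the function names `q_update` and `run_episode`, and the parameter values (`alpha`, `gamma`, `epsilon`) are assumptions made for illustration, not taken from the paper.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One standard Q-Learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Unseen (state, action) pairs are treated as having value 0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def run_episode(q, n_states=6, actions=(-1, 1), epsilon=0.1, rng=random, max_steps=10_000):
    """Walk a hypothetical 1-D corridor from state 0 to the goal at n_states - 1.

    Returns the number of steps taken in this learning path (episode).
    """
    goal = n_states - 1
    state, steps = 0, 0
    while state != goal and steps < max_steps:
        if rng.random() < epsilon:
            # Explore: pick a random action.
            action = rng.choice(actions)
        else:
            # Exploit: pick a greedy action, breaking ties at random.
            best = max(q.get((state, a), 0.0) for a in actions)
            action = rng.choice([a for a in actions if q.get((state, a), 0.0) == best])
        # Moves are clamped to the corridor; reaching the goal pays reward 1.
        next_state = min(max(state + action, 0), goal)
        reward = 1.0 if next_state == goal else 0.0
        q_update(q, state, action, reward, next_state, actions)
        state = next_state
        steps += 1
    return steps
```

Calling `run_episode` repeatedly with the same table `q` and recording the returned step counts gives a steps-per-learning-path curve of the kind the abstract discusses, here for the one-step baseline only.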