This paper presents the design of an autonomous system built around a self-optimizing memory controller for non-Markovian reinforcement learning tasks. Instead of searching the whole memory contents holistically, the controller applies associated feature analysis to produce the most likely relevant action from previous experiences. Actor-Critic (AC) learning adaptively tunes the control parameters, while an online variant of the Random Forest (RF) learner serves as a memory-capable approximator for the Actor's policy and the Critic's value function. Learning capability is examined experimentally on a non-Markovian cart-pole balancing task. The results show that the proposed controller acquires complex behaviors, such as balancing two poles simultaneously, and displays long-term planning.
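The Actor-Critic scheme referenced above can be illustrated with a minimal sketch. This is not the paper's method (which approximates both components with online Random Forests on a non-Markovian cart-pole); it is a standard tabular Actor-Critic on an invented two-action toy problem, kept deliberately small so the coupled actor/critic updates driven by the TD error stay visible. All names and the toy dynamics are assumptions for illustration only.

```python
import math
import random


def actor_critic_demo(episodes=500, alpha=0.1, beta=0.1, gamma=0.95, seed=0):
    """Minimal tabular Actor-Critic on a hypothetical one-state, two-action task.

    Illustrative only: action 1 stands in for "keep the pole balanced"
    (reward 1), action 0 for "let it fall" (reward 0). The paper itself
    uses online Random Forests as function approximators instead of tables.
    """
    rng = random.Random(seed)
    V = [0.0]                       # critic: value of the single state
    prefs = [[0.0, 0.0]]            # actor: action preferences (softmax)

    def sample_action(s):
        z = [math.exp(p) for p in prefs[s]]
        tot = sum(z)
        return 0 if rng.random() < z[0] / tot else 1

    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = sample_action(s)
            r = 1.0 if a == 1 else 0.0          # toy reward
            s2 = s                               # single-state dynamics
            td = r + gamma * V[s2] - V[s]        # TD error drives both updates
            V[s] += beta * td                    # critic update
            prefs[s][a] += alpha * td            # actor update
            s = s2
    return prefs, V


prefs, V = actor_critic_demo()
```

After training, the actor's preference for the rewarded action dominates, showing how the critic's TD signal adaptively tunes the policy parameters, which is the role AC learning plays in the controller described in the abstract.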