In this paper we present preliminary research on designing a behavior-based adaptive system that uses a self-optimizing memory controller. Rather than searching the whole memory contents holistically, the model adopts associated feature analysis to successively memorize each newly experienced state-action pair as an action of past experience, and to produce motor commands that make the controlled system behave desirably in the future. Actor-Critic learning is used to adaptively tune the control parameters, while an on-line variant of the random forests (RF) learner is used to approximate the policy of the Actor and the value function of the Critic. The learning capability of the proposed model is examined experimentally on the Cart-Pole balancing task, which was designed with computation with perception in mind. The results show that the robot with self-optimizing memory acquires behaviors such as balancing the pole and displays planning based on past experiences.
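
The Actor-Critic scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions that do not come from the paper: plain lookup tables stand in for the on-line random forests, a hypothetical five-state corridor replaces the Cart-Pole task, and the step sizes, discount factor, and all names are chosen only for illustration. What it does show is the generic one-step update: the Critic computes a temporal-difference error, and that same error adjusts both the value estimate and the Actor's action preferences.

```python
import math
import random

# Hypothetical toy task (not from the paper): a corridor of states 0..4.
# Episodes start in the middle; state 4 pays reward 1, state 0 pays nothing.
N_STATES = 5
START, GOAL = 2, 4
LEFT, RIGHT = 0, 1


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(episodes=3000, gamma=0.95, alpha_actor=0.1,
                       alpha_critic=0.1, seed=0):
    """One-step Actor-Critic with tabular Actor and Critic (assumed setup)."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]  # Actor: action preferences
    value = [0.0] * N_STATES                       # Critic: state values

    for _ in range(episodes):
        s = START
        while s not in (0, GOAL):
            probs = softmax(theta[s])
            a = RIGHT if rng.random() < probs[RIGHT] else LEFT
            s_next = s + 1 if a == RIGHT else s - 1
            r = 1.0 if s_next == GOAL else 0.0
            terminal = s_next in (0, GOAL)

            # TD error: how much better or worse the step was than predicted.
            target = r if terminal else r + gamma * value[s_next]
            delta = target - value[s]

            # Critic update: move the value estimate toward the TD target.
            value[s] += alpha_critic * delta

            # Actor update: gradient of log softmax, scaled by the TD error.
            for b in (LEFT, RIGHT):
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += alpha_actor * delta * grad

            s = s_next
    return theta, value


if __name__ == "__main__":
    theta, value = train_actor_critic()
    print("P(right | start):", softmax(theta[START])[RIGHT])
    print("state values:", value)
```

In the system described above, the two tables would be replaced by the on-line RF learners, one approximating the policy and one approximating the value function; the update rule driven by the TD error would keep the same general shape.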