In recent years, temporal-difference methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle – learning through experimentation – that causes these excessive demands on the learning agent. Moreover, the agent is very rarely a tabula rasa: some rough knowledge about the characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are preserved. Extensive experimental results show that the resulting variants can achieve good performance, provided a sensible balance is struck between risky use of imprecise prior knowledge and cautious use of learning experience.
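One simple way to embed a priori knowledge without altering the structure of the underlying temporal-difference algorithm is to seed the value table with heuristic estimates before learning begins. The sketch below is an illustration of that idea, not the paper's actual method: a standard tabular Q-learning loop on a hypothetical one-dimensional corridor task, where the `prior` function (an assumption introduced here for illustration) initialises the Q-table with rough domain knowledge while the update rule itself is left untouched.

```python
import random

def q_learning(n_states, n_actions, step, prior=None,
               episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning. `prior(s, a)` optionally seeds the Q-table
    with heuristic values, so learning starts from rough prior knowledge
    instead of a tabula rasa; the TD update rule is unchanged."""
    rng = random.Random(seed)
    Q = [[prior(s, a) if prior else 0.0 for a in range(n_actions)]
         for s in range(n_states)]
    for _ in range(episodes):
        s = 0  # every episode starts at the left end of the corridor
        while s is not None:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a: Q[s][a])
            s2, r = step(s, a)
            # standard one-step TD target; terminal states have value 0
            target = r if s2 is None else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Toy corridor: states 0..4; action 1 moves right, action 0 moves left.
# Reaching state 4 yields reward 1 and ends the episode.
def corridor_step(s, a):
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    if s2 == 4:
        return None, 1.0
    return s2, 0.0

# Hypothetical imprecise prior: "moving right is usually good".
prior = lambda s, a: 0.5 if a == 1 else 0.0
Q = q_learning(5, 2, corridor_step, prior=prior)
```

Because the prior only changes the initial values, convergence guarantees of the base algorithm are preserved: an overly optimistic or pessimistic prior is gradually washed out by experience, which is one concrete instance of balancing risky prior knowledge against cautious learning.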