Transport theory describes the scattering behavior of physical particles such as photons. Here we show how to connect this theory to optimal control and to the adaptive behavior of agents embedded in an environment. Environments and tasks are defined by physical boundary conditions. Given a task, we compute a set of probability densities over continuous states, actions, and time. From these densities we derive an optimal policy such that, in every state, the most likely action maximizes the probability of reaching a predefined goal state. Liouville's conservation theorem states that the conditional density at time t, state s, and action a must equal the density at t + dt, s + ds, a + da. Discretization yields a linear system that can be solved directly and whose solution corresponds to an optimal policy. Discounted reward schemes are incorporated naturally by taking the Laplace transform of the equations. The Liouville machine quickly solves rather complex maze problems.
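The pipeline sketched in the abstract — discretize the dynamics, solve a linear system directly, then read off a greedy policy — can be illustrated on a toy maze. This is only a minimal sketch under stated assumptions, not the paper's actual construction: the maze layout, the discount factor `gamma` (standing in for the Laplace-transform discounting), and the uniform-random reference dynamics are all assumptions introduced here for illustration.

```python
import numpy as np

# Hypothetical 5x5 maze: 1 = wall, 0 = free; goal at bottom-right.
maze = np.array([
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
])
H, W = maze.shape
goal = (4, 4)
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

free = [(r, c) for r in range(H) for c in range(W) if maze[r, c] == 0]
idx = {s: i for i, s in enumerate(free)}
n = len(free)

def step(s, a):
    """Deterministic move; bumping into a wall or the border leaves s unchanged."""
    r, c = s[0] + a[0], s[1] + a[1]
    return (r, c) if 0 <= r < H and 0 <= c < W and maze[r, c] == 0 else s

# Transition matrix of uniform-random reference dynamics with the goal absorbing.
P = np.zeros((n, n))
for s in free:
    if s == goal:
        continue
    for a in actions:
        P[idx[s], idx[step(s, a)]] += 1.0 / len(actions)

# Discounted reach value v(s) = E[gamma^T] (T = hitting time of the goal)
# satisfies the LINEAR system v = gamma * P v with boundary condition v(goal) = 1.
gamma = 0.95
A = np.eye(n) - gamma * P
b = np.zeros(n)
A[idx[goal], :] = 0.0
A[idx[goal], idx[goal]] = 1.0
b[idx[goal]] = 1.0
v = np.linalg.solve(A, b)  # direct solve, no iteration

# Greedy policy: in each state, take the action whose successor has the
# highest value, i.e. the action most likely to lead toward the goal.
policy = {s: max(actions, key=lambda a: v[idx[step(s, a)]]) for s in free}
```

Because the goal is absorbing and the dynamics enter linearly, the value function is obtained from a single direct linear solve rather than an iterative fixed-point scheme; this mirrors the abstract's claim that discretizing the conservation equations yields a directly solvable linear system.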