Transport theory describes the scattering behavior of physical particles such as photons. Here we show how to connect this theory to optimal control and to the adaptive behavior of agents embedded in an environment. Environments and tasks are defined by physical boundary conditions. Given a task, we compute a set of probability densities over continuous states, actions, and time. From these densities we derive an optimal policy such that, in every state, the most likely action maximizes the probability of reaching a predefined goal state. Liouville's conservation theorem states that the conditional density at time t, state s, and action a must equal the density at t + dt, s + ds, a + da. Discretization yields a linear system that can be solved directly and whose solution corresponds to an optimal policy. Discounted reward schemes are incorporated naturally by taking the Laplace transform of the equations. The Liouville machine quickly solves rather complex maze problems.
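The pipeline sketched in the abstract — discretize the dynamics, solve a linear system directly, then read off a greedy policy — can be illustrated on a toy maze. This is only a minimal sketch under stated assumptions, not the paper's actual construction: the maze layout, the discount factor `gamma` (standing in for the Laplace-transform discounting), and the uniform-random reference dynamics are all assumptions introduced here for illustration.

```python
import numpy as np

# Hypothetical 5x5 maze: 1 = wall, 0 = free; goal at bottom-right.
maze = np.array([
    [0, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
])
H, W = maze.shape
goal = (4, 4)
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

free = [(r, c) for r in range(H) for c in range(W) if maze[r, c] == 0]
idx = {s: i for i, s in enumerate(free)}
n = len(free)

def step(s, a):
    """Deterministic move; bumping into a wall or the border leaves s unchanged."""
    r, c = s[0] + a[0], s[1] + a[1]
    return (r, c) if 0 <= r < H and 0 <= c < W and maze[r, c] == 0 else s

# Transition matrix of uniform-random reference dynamics with the goal absorbing.
P = np.zeros((n, n))
for s in free:
    if s == goal:
        continue
    for a in actions:
        P[idx[s], idx[step(s, a)]] += 1.0 / len(actions)

# Discounted reach value v(s) = E[gamma^T] (T = hitting time of the goal)
# satisfies the LINEAR system v = gamma * P v with boundary condition v(goal) = 1.
gamma = 0.95
A = np.eye(n) - gamma * P
b = np.zeros(n)
A[idx[goal], :] = 0.0
A[idx[goal], idx[goal]] = 1.0
b[idx[goal]] = 1.0
v = np.linalg.solve(A, b)  # direct solve, no iteration

# Greedy policy: in each state, take the action whose successor has the
# highest value, i.e. the action most likely to lead toward the goal.
policy = {s: max(actions, key=lambda a: v[idx[step(s, a)]]) for s in free}
```

Because the goal is absorbing and the dynamics enter linearly, the value function is obtained from a single direct linear solve rather than an iterative fixed-point scheme; this mirrors the abstract's claim that discretizing the conservation equations yields a directly solvable linear system.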