A novel method is presented for approximating the value function and selecting good actions in Markov decision processes with large state and action spaces. The method approximates state-action values as negative free energies in an undirected graphical model called a product of experts. The model parameters can be learned efficiently because values and their derivatives can be computed efficiently for a product of experts. Good actions can be found even in large factored action spaces by Markov chain Monte Carlo sampling. Simulation results show that the product-of-experts approximation can be used to solve large problems; in one simulation it is used to find actions in an action space of size 2^40.
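The free-energy idea can be sketched in a few lines of numpy. This is a minimal, hypothetical illustration, not the paper's implementation: the network sizes, random parameters, and fixed Gibbs schedule are all assumptions, and a complete method would also learn the weights (e.g. with temporal-difference updates). The sketch shows the two computations the abstract relies on: Q(s, a) taken as the negative free energy of a product of experts (here a restricted Boltzmann machine over state and action bits), and action selection by Gibbs sampling over the action bits with the state clamped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 6 binary state bits, 8 binary action bits,
# 12 hidden units ("experts"). All values here are assumptions.
N_S, N_A, N_H = 6, 8, 12
W = rng.normal(scale=0.1, size=(N_H, N_S + N_A))  # expert weights
b = np.zeros(N_H)                                 # hidden biases
c = np.zeros(N_S + N_A)                           # visible biases

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def free_energy(s, a):
    """Free energy of the visible configuration x = [s; a].

    For binary hidden units this has the closed form
    F(x) = -c.x - sum_j log(1 + exp(W_j.x + b_j)),
    so it (and its parameter derivatives) can be computed exactly.
    """
    x = np.concatenate([s, a])
    pre = W @ x + b
    # logaddexp(0, z) = log(1 + exp(z)), computed stably
    return -(c @ x) - np.sum(np.logaddexp(0.0, pre))

def q_value(s, a):
    """State-action value approximated as the negative free energy."""
    return -free_energy(s, a)

def sample_action(s, n_steps=20):
    """Gibbs-sample the action bits with the state bits clamped.

    Low-free-energy (high-Q) actions are sampled preferentially,
    so the chain tends toward good actions without enumerating
    the exponentially large action space.
    """
    a = rng.integers(0, 2, size=N_A).astype(float)
    for _ in range(n_steps):
        x = np.concatenate([s, a])
        h = (rng.random(N_H) < sigmoid(W @ x + b)).astype(float)
        p_a = sigmoid(W[:, N_S:].T @ h + c[N_S:])
        a = (rng.random(N_A) < p_a).astype(float)
    return a
```

Because the free energy is a sum over experts, each value evaluation costs O(N_H * (N_S + N_A)) rather than anything exponential in the number of action bits, which is what makes factored action spaces of size 2^40 tractable.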