Factored temporal difference learning in the New Ties environment

Authors:
Viktor Gyenes;Ákos Bontovics;András Lőrincz
Affiliations:
Eötvös Loránd University, Department of Information Systems;Eötvös Loránd University, Department of Information Systems;Eötvös Loránd University, Department of Information Systems
Venue:
Acta Cybernetica
Year:
2008

Abstract

Although reinforcement learning is a popular method for training an agent to make decisions based on rewards, well-studied tabular methods are not applicable to large, realistic problems. In this paper, we experiment with a factored version of temporal difference learning, which boils down to a linear function approximation scheme utilising natural features that come from the structure of the task. We conducted experiments in the New Ties environment, a novel platform for multi-agent simulations. We show that learning with a factored representation is effective even in large state spaces. Furthermore, because of its generalisation capabilities, it outperforms tabular methods even in smaller problems, in both learning speed and stability.
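
The idea of factored temporal difference learning as linear function approximation can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the New Ties environment: the toy process (three binary variables, each resampled uniformly at every step, with reward equal to their sum), the scopes, and all function names and parameter values are assumptions made purely for illustration. The value of a state is approximated as a sum of local weights, one per small group of state variables, and TD(0) updates only the weights that are active in the current state.

```python
import random
from itertools import product

# Hypothetical toy setup: three binary state variables, one scope per variable.
ARITIES = (2, 2, 2)
SCOPES = ((0,), (1,), (2,))
GAMMA = 0.9


def make_index(scopes, arities):
    """Assign one weight index to every (scope, local assignment) pair."""
    index = {}
    for j, scope in enumerate(scopes):
        for assignment in product(*(range(arities[i]) for i in scope)):
            index[(j, assignment)] = len(index)
    return index


def active(state, scopes, index):
    """Indices of the indicator features that fire in `state` (one per scope)."""
    return [index[(j, tuple(state[i] for i in scope))]
            for j, scope in enumerate(scopes)]


def value(w, state, scopes, index):
    """Linear value estimate: the sum of the weights of the active features."""
    return sum(w[k] for k in active(state, scopes, index))


def random_state(rng):
    """Toy dynamics (assumed): every variable is resampled uniformly."""
    return tuple(rng.randrange(a) for a in ARITIES)


def reward(state):
    """Toy reward (assumed): the number of variables set to 1."""
    return float(sum(state))


def td0(scopes, arities, steps=25000, alpha=0.02, gamma=GAMMA, seed=0):
    """TD(0) with linear function approximation over factored features."""
    rng = random.Random(seed)
    index = make_index(scopes, arities)
    w = [0.0] * len(index)
    s = random_state(rng)
    for _ in range(steps):
        s_next = random_state(rng)
        # Temporal difference error for the observed transition.
        delta = (reward(s) + gamma * value(w, s_next, scopes, index)
                 - value(w, s, scopes, index))
        # The gradient of a linear estimate is the feature vector, so only
        # the weights of the active indicator features change.
        for k in active(s, scopes, index):
            w[k] += alpha * delta
        s = s_next
    return w, index


w, index = td0(SCOPES, ARITIES)
gap = value(w, (1, 1, 1), SCOPES, index) - value(w, (0, 0, 0), SCOPES, index)
print(f"{len(w)} weights, V(1,1,1) - V(0,0,0) = {gap:.2f}")
```

In this toy process the true value function is the reward plus a constant, so the true gap between the two states is 3, and the learned estimate should land near that figure. The sketch uses 6 weights where a table would need 8 entries; the saving grows quickly with the number of state variables, because each update to a local weight is shared by every state that agrees on that scope.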