Temporal difference learning and simulated annealing for optimal control: a case study

Authors:
Jinsong Leng;Beulah M. Sathyaraj;Lakhmi Jain
Affiliations:
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, SA, Australia;School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, SA, Australia;School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, SA, Australia
Venue:
KES-AMSTA'08 Proceedings of the 2nd KES International conference on Agent and multi-agent systems: technologies and applications
Year:
2008


Abstract

The trade-off between exploration and exploitation has an important impact on the performance of temporal difference learning. Several action selection strategies exist; however, it is unclear which strategy is best. The impact of an action selection strategy may depend on the application domain and on human factors. This paper presents a modified Sarsa(λ) control algorithm that samples actions in conjunction with a simulated annealing technique. A simulated game of soccer, which has a large, dynamic and continuous state space, is utilised as the test environment. The empirical results demonstrate that the quality of convergence is significantly improved by the simulated annealing approach.
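
The abstract does not spell out how simulated annealing enters the action selection, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes a Metropolis-style acceptance test (a random candidate action replaces the greedy one with probability exp(ΔQ / T)), a geometric cooling schedule, a tabular Q-function with accumulating eligibility traces, and a hypothetical five-state chain task in place of the soccer simulator. All function names and parameter values are illustrative.

```python
import math
import random


def sa_select_action(q_values, temperature, rng=random):
    """Pick an action with a Metropolis-style acceptance test (assumed rule).

    A uniformly random candidate replaces the greedy action with
    probability exp((Q[candidate] - Q[greedy]) / temperature).
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    if temperature <= 0.0:
        return greedy  # fully annealed: pure exploitation
    candidate = rng.randrange(len(q_values))
    delta = q_values[candidate] - q_values[greedy]  # never positive
    if rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy


def chain_step(state, action, n_states=5):
    """Hypothetical toy task: action 1 moves right, action 0 moves left.

    Reaching the right-most state ends the episode with reward 1.
    """
    if action == 1:
        next_state = min(n_states - 1, state + 1)
    else:
        next_state = max(0, state - 1)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def train_on_chain(episodes=200, seed=0, alpha=0.1, gamma=0.95, lam=0.9,
                   t0=1.0, decay=0.97, n_states=5, max_steps=1000):
    """Tabular Sarsa(lambda) with annealed action selection on the toy chain."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for episode in range(episodes):
        temperature = t0 * decay ** episode  # geometric cooling (assumed)
        traces = {}  # eligibility traces, reset at the start of each episode
        state = 0
        action = sa_select_action(q[state], temperature, rng)
        for _ in range(max_steps):
            next_state, reward, done = chain_step(state, action, n_states)
            target = reward
            if not done:
                next_action = sa_select_action(q[next_state], temperature, rng)
                target += gamma * q[next_state][next_action]
            td_error = target - q[state][action]
            # Accumulating trace for the state-action pair just visited.
            key = (state, action)
            traces[key] = traces.get(key, 0.0) + 1.0
            # Update every traced pair, then decay its trace.
            for (s, a), e in list(traces.items()):
                q[s][a] += alpha * td_error * e
                traces[(s, a)] = gamma * lam * e
            if done:
                break
            state, action = next_state, next_action
    return q


if __name__ == "__main__":
    for state, values in enumerate(train_on_chain()):
        print(state, [round(v, 3) for v in values])
```

At a high temperature almost every random candidate is accepted, so the agent explores; as the temperature falls, candidates with lower estimated value are rejected more often and the policy approaches greedy selection. A continuous state space such as the soccer domain would need function approximation in place of the table used here.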