DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning

  • Authors:
  • Carlos Mariano; Eduardo F. Morales

  • Venue:
  • EMCL '01 Proceedings of the 12th European Conference on Machine Learning
  • Year:
  • 2001

Abstract

In reinforcement learning an autonomous agent learns an optimal policy while interacting with the environment. In particular, in one-step Q-learning, an agent updates its Q-values after each action using the immediate reward. In this paper a new strategy for updating Q-values is proposed. The strategy, implemented in an algorithm called DQL, uses a set of agents, all searching for the same goal in the same space, to obtain the same optimal policy. Each agent leaves traces over a copy of the environment (copies of the Q-values) while searching for a goal, and these copies are used by the agents to decide which actions to take. Once all the agents have reached a goal, the original Q-values along the best solution found by the agents are updated using Watkins' Q-learning formula. DQL has some similarities with Gambardella's Ant-Q algorithm [4]; however, it does not require the definition of a domain-dependent heuristic or, consequently, the tuning of additional parameters. Unlike Ant-Q, DQL also does not update the original Q-values with zero reward while the agents are searching. It is shown how DQL's guided exploration by several agents with selective exploitation (updating only the best solution) produces faster convergence than Q-learning and Ant-Q on several testbed problems under similar conditions.
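
The abstract describes a copy-then-commit scheme: each agent explores using its own copy of the Q-values, and only the best episode is written back to the original Q-values with Watkins' one-step update. The Python sketch below illustrates one plausible reading of that loop; the environment interface (`reset`, `step`, `actions`), the epsilon-greedy action choice, the trace rule applied to the copies, and "best solution = episode with the highest return" are assumptions not specified in the abstract.

```python
# Hypothetical sketch of the DQL update scheme described in the abstract.
# The environment interface, epsilon-greedy exploration, and the choice of
# "best solution" are assumptions; the paper may define these differently.
import random
from collections import defaultdict

def dql_iteration(env, Q, n_agents=10, alpha=0.1, gamma=0.9, epsilon=0.1):
    episodes = []  # (trajectory, total_reward) per agent
    for _ in range(n_agents):
        Q_copy = defaultdict(float, Q)      # each agent works on a copy of the Q-values
        state, done = env.reset(), False
        trajectory, total_reward = [], 0.0
        while not done:
            # epsilon-greedy choice based on the agent's own copy (its "traces")
            if random.random() < epsilon:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: Q_copy[(state, a)])
            next_state, reward, done = env.step(state, action)
            # leave a trace on the copy only; the original Q-values stay untouched
            best_next = max((Q_copy[(next_state, a)] for a in env.actions(next_state)),
                            default=0.0)
            Q_copy[(state, action)] += alpha * (reward + gamma * best_next
                                                - Q_copy[(state, action)])
            trajectory.append((state, action, reward, next_state))
            total_reward += reward
            state = next_state
        episodes.append((trajectory, total_reward))

    # commit only the best solution to the original Q-values,
    # using Watkins' one-step Q-learning formula:
    #   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_trajectory, _ = max(episodes, key=lambda e: e[1])
    for state, action, reward, next_state in best_trajectory:
        best_next = max((Q[(next_state, a)] for a in env.actions(next_state)), default=0.0)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q
```

Under this reading, the copies play the role of Ant-Q's pheromone-like intermediate updates, but no zero-reward updates ever touch the original Q-values during the search phase, which matches the distinction the abstract draws from Ant-Q.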