Backward Q-learning: The combination of Sarsa algorithm and Q-learning

Authors:
Yin-Hao Wang;Tzuu-Hseng S. Li;Chih-Jui Lin
Venue:
Engineering Applications of Artificial Intelligence
Year:
2013

Abstract

Reinforcement learning (RL) has been applied in many fields, but action selection policies still face a dilemma between exploration and exploitation. Q-learning and Sarsa are two of the best-known RL algorithms, and they have different characteristics. Generally speaking, Sarsa converges faster, while Q-learning achieves better final performance. However, Sarsa is easily trapped in a local minimum, and Q-learning takes longer to learn. Most of the literature has investigated the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm and presents a new method, called backward Q-learning, which can be implemented in both Sarsa and Q-learning. The backward Q-learning algorithm tunes the Q-values directly, and the Q-values in turn indirectly affect the action selection policy. As a result, the proposed RL algorithms can enhance learning speed and improve final performance. Finally, three simulated tasks (cliff walk, mountain car, and cart-pole balancing control) are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations show that the backward Q-learning based RL algorithm outperforms the well-known Q-learning and Sarsa algorithms.
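
The abstract does not spell out the update rules, so the following Python sketch is only an illustration of the general idea, not the authors' exact procedure. It shows the standard tabular Sarsa and Q-learning updates, and then one plausible reading of "backward" tuning: replaying a recorded episode from the last transition to the first with Q-learning updates. The function names, the `(state, action, reward, next_state, terminal)` tuple layout, and the values of `alpha` and `gamma` are all assumptions made for this example.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.9):
    """Standard Sarsa (on-policy): the target uses the action a2 actually taken in s2."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])


def q_learning_update(Q, s, a, r, s2, actions, alpha=0.5, gamma=0.9, terminal=False):
    """Standard Q-learning (off-policy): the target uses the greedy value of s2."""
    best_next = 0.0 if terminal else max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def backward_replay(Q, episode, actions, alpha=0.5, gamma=0.9):
    """Assumed 'backward' step: replay recorded transitions last-to-first,
    so a reward seen at the end of the episode reaches earlier states in one pass."""
    for s, a, r, s2, terminal in reversed(episode):
        q_learning_update(Q, s, a, r, s2, actions, alpha, gamma, terminal)


# Hypothetical two-step episode: A -> B -> G (goal), reward only at the goal.
actions = [0, 1]
episode = [("A", 1, 0.0, "B", False), ("B", 1, 1.0, "G", True)]

Q_backward = defaultdict(float)
backward_replay(Q_backward, episode, actions)

# For comparison: the same Q-learning updates applied in forward order.
Q_forward = defaultdict(float)
for s, a, r, s2, terminal in episode:
    q_learning_update(Q_forward, s, a, r, s2, actions, terminal=terminal)

print(Q_backward[("A", 1)], Q_forward[("A", 1)])
```

In this toy episode the backward pass already assigns a non-zero value to the first state-action pair, while the forward pass leaves it at zero until a later episode. This is the kind of direct Q-value tuning the abstract credits for faster learning, under the assumptions stated above.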