Reinforcement learning (RL) has been applied in many fields and applications, but the trade-off between exploration and exploitation in the action selection policy remains a dilemma. The two best-known RL algorithms, Q-learning and Sarsa, have different characteristics: generally speaking, the Sarsa algorithm converges faster, while the Q-learning algorithm achieves better final performance. However, Sarsa is easily trapped in local minima, and Q-learning needs a longer time to learn. Most of the literature investigates the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm, and presents a new method, called backward Q-learning, which can be implemented within both the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, which in turn indirectly affect the action selection policy. The proposed RL algorithms can therefore increase learning speed and improve final performance. Finally, three experiments, cliff walking, mountain car, and cart-pole balancing, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations show that the backward Q-learning based RL algorithms outperform the well-known Q-learning and Sarsa algorithms.
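For context, the two tabular update rules the paper combines can be sketched as below. The `q_learning_update` and `sarsa_update` functions are the standard textbook rules; the `backward_replay` function is a hypothetical illustration only, since the abstract states that backward Q-learning "directly tunes the Q-values" after interaction but does not give the exact update rule. All function names, the step size `alpha`, and the discount `gamma` are illustrative assumptions.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy rule: bootstrap from the greedy action in s_next.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy rule: bootstrap from the action actually taken in s_next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def backward_replay(Q, episode, alpha=0.1, gamma=0.99):
    # Hypothetical sketch (not the paper's exact rule): after an episode
    # ends, replay its stored transitions in reverse order with Q-learning
    # updates, so reward information propagates back along the whole
    # trajectory in a single pass.
    for (s, a, r, s_next) in reversed(episode):
        q_learning_update(Q, s, a, r, s_next, alpha, gamma)

# Example on a 3-state chain with a single action and a terminal reward:
Q = np.zeros((3, 1))
episode = [(0, 0, 0.0, 1), (1, 0, 0.0, 2), (2, 0, 1.0, 2)]
backward_replay(Q, episode)
# After one backward pass the start state already has a nonzero value,
# whereas a single forward pass would leave Q[0, 0] at zero.
```

The reverse ordering is what makes a single pass effective: updating the last transition first means each earlier state bootstraps from an already-updated successor value.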