Introduction of fixed mode states into online profit sharing and its application to waist trajectory generation of biped robot

Authors:
Seiya Kuroda;Kazuteru Miyazaki;Hiroaki Kobayashi
Affiliations:
Panasonic Factory Solutions Co., Ltd., Japan;National Institution for Academic Degrees and University Evaluation, Tokyo, Japan;Meiji University, Kawasaki, Kanagawa, Japan
Venue:
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Year:
2011

Abstract

In reinforcement learning of long-term tasks, learning efficiency may deteriorate when an agent's probabilistic action selection causes too many mistakes before the agent reaches the goal of the task. We propose a new type of state, the fixed mode state, to which a normal state shifts once it has received sufficient reward. A fixed mode state chooses its action by a greedy strategy, which eliminates randomness in action selection and increases efficiency. We first propose an algorithm that combines the penalty avoiding rational policy making algorithm and online profit sharing with fixed mode states. We then discuss the target system and the design of the learning controller. In simulation, the learning task is to stabilize biped walking by using the learning controller to modify the robot's waist trajectory. Finally, we discuss the simulation results and the effectiveness of our proposal.
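
As a rough illustration of the fixed mode idea, the Python sketch below uses plain episodic profit sharing rather than the online variant, and it leaves out penalty avoiding rational policy making entirely. The class name `FixedModeProfitSharing`, the threshold `fix_threshold`, the geometric credit decay, and the use of accumulated credit as the measure of "sufficient reward" are all assumptions made for illustration; none of them are details taken from the paper.

```python
import random
from collections import defaultdict


class FixedModeProfitSharing:
    """Toy profit-sharing learner with fixed mode states (illustrative only)."""

    def __init__(self, actions, fix_threshold=5.0, decay=0.5,
                 init_weight=1.0, seed=0):
        self.actions = list(actions)
        self.fix_threshold = fix_threshold  # assumed test for "sufficient reward"
        self.decay = decay                  # assumed geometric credit decay
        # (state, action) -> weight; every rule starts at init_weight
        self.weights = defaultdict(lambda: init_weight)
        # state -> total credit the state has received so far
        self.earned = defaultdict(float)
        self.rng = random.Random(seed)

    def is_fixed(self, state):
        """A state is in fixed mode once its accumulated credit reaches the threshold."""
        return self.earned[state] >= self.fix_threshold

    def choose(self, state):
        ws = [self.weights[(state, a)] for a in self.actions]
        if self.is_fixed(state):
            # Fixed mode: greedy choice, no randomness.
            return self.actions[ws.index(max(ws))]
        # Normal mode: roulette selection proportional to the rule weights.
        return self.rng.choices(self.actions, weights=ws, k=1)[0]

    def reinforce(self, episode, reward):
        """Distribute reward backward over the (state, action) pairs of an episode."""
        credit = reward
        for state, action in reversed(episode):
            self.weights[(state, action)] += credit
            self.earned[state] += credit
            credit *= self.decay


if __name__ == "__main__":
    learner = FixedModeProfitSharing(actions=["left", "right"])
    learner.reinforce([("s0", "right")], reward=10.0)
    print(learner.is_fixed("s0"), learner.choose("s0"))
```

In this sketch a state that has not yet crossed the threshold keeps exploring through roulette selection, while a state that has crossed it always returns its highest-weight action. That is the behavior the abstract attributes to fixed mode states.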