This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
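As a rough illustration of the kind of update the abstract describes, the sketch below trains a single Bernoulli-logistic stochastic unit on an immediate-reinforcement task. It assumes the commonly cited REINFORCE form, weight change = learning rate × (reward − baseline) × characteristic eligibility, where the eligibility for this unit with a constant input of 1 reduces to `y - p`. The abstract itself does not state this formula. The task (reward 1 when the unit outputs 1), the running-average baseline, the step sizes, and the function name `train_bernoulli_unit` are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(z):
    """Logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_bernoulli_unit(steps=5000, alpha=0.1, seed=0):
    """Train one stochastic unit with a REINFORCE-style update.

    Assumptions (not from the abstract): a single bias weight, a constant
    input of 1.0, a toy task that pays reward 1 for output 1, and a
    running-average reinforcement baseline.
    """
    rng = random.Random(seed)
    w = 0.0          # the unit's only weight
    baseline = 0.0   # running average of past rewards
    for _ in range(steps):
        p = sigmoid(w)                      # probability of emitting 1
        y = 1 if rng.random() < p else 0    # stochastic output
        r = 1.0 if y == 1 else 0.0          # immediate reinforcement
        eligibility = y - p                 # d ln Pr(y) / d w for this unit
        w += alpha * (r - baseline) * eligibility
        baseline += 0.05 * (r - baseline)   # update the baseline estimate
    return w, sigmoid(w)


if __name__ == "__main__":
    w, p = train_bernoulli_unit()
    print(f"final weight: {w:.3f}, Pr(output = 1): {p:.3f}")
```

Each step uses only the sampled output, the received reward, and the unit's own firing probability. No gradient of expected reward is computed or stored, yet the average update points along that gradient, so the probability of the rewarded output rises over training.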