Conditional random fields (CRFs) are graphical models of the probability of labels given observations. They have traditionally been trained on a set of observation and label pairs, under the assumption that, conditioned on the training data, the labels are independent and identically distributed (iid). In this paper we explore the use of CRFs in a class of temporal learning algorithms, namely policy-gradient reinforcement learning (RL). Here the labels are no longer iid: they are actions that change the state of the environment and thereby affect the next observation. From an RL point of view, CRFs provide a natural way to model joint actions in a decentralized Markov decision process, defining how agents can communicate with each other to choose the optimal joint action. Our experiments on a synthetic network alignment problem, a distributed sensor network, and road traffic control show that this approach clearly outperforms RL methods that do not model the proper joint policy.
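To make the combination concrete, here is a minimal sketch of policy-gradient (REINFORCE) learning with a log-linear action model, i.e. a CRF over a single action variable, p(a | o) ∝ exp(θ_a · o). The contextual-bandit environment, the feature dimensions, and all names below are illustrative stand-ins, not the paper's actual tasks or algorithm; the paper's CRF couples the actions of multiple agents, which this single-variable sketch omits.

```python
import numpy as np

# Toy environment (assumption, not from the paper): a two-action
# contextual bandit where action 1 pays off iff the first feature
# of the observation is positive.
rng = np.random.default_rng(0)
n_features, n_actions = 3, 2
theta = np.zeros((n_actions, n_features))  # log-linear policy parameters

def policy(obs):
    """Softmax over linear scores: p(a | o) proportional to exp(theta_a . o)."""
    scores = theta @ obs
    scores -= scores.max()  # subtract max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def reward(obs, action):
    return 1.0 if (obs[0] > 0) == (action == 1) else 0.0

alpha = 0.5  # step size
for step in range(2000):
    obs = rng.normal(size=n_features)
    p = policy(obs)
    a = rng.choice(n_actions, p=p)
    r = reward(obs, a)
    # REINFORCE update: grad log p(a|o) = (1[a'=a] - p(a'|o)) * o per row
    grad = -np.outer(p, obs)
    grad[a] += obs
    theta += alpha * r * grad
```

After training, the learned distribution concentrates on the rewarded action for each sign of the first feature; the same score-function gradient applies when the exponential-family policy ranges over structured joint actions, which is the setting the paper develops.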