We formalize structured prediction (SP) as a reinforcement learning (RL) task. We first define the Structured Prediction Markov Decision Process (SP-MDP), an instantiation of Markov decision processes for structured prediction, and show that learning an optimal policy for this SP-MDP is equivalent to minimizing the empirical loss. This link between the supervised learning formulation of structured prediction and reinforcement learning allows us to use approximate RL methods to learn the policy. The proposed model makes weak assumptions about both the nature of the structured prediction problem and the supervision process: it assumes nothing about the decomposition of the loss function, the data encoding, or the availability of optimal policies for training. It can therefore handle a wide range of structured prediction problems. Moreover, it scales well and can be applied to complex, large-scale real-world problems. We describe two series of experiments. The first analyzes RL on classical sequence prediction benchmarks and compares our approach with state-of-the-art SP algorithms. The second introduces a tree transformation problem, a complex instance of the general labeled tree mapping problem on which most previous models fail. We show that RL exploration is effective and leads to successful results on this challenging task, a clear confirmation that RL can be used for large, complex structured prediction problems.
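To make the SP-MDP construction concrete, the following is a minimal sketch of such an MDP for sequence labeling: the state holds the input and a partial output, each action appends one label, and the final reward is the negative loss, so maximizing expected return minimizes expected loss. The label set, the choice of Hamming loss, and all identifiers below are illustrative assumptions, not the paper's exact formulation.

    # Minimal SP-MDP sketch for sequence labeling (illustrative, assumed
    # label set and Hamming loss; not the authors' exact formulation).
    import random
    from dataclasses import dataclass, field

    LABELS = ["B", "I", "O"]  # assumed label set for illustration

    @dataclass
    class SPState:
        x: list                                        # input sequence
        y_partial: list = field(default_factory=list)  # labels chosen so far

        def is_final(self):
            return len(self.y_partial) == len(self.x)

    def actions(state):
        # Available actions: append any label to the partial output.
        return [] if state.is_final() else LABELS

    def step(state, action):
        # Deterministic transition: extend the partial output by one label.
        return SPState(state.x, state.y_partial + [action])

    def reward(state, y_true):
        # Final reward = negative Hamming loss; 0 at non-final states,
        # so an episode's return equals -loss(y_predicted, y_true).
        if not state.is_final():
            return 0.0
        return -sum(a != b for a, b in zip(state.y_partial, y_true))

    # Example: roll out a uniformly random policy on a toy input.
    state = SPState(x=["the", "cat", "sat"])
    while not state.is_final():
        state = step(state, random.choice(actions(state)))
    print(state.y_partial, reward(state, ["O", "B", "O"]))

Under this construction, any RL method that maximizes expected return over episodes is implicitly minimizing the expected task loss, which is the equivalence the abstract refers to; a learned policy (for example, a greedy policy over action scores) would replace the random action choice above.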