Relating reinforcement learning performance to classification performance

Authors:
John Langford; Bianca Zadrozny
Affiliations:
TTI-Chicago, Chicago, IL; IBM T.J. Watson, Yorktown Heights, NY
Venue:
ICML '05 Proceedings of the 22nd international conference on Machine learning
Year:
2005

Abstract

We prove a quantitative connection between the expected sum of rewards of a policy and binary classification performance on created subproblems. This connection holds without any unobservable assumptions (no assumption of independence, small mixing time, fully observable states, or even hidden states) and the resulting statement is independent of the number of states or actions. The statement is critically dependent on the size of the rewards and prediction performance of the created classifiers. We also provide some general guidelines for obtaining good classification performance on the created subproblems. In particular, we discuss possible methods for generating training examples for a classifier learning algorithm.