Efficient exploration for optimizing immediate reward
AAAI '99/IAAI '99: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference
An agent that must learn to act in the world by trial and error faces the reinforcement learning problem, which differs substantially from standard concept learning. Although good algorithms exist for this problem in the general case, they are often quite inefficient and do not generalize. One strategy is to restrict attention to classes of action policies that can be learned more efficiently. This paper pursues that strategy by developing algorithms that efficiently learn action maps expressible in k-DNF. The algorithms are compared with existing methods in empirical trials and are shown to perform very well.
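The abstract's core idea, learning an action map expressible in k-DNF by trial and error, can be sketched as follows. This is a minimal illustration under simplifying assumptions (boolean inputs, a binary act/don't-act choice, and deterministic 0/1 reward), not the paper's actual algorithm: the learner starts with every conjunction of at most k literals as a candidate term and eliminates any term that fires when acting yields no reward. The class `KDNFLearner` and all helper names are hypothetical.

```python
from itertools import combinations, product

def all_terms(n, k):
    """Enumerate conjunctions of at most k literals over n boolean
    variables; a term is a tuple of (variable index, polarity) pairs."""
    terms = []
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for pols in product([True, False], repeat=size):
                terms.append(tuple(zip(idxs, pols)))
    return terms

def satisfies(x, term):
    """True iff input x (a tuple of booleans) satisfies every literal."""
    return all(x[i] == pol for i, pol in term)

class KDNFLearner:
    """Hypothetical elimination-style learner for a k-DNF action map:
    act iff some surviving candidate term is satisfied."""

    def __init__(self, n, k):
        self.terms = set(all_terms(n, k))

    def act(self, x):
        # Act (return 1) iff any surviving term fires on x.
        return int(any(satisfies(x, t) for t in self.terms))

    def update(self, x, action, reward):
        # Eliminate every firing term whenever acting earned no reward.
        if action == 1 and reward == 0:
            self.terms = {t for t in self.terms if not satisfies(x, t)}

# Usage: the target policy is the 2-DNF (x0 AND x1) OR (NOT x2).
target = lambda x: int((x[0] and x[1]) or (not x[2]))
learner = KDNFLearner(n=3, k=2)
for x in product([True, False], repeat=3):
    a = learner.act(x)
    r = int(a == 1 and target(x) == 1)  # reward only for a correct "act"
    learner.update(x, a, r)
```

With deterministic reward, one pass over the inputs eliminates every term that fires where acting is wrong, while the target's own terms are never eliminated, so the surviving disjunction reproduces the target policy. The real setting treated in the paper (stochastic reward, efficient exploration) requires statistical tests rather than single-observation elimination.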