In the multi-armed bandit problem, a gambler must decide which arm of K non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the expected per-round payoff of our algorithm approaches that of the best arm at the rate O(T^{-1/3}), and we give an improved rate of convergence when the best arm has fairly low payoff. We also consider a setting in which the player has a team of "experts" advising him on which arm to play; here, we give a strategy that will guarantee expected payoff close to that of the best expert. Finally, we apply our result to the problem of learning to play an unknown repeated matrix game against an all-powerful adversary.
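The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how a player might act without statistical assumptions. It uses an exponential-weights scheme with importance-weighted payoff estimates (the family usually called Exp3), which may differ in detail from the authors' method. The function name `exp3`, the `reward_fn` callback, the fixed exploration rate `gamma`, and the assumption that payoffs lie in [0, 1] are all choices made for this example.

```python
import math
import random


def exp3(K, T, reward_fn, gamma=0.1, seed=0):
    """Illustrative exponential-weights bandit player (assumed payoffs in [0, 1]).

    reward_fn(t, arm) is a hypothetical callback returning the payoff of the
    arm played at round t; the player never sees the other arms' payoffs.
    Returns the total payoff and the final sampling distribution.
    """
    rng = random.Random(seed)
    weights = [1.0] * K
    total = 0.0
    probs = [1.0 / K] * K
    for t in range(T):
        w_sum = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / w_sum + gamma / K for w in weights]
        arm = rng.choices(range(K), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        # Dividing by the sampling probability keeps the estimate unbiased
        # even though only the played arm's payoff is observed.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / K)
        # Rescale so the largest weight is 1; the distribution is unchanged.
        largest = max(weights)
        weights = [w / largest for w in weights]
    return total, probs


if __name__ == "__main__":
    # A fixed payoff sequence in which arm 2 always pays 1 and the rest pay 0.
    payoff, final_probs = exp3(3, 2000, lambda t, arm: 1.0 if arm == 2 else 0.0)
    print(payoff, final_probs)
```

The uniform mixing term is what guards against an adversary: every arm keeps probability at least gamma / K, so the importance-weighted estimates stay bounded. A fixed `gamma` is used here for simplicity; tuning it as a function of T and K is what would drive a convergence rate of the kind stated above.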