In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines.

In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of T plays, we prove that the per-round payoff of our algorithm approaches that of the best arm at the rate $O(T^{-1/2})$. We show by a matching lower bound that this is the best possible.

We also prove that our algorithm approaches the per-round payoff of any set of strategies at a similar rate: if the best strategy is chosen from a pool of N strategies, then our algorithm approaches the per-round payoff of the strategy at the rate $O((\log N)^{1/2} T^{-1/2})$. Finally, we apply our results to the problem of playing an unknown repeated matrix game. We show that our algorithm approaches the minimax payoff of the unknown game at the rate $O(T^{-1/2})$.
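The abstract does not spell out the algorithm, so the following is only a minimal sketch of one standard approach to the adversarial setting it describes: exponential weights over the arms, updated with importance-weighted payoff estimates (the scheme usually called Exp3). The function name `exp3`, the callback `reward_fn`, the default exploration rate `gamma=0.1`, and the fixed seed are all illustrative assumptions, and payoffs are assumed to lie in [0, 1].

```python
import math
import random


def exp3(num_arms, num_rounds, reward_fn, gamma=0.1, seed=0):
    """Sketch of an Exp3-style bandit player (assumed, not taken from the abstract).

    reward_fn(t, arm) is a hypothetical callback returning the payoff in [0, 1]
    of the chosen arm at round t; payoffs of the other arms are never observed.
    Returns the total payoff collected over num_rounds plays.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    total_reward = 0.0

    for t in range(num_rounds):
        # Mix the weight-proportional distribution with uniform exploration,
        # so every arm keeps probability at least gamma / num_arms.
        weight_sum = sum(weights)
        probs = [
            (1.0 - gamma) * w / weight_sum + gamma / num_arms
            for w in weights
        ]

        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward

        # Importance-weighted estimate: unbiased for the played arm's payoff,
        # implicitly zero for every arm that was not played.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)

        # Rescale so the largest weight is 1; this leaves probs unchanged
        # and prevents floating-point overflow on long runs.
        max_weight = max(weights)
        weights = [w / max_weight for w in weights]

    return total_reward


if __name__ == "__main__":
    # Toy payoff sequence (assumption): arm 2 always pays 1, the others pay 0.
    payoff = exp3(num_arms=3, num_rounds=5000,
                  reward_fn=lambda t, arm: 1.0 if arm == 2 else 0.0)
    print(f"average per-round payoff: {payoff / 5000:.3f}")
```

Because `reward_fn` may depend on the round index in any way, nothing in the sketch relies on the payoffs being drawn from a fixed distribution. With a constant `gamma` the player keeps exploring at a fixed rate forever, so reaching a per-round gap that shrinks like $T^{-1/2}$ would require tuning `gamma` to the horizon; that tuning is omitted here.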