Reinforcement learning policies face the exploration versus exploitation dilemma: they must balance exploring the environment to find profitable actions against taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss incurred because the globally optimal policy is not followed at all times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies that asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
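The abstract does not spell out the policies it refers to, so the following is only an illustrative sketch of one upper-confidence-bound index policy of the kind commonly called UCB1. It assumes Bernoulli rewards (a simple case of bounded support on [0, 1]), and the function name `ucb1`, its parameters, and the example arm means are hypothetical choices made for this illustration. The regret it reports is the expected loss relative to always playing the best arm, computed from the true means and the pull counts.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Simulate a UCB-style index policy on Bernoulli arms.

    arm_means: true success probabilities (assumed, used only by the simulator).
    Returns (pull counts per arm, expected regret over the horizon).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of times each arm has been played
    sums = [0.0] * k   # total reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: play every arm once.
            arm = t - 1
        else:
            total = t - 1  # plays made so far
            # Index = empirical mean + confidence bonus that shrinks with plays.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(total) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    best = max(arm_means)
    # Expected regret: gap to the best arm times the number of pulls of each arm.
    regret = sum((best - m) * n for m, n in zip(arm_means, counts))
    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.8], horizon=5000)
    print("pulls per arm:", counts)
    print("expected regret:", round(regret, 2))
```

Running the simulation for several horizons and plotting the regret against the logarithm of the horizon is one informal way to see the logarithmic growth the abstract describes. It is not a substitute for the finite-time bounds themselves.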