We introduce the first near-optimal polynomial algorithm for obtaining the mixed safety level value of an initially unknown multi-stage game, played in a hostile environment, under imperfect monitoring. In an imperfect monitoring setting, all that an agent can observe is the current state and its own actions and payoffs; it cannot observe the other agents' actions. Our result holds for any multi-stage generic game with a “reset” action.
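The mixed safety level value is the largest expected payoff an agent can guarantee no matter how the other agents behave, i.e. the maxmin value over the agent's mixed strategies. The sketch below only illustrates that quantity for a known, one-shot game with two rows for the agent. It is not the algorithm described above, which must learn the value of an unknown multi-stage game without observing opponents' actions. The function name `safety_level_2xn` and the 2 x n restriction are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import combinations


def safety_level_2xn(A):
    """Hypothetical helper: mixed safety level (maxmin value) for the row
    player of a known 2 x n payoff matrix A, where A[i][j] is the row
    player's payoff when it plays row i and the opponent plays column j.

    Returns (value, p), where p is the probability of playing row 0.
    """
    rows = [[Fraction(x) for x in r] for r in A]
    n = len(rows[0])

    def guaranteed(p):
        # Worst case over the opponent's pure columns. A mixed opponent
        # strategy yields a convex combination of these payoffs, so it
        # cannot push the row player below the worst pure column.
        return min(p * rows[0][j] + (1 - p) * rows[1][j] for j in range(n))

    # guaranteed(p) is concave and piecewise linear on [0, 1], so its
    # maximum lies at an endpoint or where two column lines intersect.
    candidates = {Fraction(0), Fraction(1)}
    for j, k in combinations(range(n), 2):
        # Column j's line is A[1][j] + p * (A[0][j] - A[1][j]).
        slope_diff = (rows[0][j] - rows[1][j]) - (rows[0][k] - rows[1][k])
        if slope_diff != 0:
            p = (rows[1][k] - rows[1][j]) / slope_diff
            if 0 <= p <= 1:
                candidates.add(p)

    best = max(candidates, key=guaranteed)
    return guaranteed(best), best


# Matching pennies: the row player can guarantee 0 by mixing 50/50.
print(safety_level_2xn([[1, -1], [-1, 1]]))
```

Exact `Fraction` arithmetic avoids floating-point ties between candidate points. For matching pennies the call returns a value of 0 at p = 1/2, and any pure strategy guarantees only -1.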