Parallel and distributed computation: numerical methods
Asynchronous Stochastic Approximation and Q-Learning
Machine Learning
Competitive Markov decision processes
Stochastic approximation with two time scales
Systems & Control Letters
Online computation and competitive analysis
A game of prediction with expert advice
Journal of Computer and System Sciences - Special issue on the eighth annual workshop on computational learning theory, July 5–8, 1995
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
SIAM Journal on Control and Optimization
The Nonstochastic Multiarmed Bandit Problem
SIAM Journal on Computing
Calibration with many checking rules
Mathematics of Operations Research
Stochastic Approximations and Differential Inclusions
SIAM Journal on Control and Optimization
Stochastic uncoupled dynamics and Nash equilibrium: extended abstract
TARK '05 Proceedings of the 10th conference on Theoretical aspects of rationality and knowledge
Multi-agent learning for engineers
Artificial Intelligence
Deterministic calibration and Nash equilibrium
Journal of Computer and System Sciences
Tracking Forecast Memories for Stochastic Decoding
Journal of Signal Processing Systems
We provide a simple learning process that enables an agent to forecast a sequence of outcomes. Our forecasting scheme, termed tracking forecast, is based on tracking the past observations while emphasizing recent outcomes. As opposed to other forecasting schemes, we sacrifice universality in favor of significantly reduced memory requirements. We show that if the sequence of outcomes has certain properties--namely, some internal (hidden) state that does not change too rapidly--then the tracking forecast is weakly calibrated, so that the forecast appears correct most of the time. For binary outcomes, this result holds without any internal-state assumptions. We consider learning in a repeated strategic game in which each player attempts to compute a forecast of the opponent's actions and play a best response to it. We show that if one player uses a tracking forecast while the other player uses a standard learning algorithm (such as exponential regret matching or smooth fictitious play), then the player using the tracking forecast obtains a best response to the actual play of the other player. We further show that if both players use tracking forecasts, then, under certain conditions on the game matrix, convergence to a Nash equilibrium occurs with positive probability for a larger class of games than the class of games for which smooth fictitious play converges to a Nash equilibrium.
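To make the idea of "tracking the past observations while emphasizing recent outcomes" concrete, the following is a minimal sketch for binary outcomes: an exponentially weighted running average updated with a fixed step size, so only a single scalar (rather than the full history) is stored. The function name `tracking_forecast` and the step size `eta` are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical sketch of a tracking forecast for a binary outcome sequence.
# Only one scalar is kept in memory: the current forecast, which is nudged
# toward each new observation, so recent outcomes dominate the estimate.

def tracking_forecast(outcomes, eta=0.1):
    """Return the one-step-ahead forecasts for a 0/1 outcome sequence."""
    f = 0.5            # initial forecast (an assumption: uninformative prior)
    forecasts = []
    for y in outcomes:
        forecasts.append(f)       # forecast issued before seeing y
        f += eta * (y - f)        # track the observation, discounting the past
    return forecasts

seq = [1, 1, 0, 1, 1, 1, 0, 1]
print(tracking_forecast(seq)[:3])
```

If the hidden state driving the outcomes changes slowly relative to `eta`, the running average has time to settle near the current outcome frequency, which is the intuition behind the weak-calibration claim in the abstract.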