Parallel and distributed computation: numerical methods
Asynchronous Stochastic Approximation and Q-Learning
Machine Learning
Competitive Markov decision processes
Stochastic approximation with two time scales
Systems & Control Letters
Online computation and competitive analysis
A game of prediction with expert advice
Journal of Computer and System Sciences - Special issue on the eighth annual workshop on computational learning theory, July 5–8, 1995
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
SIAM Journal on Control and Optimization
The Nonstochastic Multiarmed Bandit Problem
SIAM Journal on Computing
Calibration with many checking rules
Mathematics of Operations Research
Stochastic Approximations and Differential Inclusions
SIAM Journal on Control and Optimization
Stochastic uncoupled dynamics and Nash equilibrium: extended abstract
TARK '05 Proceedings of the 10th conference on Theoretical aspects of rationality and knowledge
Multi-agent learning for engineers
Artificial Intelligence
Deterministic calibration and Nash equilibrium
Journal of Computer and System Sciences
Tracking Forecast Memories for Stochastic Decoding
Journal of Signal Processing Systems
We provide a simple learning process that enables an agent to forecast a sequence of outcomes. Our forecasting scheme, termed tracking forecast, is based on tracking the past observations while emphasizing recent outcomes. As opposed to other forecasting schemes, we sacrifice universality in favor of significantly reduced memory requirements. We show that if the sequence of outcomes has certain properties--namely, some internal (hidden) state that does not change too rapidly--then the tracking forecast is weakly calibrated, so that the forecast appears correct most of the time. For binary outcomes, this result holds without any internal-state assumptions. We consider learning in a repeated strategic game in which each player attempts to compute a forecast of the opponent's actions and play a best response to it. We show that if one player uses a tracking forecast while the other player uses a standard learning algorithm (such as exponential regret matching or smooth fictitious play), then the player using the tracking forecast obtains a best response to the actual play of the other player. We further show that if both players use tracking forecasts, then, under certain conditions on the game matrix, convergence to a Nash equilibrium occurs with positive probability for a larger class of games than the class of games for which smooth fictitious play converges to a Nash equilibrium.
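To make the idea of "tracking the past observations while emphasizing recent outcomes" concrete, the following is a minimal sketch for binary outcomes: an exponentially weighted running average updated with a fixed step size, so only a single scalar (rather than the full history) is stored. The function name `tracking_forecast` and the step size `eta` are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical sketch of a tracking forecast for a binary outcome sequence.
# Only one scalar is kept in memory: the current forecast, which is nudged
# toward each new observation, so recent outcomes dominate the estimate.

def tracking_forecast(outcomes, eta=0.1):
    """Return the one-step-ahead forecasts for a 0/1 outcome sequence."""
    f = 0.5            # initial forecast (an assumption: uninformative prior)
    forecasts = []
    for y in outcomes:
        forecasts.append(f)       # forecast issued before seeing y
        f += eta * (y - f)        # track the observation, discounting the past
    return forecasts

seq = [1, 1, 0, 1, 1, 1, 0, 1]
print(tracking_forecast(seq)[:3])
```

If the hidden state driving the outcomes changes slowly relative to `eta`, the running average has time to settle near the current outcome frequency, which is the intuition behind the weak-calibration claim in the abstract.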