Sequential decision making with vector outcomes

Authors:
Yossi Azar; Uriel Feige; Michal Feldman; Moshe Tennenholtz
Affiliations:
Tel Aviv University, Tel Aviv, Israel;Weizmann Institute, Rechovot, Israel;Tel Aviv University, Tel Aviv, Israel;Microsoft Research, Herzliya, Israel
Venue:
Proceedings of the 5th conference on Innovations in theoretical computer science
Year:
2014

Abstract

We study a multi-round optimization setting in which, in each round, a player selects one of several actions, and each action produces an outcome vector that the player cannot observe until the round ends. The player's final payoff is computed by applying a known function f to the sum of all outcome vectors (e.g., the minimum over all coordinates of the sum). We show that the standard performance measures used in the related expert and bandit settings, where the payoff in each round is a scalar (such as comparison to the best single action), are not useful in our vector setting. Instead, we propose a different performance measure and design algorithms that have vanishing regret with respect to it.
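
To make the setting concrete, the sketch below computes a payoff by applying f to the sum of outcome vectors, with f taken to be the coordinate-wise minimum as in the abstract's example. The two-action instance, the names `total_payoff` and `ACTION_OUTCOMES`, and the fixed (non-adversarial) outcomes are assumptions chosen for illustration; they are not taken from the paper. The instance suggests why comparing against the best single action can be uninformative: every single action earns 0, while a simple alternation earns T/2.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def total_payoff(outcomes: Sequence[Vector],
                 f: Callable[[Vector], float] = min) -> float:
    """Apply f to the coordinate-wise sum of the per-round outcome vectors."""
    summed = tuple(sum(coords) for coords in zip(*outcomes))
    return f(summed)


# Hypothetical instance: each action always yields the same outcome vector.
ACTION_OUTCOMES = {"a": (1.0, 0.0), "b": (0.0, 1.0)}
T = 10  # number of rounds (assumed even)

# Benchmark from scalar settings: play one fixed action in every round.
best_single = max(total_payoff([v] * T) for v in ACTION_OUTCOMES.values())

# Alternate between the two actions so both coordinates grow.
alternating = total_payoff(
    [ACTION_OUTCOMES["a" if t % 2 == 0 else "b"] for t in range(T)]
)

print(best_single, alternating)
```

In this toy instance the best single action is a very weak benchmark, since beating it says nothing about how well the coordinates are balanced. That is the kind of gap a different performance measure would need to account for.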