Online learning with queries

Authors:
Chao-Kai Chiang; Chi-Jen Lu
Affiliations:
Academia Sinica, Taipei, Taiwan and National Taiwan University, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan
Venue:
SODA '10: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms
Year:
2010

Abstract

The online learning problem requires a player to iteratively choose an action in an unknown and changing environment. In the standard setting of this problem, the player has to choose an action in each round before knowing anything about the corresponding loss. However, in some situations the player could spend effort or resources to collect some information before acting. This motivates us to study a variant of the online learning problem in which the player may query B bits of the loss vector in each round before choosing her action. Suppose each loss value is represented by K bits, distinct loss values differ by at least some amount δ, and there are N actions to choose from and T rounds to play. We provide an algorithm for this problem whose regret behaves as follows. While B is below B1 = NK/2, the regret stays at O(√(T ln N)); once B exceeds B1 but is still below B2 = NK/2 + 3K/2 − 1, the regret drops slightly to O(√(T ln N)/N); and once B exceeds B2, the regret drops dramatically to (N ln N)/δ, which no longer depends on T. Our algorithm is close to optimal: we also provide regret lower bounds that almost match these upper bounds.
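As a rough illustration of the three regimes, the following Python sketch evaluates the stated upper bound for a given per-round query budget B. It is not the authors' algorithm; it only plugs numbers into the expressions above. It reads the first bound as √(T ln N), sets all hidden constants to 1, and assigns each threshold value to the next regime, since the abstract does not pin down the boundary cases. All three choices are assumptions made for illustration, and the function name `regret_regime` and the example parameters are hypothetical.

```python
import math


def regret_regime(n, k, b, t, delta):
    """Evaluate the upper-bound expression from the abstract that applies
    to a per-round query budget of b bits (hypothetical helper).

    Assumptions for illustration only: hidden constants are set to 1, and
    each threshold is treated as the start of the next regime, because the
    abstract does not fix the boundary cases.
    """
    b1 = n * k / 2                  # B1 = NK/2
    b2 = n * k / 2 + 3 * k / 2 - 1  # B2 = NK/2 + 3K/2 - 1
    if b < b1:
        # Below B1: the bound sqrt(T ln N).
        return "below B1", math.sqrt(t * math.log(n))
    if b < b2:
        # Between B1 and B2: the bound sqrt(T ln N) / N.
        return "between B1 and B2", math.sqrt(t * math.log(n)) / n
    # Above B2: the bound (N ln N) / delta, which does not depend on T.
    return "above B2", n * math.log(n) / delta


if __name__ == "__main__":
    # Hypothetical parameters: N = 4 actions, K = 8 bits per loss value,
    # T = 10_000 rounds, delta = 1.0.
    for budget in (10, 20, 30):
        regime, bound = regret_regime(4, 8, budget, 10_000, 1.0)
        print(f"B = {budget:2d}: {regime:18s} bound ~ {bound:.2f}")
```

With N = 4 and K = 8, the thresholds are B1 = 16 and B2 = 16 + 12 − 1 = 27, so budgets of 10, 20 and 30 bits fall into the three regimes in turn.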