In the setting of relative loss bounds, the performance of an on-line learning algorithm is compared to that of a class of off-line predictors, called experts. In this paper we reconsider a result by Vovk, namely an upper bound on the on-line relative loss for linear regression with square loss; here the experts are linear functions. We give a shorter and simpler proof of Vovk's result and a new motivation for the choice of predictions made by Vovk's learning algorithm. This is done by computing the, in a certain sense, best prediction for the last trial of a sequence of trials when the outcome variable is known to be bounded. We then try to generalize these ideas to generalized linear regression, where the experts are neurons, and give a formula for the "best" last-trial prediction in this case as well. This prediction turns out to be essentially an integral over the "best" expert applied to the last instance. Predictions that are "optimal" in this sense may also be good predictions for long sequences of trials.
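
As a rough illustration of the kind of prediction rule discussed above, the following Python sketch implements a Vovk-Azoury-Warmuth-style linear predictor under square loss, with its output clipped to an assumed outcome range. The class name, the regularization constant a, the bound Y, and the clipping step are illustrative assumptions; this is a minimal sketch, not the paper's exact algorithm or proof construction.

# A minimal sketch, in the spirit of Vovk-Azoury-Warmuth-style forecasters,
# of an on-line linear-regression predictor under square loss.  The
# regularization constant `a`, the outcome bound `Y`, and the clipping
# step are illustrative assumptions, not the paper's exact construction.
import numpy as np

class LastStepStylePredictor:
    def __init__(self, dim, a=1.0, Y=1.0):
        self.A = a * np.eye(dim)   # a*I plus the sum of outer products x_s x_s^T
        self.b = np.zeros(dim)     # sum of y_s * x_s over past trials
        self.Y = Y                 # assumed bound on the outcomes: |y_t| <= Y

    def predict(self, x):
        # Fold the current instance into A before predicting, then clip the
        # linear prediction to the known outcome range [-Y, Y].
        A_t = self.A + np.outer(x, x)
        y_hat = x @ np.linalg.solve(A_t, self.b)
        return float(np.clip(y_hat, -self.Y, self.Y))

    def update(self, x, y):
        # After the outcome is revealed, accumulate the sufficient statistics.
        self.A += np.outer(x, x)
        self.b += y * x

# Usage on one trial: predict, observe the outcome, then update.
pred = LastStepStylePredictor(dim=3)
x_t = np.array([0.2, -0.5, 1.0])
y_hat = pred.predict(x_t)
pred.update(x_t, 0.3)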