We consider the problem of on-line prediction of real-valued labels, assumed bounded in absolute value by a known constant, of new objects from known labeled objects. The prediction algorithm’s performance is measured by the squared deviation of the predictions from the actual labels. No stochastic assumptions are made about the way the labels and objects are generated. Instead, we are given a benchmark class of prediction rules some of which are hoped to produce good predictions. We show that for a wide range of infinite-dimensional benchmark classes one can construct a prediction algorithm whose cumulative loss over the first N examples does not exceed the cumulative loss of any prediction rule in the class plus $O(\sqrt{N})$; the main differences from the known results are that we do not impose any upper bound on the norm of the considered prediction rules and that we achieve an optimal leading term in the excess loss of our algorithm. If the benchmark class is “universal” (dense in the class of continuous functions on each compact set), this provides an on-line non-stochastic analogue for universally consistent prediction in non-parametric statistics. We use two proof techniques: one is based on the Aggregating Algorithm and the other on the recently developed method of defensive forecasting.
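The protocol described above can be sketched in code. The following is a minimal illustration, not the paper's algorithm: it plays the on-line game (observe an object, predict its label, observe the true label, suffer squared loss) using a simple kernel ridge learner over the Gaussian RBF kernel, whose RKHS is an example of a "universal" benchmark class. Predictions are clipped to the known label bound `Y`; the function names, the choice of learner, and all parameter values are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian (RBF) kernel matrix; its RKHS is dense in the continuous
    # functions on each compact set, i.e. a "universal" benchmark class.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def online_kernel_regression(stream, Y=1.0, ridge=1.0, sigma=1.0):
    """Play the on-line protocol: at each step predict the label of a new
    object from the labelled objects seen so far, then suffer squared loss.
    Uses an illustrative kernel ridge learner, not the paper's Aggregating
    Algorithm or defensive-forecasting construction. Returns the learner's
    cumulative squared loss over the stream."""
    xs, ys, loss = [], [], 0.0
    for x, y in stream:
        if xs:
            X = np.asarray(xs)
            # Fit kernel ridge regression on all examples seen so far.
            K = rbf_kernel(X, X, sigma) + ridge * np.eye(len(xs))
            alpha = np.linalg.solve(K, np.asarray(ys))
            k = rbf_kernel(np.asarray([x]), X, sigma)[0]
            # Labels are bounded by Y in absolute value, so clip.
            pred = float(np.clip(k @ alpha, -Y, Y))
        else:
            pred = 0.0  # no information yet
        loss += (pred - y) ** 2
        xs.append(x)
        ys.append(y)
    return loss
```

On a stream generated by a smooth function, the learner's cumulative loss stays well below that of the trivial prediction rule that always outputs 0, in the spirit of the regret guarantee in the abstract (though this sketch comes with no such guarantee of its own).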