We consider a natural convolution kernel defined by a directed graph. Each edge contributes an input; the inputs along a path form a product, and the products over all paths are summed. We also maintain a set of probabilities on the edges such that the outflow from each node is one. We then discuss multiplicative updates on these graphs, where the prediction is essentially a kernel computation and the update contributes a factor to each edge. After the update, the total outflow from each node is no longer one; however, clever algorithms re-normalize the weights on the paths so that the total outflow from each node is one again. Finally, we discuss the use of regular expressions for speeding up the kernel and re-normalization computations. In particular, we rewrite the multiplicative algorithms that predict as well as the best pruning of a series-parallel graph in terms of efficient kernel computations.
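The two graph computations described above lend themselves to a short dynamic-programming sketch. The toy Python example below (the graph, the loss values, and the names `path_sums` and `weight_push` are illustrative assumptions, not the paper's code) computes the sum over all source-to-sink paths of the per-path products in one backward sweep, applies a multiplicative factor to each edge, and then re-normalizes so that the outflow from every node is one again while the relative weights of the paths are preserved.

```python
import math
from collections import defaultdict

# Edges of a toy DAG as (tail, head, weight); 's' is the source, 't' the sink.
edges = [('s', 'a', 0.5), ('s', 'b', 0.5),
         ('a', 't', 1.0), ('b', 't', 1.0)]
topo = ['s', 'a', 'b', 't']  # a topological order of the nodes

def path_sums(edges, topo, sink):
    """Z[v] = sum over all v -> sink paths of the product of edge weights.
    One backward sweep replaces explicit path enumeration; this is the
    kernel-style computation the abstract alludes to."""
    Z = defaultdict(float)
    Z[sink] = 1.0
    for v in reversed(topo):
        if v != sink:
            Z[v] = sum(w * Z[head] for (tail, head, w) in edges if tail == v)
    return Z

def weight_push(edges, topo, sink):
    """Re-normalize edge weights so the outflow of every node sums to one,
    without changing the relative weight of any source-to-sink path."""
    Z = path_sums(edges, topo, sink)
    return [(tail, head, w * Z[head] / Z[tail]) for (tail, head, w) in edges]

# A multiplicative update: every edge picks up a factor (here the exponential
# of a made-up per-edge loss), after which outflows no longer sum to one ...
eta = 1.0
losses = {('s', 'a'): 0.2, ('s', 'b'): 0.7, ('a', 't'): 0.1, ('b', 't'): 0.4}
updated = [(tail, head, w * math.exp(-eta * losses[(tail, head)]))
           for (tail, head, w) in edges]

# ... and weight pushing restores the outflow-one invariant.
for tail, head, w in weight_push(updated, topo, 't'):
    print(tail, '->', head, round(w, 3))
```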
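For the regular-expression speedup, series-parallel graphs are the cleanest case: the path sum of a series composition is the product of the parts' path sums, and the path sum of a parallel composition is their sum, so the kernel can be evaluated directly on the decomposition. A minimal sketch under that observation, assuming a hand-rolled expression-tree encoding (the labels `'edge'`, `'series'`, and `'parallel'` are my own, not the paper's notation):

```python
def kernel(expr):
    """Sum over all source-to-sink paths of the product of edge weights,
    computed from a series-parallel decomposition instead of enumerating
    the (possibly exponentially many) paths."""
    op = expr[0]
    if op == 'edge':              # ('edge', weight): a single weighted edge
        return expr[1]
    if op == 'series':            # ('series', g1, g2, ...): path sums multiply
        prod = 1.0
        for sub in expr[1:]:
            prod *= kernel(sub)
        return prod
    if op == 'parallel':          # ('parallel', g1, g2, ...): path sums add
        return sum(kernel(sub) for sub in expr[1:])
    raise ValueError(f"unknown operator: {op}")

# Two parallel branches, one of which is a series of two edges:
g = ('parallel', ('series', ('edge', 0.9), ('edge', 0.8)), ('edge', 0.5))
print(kernel(g))  # 0.9 * 0.8 + 0.5 = 1.22
```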