Communications of the ACM.
A new polynomial-time algorithm for linear programming. Combinatorica.
Log depth circuits for division and related problems. SIAM Journal on Computing.
Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM).
The Strength of Weak Learnability. Machine Learning.
From on-line to batch learning. COLT '89: Proceedings of the Second Annual Workshop on Computational Learning Theory.
Information and Computation.
Parallel linear programming in fixed dimension almost surely in constant time. Journal of the ACM (JACM).
An introduction to computational learning theory.
Limits to parallel computation: P-completeness theory.
Boosting a weak learning algorithm by majority. Information and Computation.
On the boosting ability of top-down decision tree learning algorithms. STOC '96: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing.
Noise-tolerant parallel learning of geometric concepts. Information and Computation.
Large Margin Classification Using the Perceptron Algorithm. Machine Learning / The Eleventh Annual Conference on Computational Learning Theory.
An Adaptive Version of the Boost by Majority Algorithm. Machine Learning.
Logistic Regression, AdaBoost and Bregman Distances. Machine Learning.
MadaBoost: A Modification of AdaBoost. COLT '00: Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.
Smooth boosting and learning with malicious noise. The Journal of Machine Learning Research.
Excessive Gap Technique in Nonsmooth Convex Minimization. SIAM Journal on Optimization.
Boosting in the presence of noise. Journal of Computer and System Sciences (Special issue: Learning Theory 2003).
Smooth Optimization with Approximate Gradient. SIAM Journal on Optimization.
COLT '05: Proceedings of the 18th Annual Conference on Learning Theory.
Random projection, margins, kernels, and feature-selection. SLSFS '05: Proceedings of the 2005 International Conference on Subspace, Latent Structure and Feature Selection.
We consider the problem of learning an unknown large-margin halfspace in the context of parallel computation, giving both positive and negative results. As our main positive result, we give a parallel algorithm for learning a large-margin halfspace, based on an algorithm of Nesterov's that performs gradient descent with a momentum term. We show that this algorithm can learn an unknown γ-margin halfspace over n dimensions using n · poly(1/γ) processors, with a running time of Õ(1/γ) + O(log n). In contrast, naive parallel algorithms that learn a γ-margin halfspace in time that depends polylogarithmically on n have a running time with an inverse quadratic dependence on the margin parameter γ. Our negative result deals with boosting, which is a standard approach to learning large-margin halfspaces. We prove that in the original PAC framework, in which a weak learning algorithm is provided as an oracle that is called by the booster, boosting cannot be parallelized. More precisely, we show that if the booster is allowed to call the weak learner multiple times in parallel within a single boosting stage, this ability does not reduce the overall number of successive boosting stages needed for learning, even by a single stage. Our proof is information-theoretic and does not rely on unproven assumptions.
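To make the positive result more concrete, the following is a minimal, illustrative sketch (not the paper's actual algorithm or analysis) of the underlying idea: Nesterov-style accelerated gradient descent, i.e., gradient descent with a momentum term, applied to a smooth surrogate of the margin loss. The function names, the choice of smoothing parameter mu = γ, and the step count of roughly 4/γ are hypothetical illustration choices; the point is that the iteration count for such an accelerated method scales roughly as Õ(1/γ) rather than the inverse quadratic dependence of plain gradient or perceptron updates.

```python
import numpy as np

def smooth_hinge_grad(w, X, y, mu):
    """Gradient of a Huber-smoothed hinge loss; its smoothness constant is
    about 1/mu when every example has Euclidean norm at most 1."""
    m = y * (X @ w)                                       # signed margins y_i * <w, x_i>
    dl = np.where(m >= 1.0, 0.0,
         np.where(m <= 1.0 - mu, -1.0, (m - 1.0) / mu))   # derivative of the smoothed loss
    return (X.T @ (dl * y)) / len(y)

def learn_halfspace_agd(X, y, gamma):
    """Hypothetical accelerated-gradient learner for a gamma-margin halfspace."""
    n = X.shape[1]
    mu = gamma                                 # tie the smoothing to the target margin (assumption)
    L = 1.0 / mu                               # smoothness constant of the surrogate
    steps = int(np.ceil(4.0 / gamma))          # ~O(1/gamma) iterations, vs O(1/gamma^2) unaccelerated
    w, w_prev = np.zeros(n), np.zeros(n)
    for t in range(steps):
        v = w + (t / (t + 3.0)) * (w - w_prev)             # momentum (look-ahead) point
        w_prev = w
        w = v - (1.0 / L) * smooth_hinge_grad(v, X, y, mu) # gradient step at the look-ahead point
    return w

if __name__ == "__main__":
    # Toy usage: 2-D points with margin at least 0.3 around the separator x0 + x1 = 0.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(500, 2))
    X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)   # ensure ||x|| <= 1
    margin = (X @ np.array([1.0, 1.0])) / np.sqrt(2.0)
    keep = np.abs(margin) >= 0.3
    X, y = X[keep], np.sign(margin[keep])
    w = learn_halfspace_agd(X, y, gamma=0.3)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

Each iteration above is dominated by two matrix-vector products over the n-dimensional examples, operations that are known to parallelize to O(log n) depth with enough processors; this is consistent with, though far short of establishing, the Õ(1/γ) + O(log n) bound quoted in the abstract.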