Communications of the ACM.
A new polynomial-time algorithm for linear programming. Combinatorica.
Log depth circuits for division and related problems. SIAM Journal on Computing.
Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM).
The Strength of Weak Learnability. Machine Learning.
From on-line to batch learning. COLT '89: Proceedings of the Second Annual Workshop on Computational Learning Theory.
Information and Computation.
Parallel linear programming in fixed dimension almost surely in constant time. Journal of the ACM (JACM).
An introduction to computational learning theory.
Limits to parallel computation: P-completeness theory.
Boosting a weak learning algorithm by majority. Information and Computation.
On the boosting ability of top-down decision tree learning algorithms. STOC '96: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing.
Noise-tolerant parallel learning of geometric concepts. Information and Computation.
Large Margin Classification Using the Perceptron Algorithm. Machine Learning / The Eleventh Annual Conference on Computational Learning Theory.
An Adaptive Version of the Boost by Majority Algorithm. Machine Learning.
Logistic Regression, AdaBoost and Bregman Distances. Machine Learning.
MadaBoost: A Modification of AdaBoost. COLT '00: Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.
Smooth boosting and learning with malicious noise. The Journal of Machine Learning Research.
Excessive Gap Technique in Nonsmooth Convex Minimization. SIAM Journal on Optimization.
Boosting in the presence of noise. Journal of Computer and System Sciences (Special issue: Learning Theory 2003).
Smooth Optimization with Approximate Gradient. SIAM Journal on Optimization.
COLT '05: Proceedings of the 18th Annual Conference on Learning Theory.
Random projection, margins, kernels, and feature-selection. SLSFS '05: Proceedings of the 2005 International Conference on Subspace, Latent Structure and Feature Selection.
We consider the problem of learning an unknown large-margin halfspace in the context of parallel computation, giving both positive and negative results. As our main positive result, we give a parallel algorithm for learning a large-margin halfspace, based on an algorithm of Nesterov's that performs gradient descent with a momentum term. We show that this algorithm can learn an unknown γ-margin halfspace over n dimensions using n · poly(1/γ) processors, with a running time of Õ(1/γ) + O(log n). In contrast, naive parallel algorithms that learn a γ-margin halfspace in time that depends polylogarithmically on n have a running time with an inverse quadratic dependence on the margin parameter γ. Our negative result deals with boosting, which is a standard approach to learning large-margin halfspaces. We prove that in the original PAC framework, in which a weak learning algorithm is provided as an oracle that is called by the booster, boosting cannot be parallelized. More precisely, we show that if the booster is allowed to call the weak learner multiple times in parallel within a single boosting stage, this ability does not reduce the overall number of successive boosting stages needed for learning, even by a single stage. Our proof is information-theoretic and does not rely on unproven assumptions.
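To make the positive result more concrete, the following is a minimal, illustrative sketch (not the paper's actual algorithm or analysis) of the underlying idea: Nesterov-style accelerated gradient descent, i.e., gradient descent with a momentum term, applied to a smooth surrogate of the margin loss. The function names, the choice of smoothing parameter mu = γ, and the step count of roughly 4/γ are hypothetical illustration choices; the point is that the iteration count for such an accelerated method scales roughly as Õ(1/γ) rather than the inverse quadratic dependence of plain gradient or perceptron updates.

```python
import numpy as np

def smooth_hinge_grad(w, X, y, mu):
    """Gradient of a Huber-smoothed hinge loss; its smoothness constant is
    about 1/mu when every example has Euclidean norm at most 1."""
    m = y * (X @ w)                                       # signed margins y_i * <w, x_i>
    dl = np.where(m >= 1.0, 0.0,
         np.where(m <= 1.0 - mu, -1.0, (m - 1.0) / mu))   # derivative of the smoothed loss
    return (X.T @ (dl * y)) / len(y)

def learn_halfspace_agd(X, y, gamma):
    """Hypothetical accelerated-gradient learner for a gamma-margin halfspace."""
    n = X.shape[1]
    mu = gamma                                 # tie the smoothing to the target margin (assumption)
    L = 1.0 / mu                               # smoothness constant of the surrogate
    steps = int(np.ceil(4.0 / gamma))          # ~O(1/gamma) iterations, vs O(1/gamma^2) unaccelerated
    w, w_prev = np.zeros(n), np.zeros(n)
    for t in range(steps):
        v = w + (t / (t + 3.0)) * (w - w_prev)             # momentum (look-ahead) point
        w_prev = w
        w = v - (1.0 / L) * smooth_hinge_grad(v, X, y, mu) # gradient step at the look-ahead point
    return w

if __name__ == "__main__":
    # Toy usage: 2-D points with margin at least 0.3 around the separator x0 + x1 = 0.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(500, 2))
    X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)   # ensure ||x|| <= 1
    margin = (X @ np.array([1.0, 1.0])) / np.sqrt(2.0)
    keep = np.abs(margin) >= 0.3
    X, y = X[keep], np.sign(margin[keep])
    w = learn_halfspace_agd(X, y, gamma=0.3)
    print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

Each iteration above is dominated by two matrix-vector products over the n-dimensional examples, operations that are known to parallelize to O(log n) depth with enough processors; this is consistent with, though far short of establishing, the Õ(1/γ) + O(log n) bound quoted in the abstract.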