Support Vector Machines (SVMs) are among the most popular and successful classification algorithms. Kernel SVMs often reach state-of-the-art accuracy, but they suffer from the curse of kernelization: on noisy data, the model grows linearly with the data size. Linear SVMs can learn efficiently from truly large data, but their low representational power limits them to a narrow range of domains. To bridge the representability and scalability gap between linear and nonlinear SVMs, we propose the Adaptive Multi-hyperplane Machine (AMM) algorithm, which trains and predicts quickly while retaining the ability to solve nonlinear classification problems. The AMM model consists of a set of hyperplanes (weights), each assigned to one of the classes, and predicts the class associated with the weight that yields the largest score. The number of weights is determined automatically by an iterative algorithm based on stochastic gradient descent, which is guaranteed to converge to a local optimum. Since the generalization bound decreases with the number of weights, a weight pruning mechanism is proposed and analyzed. Experiments on several large data sets show that AMM is nearly as fast during training and prediction as a state-of-the-art linear SVM solver, and that it can be orders of magnitude faster than kernel SVMs. In accuracy, AMM lies between linear and kernel SVMs. For example, on an OCR task with 8 million high-dimensional training examples, AMM trained in 300 seconds on a single-core processor achieved a 0.54% error rate, significantly lower than the 2.03% error rate of a linear SVM trained in the same time and comparable to the 0.43% error rate of a kernel SVM trained in 2 days on 512 processors. The results indicate that AMM is an attractive option for large-scale classification problems. The software is available at www.dabi.temple.edu/~vucetic/AMM.html.
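The prediction rule described in the abstract can be sketched as follows. This is a minimal illustration under the stated model structure (a set of weight vectors, each tagged with a class, with the predicted class taken from the highest-scoring weight); all function and variable names here are hypothetical, not the authors' implementation, which is available at the URL above.

```python
import numpy as np

def amm_predict(weights, classes, x):
    """Sketch of the AMM prediction rule.

    weights : (m, d) array, one hyperplane (weight vector) per row
    classes : (m,) array, the class each weight is assigned to
    x       : (d,) input example

    Returns the class of the weight with the largest inner product with x.
    """
    scores = weights @ x                 # score of each hyperplane on x
    return classes[int(np.argmax(scores))]

# Toy example (illustrative only): two weights assigned to class 0,
# one weight assigned to class 1.
weights = np.array([[ 1.0,  0.0],
                    [ 0.0,  1.0],
                    [-1.0, -1.0]])
classes = np.array([0, 0, 1])
print(amm_predict(weights, classes, np.array([2.0, 0.5])))   # -> 0
print(amm_predict(weights, classes, np.array([-3.0, -1.0]))) # -> 1
```

Because several weights can share a class, the decision regions are piecewise linear, which is how AMM gains nonlinear representational power over a single-hyperplane linear SVM while keeping linear-time scoring.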