Using and combining predictors that specialize. STOC '97: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing.
Making large-scale support vector machine learning practical. Advances in Kernel Methods.
A Random Sampling Technique for Training Support Vector Machines. ALT '01: Proceedings of the 12th International Conference on Algorithmic Learning Theory.
On the Learnability and Design of Output Codes for Multiclass Problems. COLT '00: Proceedings of the Thirteenth Annual Conference on Computational Learning Theory.
Support vector machine active learning with applications to text classification. Journal of Machine Learning Research.
Classifying large data sets using SVMs with hierarchical clusters. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Training linear SVMs in linear time. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Fast Kernel Classifiers with Online and Active Learning. Journal of Machine Learning Research.
Incremental parsing with the perceptron algorithm. ACL '04: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
Online Passive-Aggressive Algorithms. Journal of Machine Learning Research.
An Efficient Implementation of an Active Set Method for SVMs. Journal of Machine Learning Research.
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. Proceedings of the 24th International Conference on Machine Learning.
The Forgetron: A Kernel-Based Perceptron on a Budget. SIAM Journal on Computing.
Confidence-weighted linear classification. Proceedings of the 25th International Conference on Machine Learning.
A dual coordinate descent method for large-scale linear SVM. Proceedings of the 25th International Conference on Machine Learning.
SVM optimization: inverse dependence on training set size. Proceedings of the 25th International Conference on Machine Learning.
A sequential dual method for large-scale multi-class linear SVMs. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research.
Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research.
Streamed learning: one-pass SVMs. IJCAI '09: Proceedings of the 21st International Joint Conference on Artificial Intelligence.
Multi-class confidence weighted algorithms. EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 2.
Hybrid MPI/OpenMP Parallel Linear Support Vector Machine Training. Journal of Machine Learning Research.
Large linear classification when data cannot fit in memory. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Generating confusion sets for context-sensitive error correction. EMNLP '10: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.
Large Linear Classification When Data Cannot Fit in Memory. ACM Transactions on Knowledge Discovery from Data (TKDD).
Linear support vector machines via dual cached loops. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Indexed block coordinate descent for large-scale linear classification with limited memory. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
b-bit minwise hashing in practice. Proceedings of the 5th Asia-Pacific Symposium on Internetware.
As the size of data sets used to build classifiers steadily increases, training a linear model efficiently with limited memory becomes essential. Several techniques deal with this problem by loading blocks of data from disk one at a time, but they usually take a considerable number of iterations to converge to a reasonable model. Even the best block minimization techniques [1] require many block loads because they treat all training examples uniformly. Since disk I/O is expensive, reducing the number of disk accesses can dramatically decrease training time. This paper introduces the selective block minimization (SBM) algorithm, a block minimization method that makes use of selective sampling. At each step, SBM updates the model using data consisting of two parts: (1) new data loaded from disk and (2) a set of informative samples already in memory from previous steps. We prove that, by updating the linear model in the dual form, the proposed method fully utilizes the data in memory and converges to a globally optimal solution on the entire data set. Experiments show that the SBM algorithm dramatically reduces the number of blocks loaded from disk and consequently obtains an accurate and stable model quickly on both binary and multi-class classification tasks.
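To make the procedure concrete, the following is a minimal dense-NumPy sketch of the idea, not the authors' implementation: each step runs dual coordinate descent (in the style of "A dual coordinate descent method for large-scale linear SVM", cited above) over the newly loaded block together with a set of cached examples, then retains the non-bounded support vectors as the informative cache for the next step. The `load_block` callback, the cache-retention rule, and all parameter names are illustrative assumptions; a real implementation would use sparse features and stream blocks from disk.

```python
import numpy as np


def dual_cd_epochs(X, y, alpha, w, C=1.0, epochs=5):
    """Dual coordinate descent for an L1-loss linear SVM over the examples
    currently in memory, preserving the invariant w = sum_i alpha_i*y_i*x_i.
    """
    Qii = (X ** 2).sum(axis=1)  # diagonal of the Gram matrix
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            if Qii[i] == 0.0:
                continue
            G = y[i] * X[i].dot(w) - 1.0                     # dual gradient
            a_new = min(max(alpha[i] - G / Qii[i], 0.0), C)  # project to [0, C]
            w += (a_new - alpha[i]) * y[i] * X[i]
            alpha[i] = a_new
    return alpha, w


def selective_block_minimization(load_block, n_blocks, n_examples, dim,
                                 C=1.0, cache_size=10000, passes=3):
    """A rough SBM sketch: each step solves the dual subproblem over the
    newly loaded block plus a cache of informative examples kept from
    earlier steps. The full dual vector alpha (one float per example)
    stays in memory, so a reloaded block resumes from its previous dual
    values.

    `load_block(b)` is a hypothetical callback returning a dense block
    (X_b, y_b, idx_b), where idx_b holds global example indices.
    """
    w = np.zeros(dim)
    alpha = np.zeros(n_examples)
    cache_idx = np.empty(0, dtype=int)
    cache_X = np.empty((0, dim))
    cache_y = np.empty(0)
    for _ in range(passes):
        for b in range(n_blocks):
            Xb, yb, idx_b = load_block(b)
            fresh = ~np.isin(cache_idx, idx_b)  # avoid duplicate examples
            X = np.vstack([Xb, cache_X[fresh]])
            y = np.concatenate([yb, cache_y[fresh]])
            idx = np.concatenate([idx_b, cache_idx[fresh]])
            a, w = dual_cd_epochs(X, y, alpha[idx].copy(), w, C=C)
            alpha[idx] = a
            # Keep non-bounded support vectors (0 < alpha < C) as the
            # "informative" cache; this retention rule is an assumption.
            sel = np.flatnonzero((a > 0.0) & (a < C))[:cache_size]
            cache_idx, cache_X, cache_y = idx[sel], X[sel], y[sel]
    return w
```

Keeping the full dual vector in memory is what lets the per-block subproblems compose: an example evicted from the cache keeps its frozen contribution to w until its block is loaded again, so every update remains an update of one global dual objective rather than a sequence of unrelated local fits.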