Large linear classification when data cannot fit in memory

Authors:
Hsiang-Fu Yu, Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin
Affiliations:
Dept. of Computer Science, National Taiwan University, Taipei, Taiwan, ROC (all four authors)
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

Abstract

Recent advances in linear classification have shown that for applications such as document classification, training can be extremely efficient. However, most existing training methods assume that the data can be stored in the computer's memory. These methods cannot easily be applied to data larger than the memory capacity, because they access training instances randomly and random access to the disk is slow. We propose and analyze a block minimization framework for data larger than the memory size. At each step, one block of data is loaded from the disk and processed by a chosen learning method. We investigate two implementations of the proposed framework, one for the primal and one for the dual SVM formulation. Because the data cannot fit in memory, many design considerations differ greatly from those of traditional algorithms. Experiments on data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.
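The block minimization loop described in the abstract can be illustrated for the dual case with a small sketch. It is not the authors' implementation and rests on several assumptions. Blocks are stored as pickle files written by a hypothetical `save_blocks` helper. Each block sub-problem is solved by a few passes of dual coordinate descent for an L1-loss linear SVM without a bias term, in the hypothetical `dual_cd_on_block`. The full dual vector `alpha` is kept in memory because it needs only one number per instance. The names `block_minimization`, `outer_iters`, and `inner_passes` are likewise illustrative.

```python
import os
import pickle
import random
import tempfile


def save_blocks(X, y, num_blocks, directory):
    """Hypothetical helper: split (X, y) into contiguous blocks, one pickle file each.

    X is a list of sparse rows, each a dict {feature_index: value}; y holds +1/-1.
    Returns one file path per block.
    """
    n = len(y)
    size = -(-n // num_blocks)  # ceiling division so every instance lands in a block
    paths = []
    for j, start in enumerate(range(0, n, size)):
        idx = list(range(start, min(start + size, n)))  # global instance indices
        block = {"idx": idx, "X": [X[i] for i in idx], "y": [y[i] for i in idx]}
        path = os.path.join(directory, f"block_{j}.pkl")
        with open(path, "wb") as f:
            pickle.dump(block, f)
        paths.append(path)
    return paths


def dot(w, x):
    """Inner product of a dense weight list w and a sparse dict x."""
    return sum(w[k] * v for k, v in x.items())


def dual_cd_on_block(w, alpha, block, C, inner_passes, rng):
    """Assumed sub-problem solver: dual coordinate descent for the L1-loss
    linear SVM (no bias), restricted to one block; updates w and alpha in place."""
    order = list(range(len(block["y"])))
    for _ in range(inner_passes):
        rng.shuffle(order)
        for t in order:
            i, x, yi = block["idx"][t], block["X"][t], block["y"][t]
            qii = sum(v * v for v in x.values())  # diagonal entry Q_ii = x_i^T x_i
            if qii == 0.0:
                continue
            g = yi * dot(w, x) - 1.0  # gradient of the dual objective w.r.t. alpha_i
            new_a = min(max(alpha[i] - g / qii, 0.0), C)  # 1-D minimizer clipped to [0, C]
            delta = new_a - alpha[i]
            if delta != 0.0:
                for k, v in x.items():  # maintain w = sum_j alpha_j y_j x_j
                    w[k] += delta * yi * v
                alpha[i] = new_a


def block_minimization(paths, n, dim, C=1.0, outer_iters=10, inner_passes=1, seed=0):
    """Outer loop: load one block at a time from disk and solve its sub-problem."""
    rng = random.Random(seed)
    w = [0.0] * dim
    alpha = [0.0] * n  # assumption: the n dual variables stay in memory
    for _ in range(outer_iters):
        for path in paths:
            with open(path, "rb") as f:
                block = pickle.load(f)  # only this block's instances are in memory
            dual_cd_on_block(w, alpha, block, C, inner_passes, rng)
    return w, alpha


if __name__ == "__main__":
    # Toy separable data split into two blocks of two instances each.
    X = [{0: 1.0, 1: 2.0}, {0: 2.0, 1: 1.0}, {0: -1.0, 1: -2.0}, {0: -2.0, 1: -1.0}]
    y = [1, 1, -1, -1]
    with tempfile.TemporaryDirectory() as d:
        paths = save_blocks(X, y, num_blocks=2, directory=d)
        w, alpha = block_minimization(paths, n=len(y), dim=2, C=1.0, outer_iters=50)
    print("w =", w)
```

On the toy data, `w` should approach (1/3, 1/3), the optimum of the primal problem for C = 1. In this sketch, the disk is read sequentially one block at a time instead of randomly, and `w` carries information from earlier blocks into the sub-problem of the current one.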