The nature of statistical learning theory. Machine Learning.
A maximum entropy approach to natural language processing. Computational Linguistics.
Improving the memory-system performance of sparse-matrix vector multiplication. IBM Journal of Research and Development.
Making large-scale support vector machine learning practical. Advances in Kernel Methods.
Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98: Proceedings of the 10th European Conference on Machine Learning.
A Comparative Study on Feature Selection in Text Categorization. ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning.
Supertagging: an approach to almost parsing. Computational Linguistics.
Rational Kernels: Theory and Algorithms. The Journal of Machine Learning Research.
Fast methods for kernel-based text analysis. ACL '03: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Volume 1.
Fast Kernel Classifiers with Online and Active Learning. The Journal of Machine Learning Research.
LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST).
Kernel-based learning algorithms, such as Support Vector Machines (SVMs) or the Perceptron, often rely on sequential optimization in which a few examples are added at each iteration. Updating the kernel matrix then requires matrix-vector multiplications. We propose a new method based on transposition to speed up this computation on sparse data: instead of computing dot products over sparse feature vectors, it incrementally merges lists of training examples and minimizes access to the data. Caching and shrinking are also optimized for sparsity. On very large natural language tasks (tagging, translation, text classification) with sparse feature representations, we observe a 20- to 80-fold speedup over LIBSVM while using the same SMO algorithm. Theory and experiments explain what type of sparsity structure is needed for this approach to work, and why its adaptation to maximum entropy (Maxent) sequential optimization is inefficient.
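The transpose idea can be illustrated with a minimal sketch, assuming a linear kernel over sparse feature vectors stored as Python dicts; the names `build_postings` and `kernel_column` are hypothetical and do not correspond to the paper's implementation. Rather than computing one dot product per training example, the kernel products against a new example are accumulated by walking only the feature-to-example (inverted) lists of the features active in that example, so the work is proportional to the shared nonzero entries.

```python
# Illustrative sketch of the transpose (inverted-index) trick for a linear
# kernel on sparse data. All names are hypothetical, not the paper's code.
from collections import defaultdict

def build_postings(examples):
    """Map each feature to the list of (example_id, value) pairs using it.

    `examples` is a list of sparse vectors, each a dict {feature: value}.
    """
    postings = defaultdict(list)
    for i, x in enumerate(examples):
        for f, v in x.items():
            postings[f].append((i, v))
    return postings

def kernel_column(postings, n_examples, x_new):
    """Linear-kernel products k[i] = <x_i, x_new> for all training examples.

    Instead of n_examples sparse dot products, only the postings lists of the
    features active in x_new are traversed and merged into the result.
    """
    k = [0.0] * n_examples
    for f, v_new in x_new.items():
        for i, v in postings.get(f, ()):
            k[i] += v * v_new
    return k

# Toy usage: three sparse training examples, one query example.
X = [{0: 1.0, 3: 2.0}, {3: 1.0}, {1: 4.0, 2: 1.0}]
P = build_postings(X)
print(kernel_column(P, len(X), {3: 1.0, 1: 0.5}))  # [2.0, 1.0, 2.0]
```

When a few examples are added per SMO iteration, only a few such columns are needed, and on data where each feature appears in a small fraction of the examples the merged postings lists stay short, which is the sparsity structure the abstract refers to.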