The nature of statistical learning theory. Machine Learning.
A maximum entropy approach to natural language processing. Computational Linguistics.
Improving the memory-system performance of sparse-matrix vector multiplication. IBM Journal of Research and Development.
Making large-scale support vector machine learning practical. Advances in Kernel Methods.
Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods.
Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98: Proceedings of the 10th European Conference on Machine Learning.
A Comparative Study on Feature Selection in Text Categorization. ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning.
Supertagging: an approach to almost parsing. Computational Linguistics.
Rational Kernels: Theory and Algorithms. The Journal of Machine Learning Research.
Fast methods for kernel-based text analysis. ACL '03: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Volume 1.
Fast Kernel Classifiers with Online and Active Learning. The Journal of Machine Learning Research.
LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST).
Kernel-based learning algorithms, such as Support Vector Machines (SVMs) or the Perceptron, often rely on sequential optimization in which a few examples are added at each iteration. Updating the kernel matrix then requires matrix-vector multiplications. We propose a new method based on transposition to speed up this computation on sparse data: instead of computing dot products over sparse feature vectors, it incrementally merges lists of training examples and minimizes access to the data. Caching and shrinking are also optimized for sparsity. On very large natural language tasks (tagging, translation, text classification) with sparse feature representations, we observe a 20- to 80-fold speedup over LIBSVM while using the same SMO algorithm. Theory and experiments explain what type of sparsity structure is needed for this approach to work, and why its adaptation to maximum entropy (Maxent) sequential optimization is inefficient.
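The transpose idea can be illustrated with a minimal sketch, assuming a linear kernel over sparse feature vectors stored as Python dicts; the names `build_postings` and `kernel_column` are hypothetical and do not correspond to the paper's implementation. Rather than computing one dot product per training example, the kernel products against a new example are accumulated by walking only the feature-to-example (inverted) lists of the features active in that example, so the work is proportional to the shared nonzero entries.

```python
# Illustrative sketch of the transpose (inverted-index) trick for a linear
# kernel on sparse data. All names are hypothetical, not the paper's code.
from collections import defaultdict

def build_postings(examples):
    """Map each feature to the list of (example_id, value) pairs using it.

    `examples` is a list of sparse vectors, each a dict {feature: value}.
    """
    postings = defaultdict(list)
    for i, x in enumerate(examples):
        for f, v in x.items():
            postings[f].append((i, v))
    return postings

def kernel_column(postings, n_examples, x_new):
    """Linear-kernel products k[i] = <x_i, x_new> for all training examples.

    Instead of n_examples sparse dot products, only the postings lists of the
    features active in x_new are traversed and merged into the result.
    """
    k = [0.0] * n_examples
    for f, v_new in x_new.items():
        for i, v in postings.get(f, ()):
            k[i] += v * v_new
    return k

# Toy usage: three sparse training examples, one query example.
X = [{0: 1.0, 3: 2.0}, {3: 1.0}, {1: 4.0, 2: 1.0}]
P = build_postings(X)
print(kernel_column(P, len(X), {3: 1.0, 1: 0.5}))  # [2.0, 1.0, 2.0]
```

When a few examples are added per SMO iteration, only a few such columns are needed, and on data where each feature appears in a small fraction of the examples the merged postings lists stay short, which is the sparsity structure the abstract refers to.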