Approximating a Gram matrix for improved kernel-based learning

  • Authors:
  • Petros Drineas; Michael W. Mahoney

  • Affiliations:
  • Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY; Department of Mathematics, Yale University, New Haven, CT

  • Venue:
  • COLT'05: Proceedings of the 18th Annual Conference on Learning Theory
  • Year:
  • 2005

Abstract

A problem for many kernel-based methods is that the amount of computation required to find the solution scales as $O(n^3)$, where n is the number of training examples. We develop and analyze an algorithm to compute an easily interpretable low-rank approximation to an n × n Gram matrix G such that computations of interest may be performed more rapidly. The approximation is of the form ${\tilde G}_{k} = CW^{+}_{k}C^{T}$, where C is a matrix consisting of a small number c of columns of G and $W_k$ is the best rank-k approximation to W, the matrix formed by the intersection of those c columns of G with the corresponding c rows of G. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we use a judiciously chosen, data-dependent nonuniform probability distribution. Let $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let $G_k$ be the best rank-k approximation to G. We prove that by choosing $O(k/\epsilon^4)$ columns, $$\left\|G - CW^{+}_{k}C^{T}\right\|_{\xi} \leq \|G - G_{k}\|_{\xi} + \epsilon \sum\limits_{i=1}^{n} G^{2}_{ii},$$ both in expectation and with high probability, for both ξ = 2, F, and for all k: 0 ≤ k ≤ rank(W). This approximation can be computed using O(n) additional space and time, after making two passes over the data from external storage.
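To make the construction concrete, here is a minimal NumPy sketch of the sampling-and-reconstruction step described in the abstract. It assumes G is held in memory as a dense array (the paper's two-pass, O(n)-additional-space streaming variant is not reproduced here), and the function name `approximate_gram` and the unbiased rescaling convention for C and W are illustrative choices inferred from the abstract, not code from the paper.

```python
import numpy as np


def approximate_gram(G, c, k, seed=None):
    """Sketch of the column-sampling approximation G ~= C @ pinv(W_k) @ C.T.

    Columns are drawn i.i.d. with the data-dependent probabilities
    p_i proportional to G_ii^2; C and W are rescaled so the estimator
    is unbiased, and W_k is the best rank-k approximation to W
    (computed by eigendecomposition, since W is symmetric).
    """
    rng = np.random.default_rng(seed)
    n = G.shape[0]

    # Nonuniform, data-dependent probabilities p_i = G_ii^2 / sum_j G_jj^2.
    diag_sq = np.diag(G) ** 2
    p = diag_sq / diag_sq.sum()

    # Draw c column indices i.i.d. according to p (with replacement).
    idx = rng.choice(n, size=c, replace=True, p=p)

    # Rescale each sampled column by 1 / sqrt(c * p_i).
    scale = 1.0 / np.sqrt(c * p[idx])                   # shape (c,)
    C = G[:, idx] * scale                               # n x c
    W = G[np.ix_(idx, idx)] * np.outer(scale, scale)    # c x c

    # Best rank-k approximation to the symmetric W, then its pseudoinverse.
    vals, vecs = np.linalg.eigh(W)
    top = np.argsort(vals)[::-1][:k]
    keep = top[vals[top] > 1e-12]                       # guard against rank deficiency
    Wk_pinv = (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T

    return C @ Wk_pinv @ C.T                            # rank <= k approximation of G
```

The dense, in-memory formulation above is chosen for clarity only: since the method touches just the diagonal of G (for the sampling probabilities), the c sampled columns, and the c × c matrix W, the paper's algorithm needs only two passes over G in external storage and O(n) additional space and time, as stated in the abstract.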