Low-rank representations have received considerable interest in applications of kernel-based methods. However, these methods assume that the spectrum of the Gaussian or polynomial kernel decays rapidly. This assumption does not always hold, and its violation may degrade performance. In this paper, we propose an effective technique for learning low-rank Mercer kernels (LMK) with fast-decaying spectrum. What distinguishes our kernels from classical kernels (Gaussian and polynomial kernels) is that the proposed kernel always yields low-rank Gram matrices whose spectrum decays rapidly, regardless of the distribution of the data. Furthermore, the LMK can control the decay rate. Thus, our kernels can prevent performance degradation when low-rank approximations are used. Our algorithm scales favorably: it is linear in the number of data points and quadratic in the rank of the Gram matrix. Empirical results demonstrate that the proposed method learns a fast-decaying spectrum and significantly improves performance.
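To make the spectrum-decay assumption concrete, the following minimal NumPy sketch compares how many eigenvalues a Gaussian Gram matrix needs to capture 99% of its trace under two bandwidths. It is an illustration only and does not implement the LMK method; the helper names `gaussian_gram` and `effective_rank`, the synthetic data, the bandwidth values, and the 99% threshold are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_gram(X, sigma):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def effective_rank(K, energy=0.99):
    """Smallest r whose top-r eigenvalues hold `energy` of the trace of K."""
    eigvals = np.linalg.eigvalsh(K)[::-1]   # eigenvalues in descending order
    eigvals = np.clip(eigvals, 0.0, None)   # guard against round-off below 0
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    r = int(np.searchsorted(cumulative, energy)) + 1
    return min(r, len(eigvals))


# Hypothetical synthetic data: 200 points drawn from a 5-d standard normal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# A wide bandwidth gives a fast-decaying spectrum (few eigenvalues needed);
# a narrow one makes K close to the identity, so the spectrum is nearly flat.
rank_wide = effective_rank(gaussian_gram(X, sigma=10.0))
rank_narrow = effective_rank(gaussian_gram(X, sigma=0.1))
print("effective rank, sigma=10.0:", rank_wide)
print("effective rank, sigma=0.1: ", rank_narrow)
```

When the effective rank approaches the number of data points, as in the narrow-bandwidth case, a low-rank approximation of the Gram matrix discards a large share of the spectrum, which is the failure mode the abstract describes.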