Kernels as features: On kernels, margins, and low-dimensional mappings

  • Authors: Maria-Florina Balcan, Avrim Blum, Santosh Vempala
  • Affiliations: Computer Science Department, Carnegie Mellon University, Pittsburgh (Balcan, Blum); Department of Mathematics, MIT, Cambridge (Vempala)
  • Venue: Machine Learning
  • Year: 2006


Abstract

Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high cost if the result is linearly separable with a large margin $$\gamma$$. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping to a low-dimensional space, one of dimension only $$\tilde{O}(1/\gamma^2)$$. In this paper, we explore the question of whether one can efficiently produce such low-dimensional mappings, using only black-box access to a kernel function. That is, given just a program that computes K(x,y) on inputs x,y of our choosing, can we efficiently construct an explicit (small) set of features that effectively capture the power of the implicit high-dimensional space? We answer this question in the affirmative if our method is also allowed black-box access to the underlying data distribution (i.e., unlabeled examples). We also give a lower bound, showing that if we do not have access to the distribution, then this is not possible for an arbitrary black-box kernel function; we leave as an open problem, however, whether this can be done for standard kernel functions such as the polynomial kernel. Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can efficiently generate an explicit set of $$\tilde{O}(1/\gamma^2)$$ features, such that if the data was linearly separable with margin $$\gamma$$ under the kernel, then it is approximately separable in this new feature space.
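To make the positive result concrete, the following is a minimal Python sketch of the kind of mapping the abstract describes: each point x is mapped to its kernel evaluations against d randomly drawn unlabeled examples, using only black-box access to K. The RBF kernel, the value d=50, and the synthetic Gaussian data are illustrative assumptions, not the paper's construction; the paper's actual mappings and the $$\tilde{O}(1/\gamma^2)$$ dimension bound involve additional care beyond this sketch.

```python
import numpy as np

def rbf_kernel(x, y, width=1.0):
    # Black-box kernel: all we assume is the ability to evaluate K(x, y).
    return np.exp(-width * np.sum((x - y) ** 2))

def kernel_feature_map(K, unlabeled, d, rng):
    """Return an explicit d-dimensional feature map built from black-box
    kernel evaluations against d randomly drawn unlabeled examples."""
    anchors = unlabeled[rng.choice(len(unlabeled), size=d, replace=False)]
    def phi(x):
        # Explicit features: x -> (K(x, z_1), ..., K(x, z_d)).
        return np.array([K(x, z) for z in anchors])
    return phi

# Usage sketch with synthetic data (illustrative, not from the paper).
rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(500, 10))  # stand-in for unlabeled draws from the distribution
phi = kernel_feature_map(rbf_kernel, X_unlabeled, d=50, rng=rng)
print(phi(rng.normal(size=10)).shape)     # (50,) -- low-dimensional explicit features
```

Per the abstract, if the data is linearly separable with margin $$\gamma$$ under K, then a linear separator learned over explicit features of this kind (e.g., with a linear SVM) can approximately capture the power of the implicit high-dimensional space.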