Kernels as features: On kernels, margins, and low-dimensional mappings

  • Authors: Maria-Florina Balcan, Avrim Blum, Santosh Vempala
  • Affiliations: Computer Science Department, Carnegie Mellon University, Pittsburgh (Balcan, Blum); Department of Mathematics, MIT, Cambridge (Vempala)
  • Venue: Machine Learning
  • Year: 2006


Abstract

Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high cost if the result is linearly separable with a large margin $$\gamma$$. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping to a low-dimensional space, one of dimension only $$\tilde{O}(1/\gamma^2)$$. In this paper, we explore the question of whether one can efficiently produce such low-dimensional mappings, using only black-box access to a kernel function. That is, given just a program that computes K(x,y) on inputs x,y of our choosing, can we efficiently construct an explicit (small) set of features that effectively capture the power of the implicit high-dimensional space? We answer this question in the affirmative if our method is also allowed black-box access to the underlying data distribution (i.e., unlabeled examples). We also give a lower bound, showing that if we do not have access to the distribution, then this is not possible for an arbitrary black-box kernel function; we leave as an open problem, however, whether this can be done for standard kernel functions such as the polynomial kernel. Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can efficiently generate an explicit set of $$\tilde{O}(1/\gamma^2)$$ features, such that if the data was linearly separable with margin $$\gamma$$ under the kernel, then it is approximately separable in this new feature space.
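To make the positive result concrete, the following is a minimal Python sketch of the kind of mapping the abstract describes: each point x is mapped to its kernel evaluations against d randomly drawn unlabeled examples, using only black-box access to K. The RBF kernel, the value d=50, and the synthetic Gaussian data are illustrative assumptions, not the paper's construction; the paper's actual mappings and the $$\tilde{O}(1/\gamma^2)$$ dimension bound involve additional care beyond this sketch.

```python
import numpy as np

def rbf_kernel(x, y, width=1.0):
    # Black-box kernel: all we assume is the ability to evaluate K(x, y).
    return np.exp(-width * np.sum((x - y) ** 2))

def kernel_feature_map(K, unlabeled, d, rng):
    """Return an explicit d-dimensional feature map built from black-box
    kernel evaluations against d randomly drawn unlabeled examples."""
    anchors = unlabeled[rng.choice(len(unlabeled), size=d, replace=False)]
    def phi(x):
        # Explicit features: x -> (K(x, z_1), ..., K(x, z_d)).
        return np.array([K(x, z) for z in anchors])
    return phi

# Usage sketch with synthetic data (illustrative, not from the paper).
rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(500, 10))  # stand-in for unlabeled draws from the distribution
phi = kernel_feature_map(rbf_kernel, X_unlabeled, d=50, rng=rng)
print(phi(rng.normal(size=10)).shape)     # (50,) -- low-dimensional explicit features
```

Per the abstract, if the data is linearly separable with margin $$\gamma$$ under K, then a linear separator learned over explicit features of this kind (e.g., with a linear SVM) can approximately capture the power of the implicit high-dimensional space.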