Kernel functions are typically viewed as providing an implicit mapping of points into a high-dimensional space, with the ability to gain much of the power of that space without incurring a high cost if the result is linearly separable by a large margin $$\gamma$$. However, the Johnson-Lindenstrauss lemma suggests that in the presence of a large margin, a kernel function can also be viewed as a mapping to a low-dimensional space, one of dimension only $$\tilde{O}(1/\gamma^2)$$. In this paper, we explore the question of whether one can efficiently produce such low-dimensional mappings, using only black-box access to a kernel function. That is, given just a program that computes K(x,y) on inputs x,y of our choosing, can we efficiently construct an explicit (small) set of features that effectively capture the power of the implicit high-dimensional space? We answer this question in the affirmative if our method is also allowed black-box access to the underlying data distribution (i.e., unlabeled examples). We also give a lower bound, showing that if we do not have access to the distribution, then this is not possible for an arbitrary black-box kernel function; we leave as an open problem, however, whether this can be done for standard kernel functions such as the polynomial kernel. Our positive result can be viewed as saying that designing a good kernel function is much like designing a good feature space. Given a kernel, by running it in a black-box manner on random unlabeled examples, we can efficiently generate an explicit set of $$\tilde{O}(1/\gamma^2)$$ features, such that if the data was linearly separable with margin $$\gamma$$ under the kernel, then it is approximately separable in this new feature space.
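To make the black-box construction concrete, here is a minimal sketch of one simple way such a mapping can be realized: sample roughly $$\tilde{O}(1/\gamma^2)$$ unlabeled "landmark" examples and use one kernel evaluation per landmark as a coordinate (an empirical kernel map). The function names, the constant `c`, and the particular sampling details below are illustrative assumptions, not the paper's exact procedure or guarantees.

```python
# Hypothetical sketch of a black-box kernel-to-features mapping:
# draw d ~ O(1/gamma^2) unlabeled landmark points and represent
# x by (K(x, z_1), ..., K(x, z_d)).
import numpy as np


def polynomial_kernel(x, y, degree=2):
    """An example black-box kernel; any callable K(x, y) would do."""
    return (1.0 + np.dot(x, y)) ** degree


def build_feature_map(kernel, unlabeled_pool, gamma, c=10, seed=0):
    """Sample d = ceil(c / gamma^2) landmarks (c is an assumed constant)
    from the unlabeled pool and return the explicit feature map."""
    d = int(np.ceil(c / gamma ** 2))
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(unlabeled_pool),
                     size=min(d, len(unlabeled_pool)),
                     replace=False)
    landmarks = [unlabeled_pool[i] for i in idx]

    def feature_map(x):
        # Each coordinate is one black-box kernel evaluation against a landmark.
        return np.array([kernel(x, z) for z in landmarks])

    return feature_map


# Usage: map the data into the new low-dimensional space, then train any
# linear separator (e.g. a linear SVM or the perceptron) on the mapped points.
pool = np.random.randn(200, 5)            # stand-in for unlabeled examples
phi = build_feature_map(polynomial_kernel, pool, gamma=0.1)
print(phi(np.random.randn(5)).shape)      # explicit feature vector of length d
```

The point of the sketch is only that every feature is produced by running the kernel as a black box on unlabeled data; the paper's analysis is what ties the resulting dimension and the approximate margin back to the original $$\gamma$$.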