Kernel functions have become an extremely popular tool in machine learning, and they come with an attractive theory as well. This theory views a kernel as implicitly mapping data points into a possibly very high-dimensional space, and describes a kernel function as good for a given learning problem if the data are separable by a large margin in that implicit space. However, while quite elegant, this theory does not necessarily correspond to the intuition of a good kernel as a good measure of similarity, and the underlying margin in the implicit space is usually not apparent in "natural" representations of the data. It may therefore be difficult for a domain expert to use the theory to help design an appropriate kernel for the learning task at hand. Moreover, the requirement of positive semi-definiteness may rule out the most natural pairwise similarity functions for the given problem domain.

In this work we develop an alternative, more general theory of learning with similarity functions (i.e., sufficient conditions for a similarity function to allow one to learn well) that does not require reference to implicit spaces and does not require the function to be positive semi-definite (or even symmetric). Instead, our theory is phrased in terms of more direct properties of how the function behaves as a similarity measure. Our results also generalize the standard theory in the sense that any good kernel function under the usual definition can be shown to also be a good similarity function under our definition (though with some loss in the parameters). In this way, we provide the first steps towards a theory of kernels and more general similarity functions that describes the effectiveness of a given function in terms of natural similarity-based properties.
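The algorithmic idea behind this theory lends itself to a short illustration. A similarity function that is "good" in the above sense can be used for learning by mapping each example to its vector of similarities to a set of randomly drawn landmark examples, and then training an ordinary linear separator in that explicit feature space; roughly speaking, a good similarity function guarantees that, with enough landmarks, the mapped data are separable by a large margin with high probability. The Python sketch below is a minimal, hypothetical instance of this empirical similarity map: the (deliberately non-symmetric) function `sim`, the landmark count, and the perceptron learner are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def sim(x, z):
    # Hypothetical similarity measure for illustration only: it is neither
    # positive semi-definite nor symmetric, which a kernel would have to be.
    return np.exp(-np.linalg.norm(x - z)) + 0.1 * np.tanh(x[0] - z[0])

def similarity_map(X, landmarks):
    # Map each example to its vector of similarities to the landmark points.
    return np.array([[sim(x, l) for l in landmarks] for x in X])

def train_perceptron(F, y, epochs=100, lr=0.1):
    # A plain perceptron as the linear learner in the explicit similarity space;
    # any standard linear-separator algorithm could be substituted here.
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        for f, label in zip(F, y):
            if label * (f @ w + b) <= 0:  # misclassified -> update
                w += lr * label * f
                b += lr * label
    return w, b

rng = np.random.default_rng(0)
# Toy data: two Gaussian blobs labeled +1 / -1.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)

# Draw random landmark examples and build the explicit feature space.
landmarks = X[rng.choice(len(X), size=20, replace=False)]
F = similarity_map(X, landmarks)

w, b = train_perceptron(F, y)
preds = np.sign(F @ w + b)
print("training accuracy:", (preds == y).mean())
```

Because the similarity values are used only as explicit features, nothing in this pipeline requires `sim` to be positive semi-definite or even symmetric, which is exactly the flexibility the similarity-based theory is meant to capture.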