A theory of learning with similarity functions
Machine Learning
Kernel functions have become an extremely popular tool in machine learning, with an attractive theory as well. This theory views a kernel as implicitly mapping data points into a possibly very high dimensional space, and describes a kernel function as being good for a given learning problem if data is separable by a large margin in that implicit space. However, while quite elegant, this theory does not directly correspond to one's intuition of a good kernel as a good similarity function. Furthermore, it may be difficult for a domain expert to use the theory to help design an appropriate kernel for the learning task at hand since the implicit mapping may not be easy to calculate. Finally, the requirement of positive semi-definiteness may rule out the most natural pairwise similarity functions for the given problem domain.

In this work we develop an alternative, more general theory of learning with similarity functions (i.e., sufficient conditions for a similarity function to allow one to learn well) that does not require reference to implicit spaces, and does not require the function to be positive semi-definite (or even symmetric). Our results also generalize the standard theory in the sense that any good kernel function under the usual definition can be shown to also be a good similarity function under our definition (though with some loss in the parameters). In this way, we provide the first steps towards a theory of kernels that describes the effectiveness of a given kernel function in terms of natural similarity-based properties.
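The reduction underlying this style of result can be sketched concretely: map each point to its vector of similarities with a small set of "landmark" examples, then learn an ordinary linear separator in that space. The toy data, the deliberately non-PSD (and asymmetric) similarity function, and the perceptron training loop below are illustrative assumptions for the sketch, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary problem: two well-separated Gaussian blobs in the plane.
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(100, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(100, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(100), -np.ones(100)])

def similarity(a, b):
    # Deliberately NOT a valid kernel: asymmetric in (a, b) and not
    # positive semi-definite, yet still informative for this problem.
    return np.tanh(a @ b - 0.3 * np.sum(a))

# Draw a few landmark examples and map every point to its
# similarity profile against them (no implicit feature space needed).
landmarks = X[rng.choice(len(X), size=10, replace=False)]
F = np.array([[similarity(x, l) for l in landmarks] for x in X])

# Learn a linear separator in the similarity space with a
# plain perceptron.
w = np.zeros(F.shape[1])
for _ in range(50):
    for f, label in zip(F, y):
        if label * (f @ w) <= 0:
            w += label * f

accuracy = np.mean(np.sign(F @ w) == y)
print(f"training accuracy in similarity space: {accuracy:.2f}")
```

The point of the sketch is that learning happens entirely in the explicit, low-dimensional similarity space, so no positive semi-definiteness or implicit mapping is ever invoked; only the quality of the similarity function for the task matters.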