Learning good edit similarities with generalization guarantees

Authors:
Aurélien Bellet;Amaury Habrard;Marc Sebban
Affiliations:
Laboratoire Hubert Curien UMR CNRS 5516, University of Jean Monnet, Saint-Etienne Cedex, France;Laboratoire d'Informatique Fondamentale UMR CNRS 6166, University of Aix-Marseille, Marseille Cedex, France;Laboratoire Hubert Curien UMR CNRS 5516, University of Jean Monnet, Saint-Etienne Cedex, France
Venue:
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I
Year:
2011

Abstract

Similarity and distance functions are essential to many learning algorithms, so learning them from data has attracted considerable interest. For structured data (e.g., strings or trees), edit similarities are widely used, and there exist a few methods for learning them. However, these methods offer no theoretical guarantee on the generalization performance or discriminative power of the resulting similarities. Recently, a theory of learning with (ε, γ, τ)-good similarity functions was proposed. This theory bridges the gap between the properties of a similarity function and its performance in classification. In this paper, we propose a novel edit similarity learning approach (GESL) driven by the idea of (ε, γ, τ)-goodness, which allows us to derive generalization guarantees using the notion of uniform stability. We show experimentally that edit similarities learned with our method induce classification models that are both more accurate and sparser than those induced by the standard edit distance or by edit similarities learned with a state-of-the-art method.
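
To make the notion of an edit similarity concrete, the following is a minimal sketch of a cost-parameterized string edit distance and a similarity derived from it. It is not the GESL method itself. The function names, the `sub_cost` and `indel_cost` parameters, and the exponential mapping to (-1, 1] are assumptions made here for illustration only; in a learning setting, the operation costs would be the quantities fitted from data.

```python
import math


def edit_distance(x, y, sub_cost=None, indel_cost=1.0):
    """Weighted edit distance between strings x and y (dynamic programming).

    sub_cost(a, b) is a hypothetical substitution-cost function; by default
    it gives the standard Levenshtein costs (0 for a match, 1 otherwise).
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0
    n, m = len(x), len(y)
    # d[i][j] = cheapest way to turn x[:i] into y[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + indel_cost  # delete x[i-1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + indel_cost  # insert y[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                          # deletion
                d[i][j - 1] + indel_cost,                          # insertion
                d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),    # substitution or match
            )
    return d[n][m]


def edit_similarity(x, y, **costs):
    """Map the edit distance to a similarity in (-1, 1] (illustrative choice)."""
    return 2.0 * math.exp(-edit_distance(x, y, **costs)) - 1.0
```

With the default unit costs, `edit_distance("kitten", "sitting")` is the usual Levenshtein distance, and identical strings receive the maximal similarity of 1. Passing a different `sub_cost` changes which strings count as similar, which is the degree of freedom that an edit similarity learning method adjusts.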