Learning state machine-based string edit kernels

Authors:
Aurélien Bellet;Marc Bernard;Thierry Murgue;Marc Sebban
Affiliations:
Université de Lyon, F-42023 Saint-Étienne, France and CNRS, UMR 5516, Laboratoire Hubert Curien, F-42000 Saint-Étienne, France and Université de Saint-Étienne, Jean-Monnet ...;Université de Lyon, F-42023 Saint-Étienne, France and CNRS, UMR 5516, Laboratoire Hubert Curien, F-42000 Saint-Étienne, France and Université de Saint-Étienne, Jean-Monnet ...;Université de Lyon, F-42023 Saint-Étienne, France and CNRS, UMR 5516, Laboratoire Hubert Curien, F-42000 Saint-Étienne, France and Université de Saint-Étienne, Jean-Monnet ...;Université de Lyon, F-42023 Saint-Étienne, France and CNRS, UMR 5516, Laboratoire Hubert Curien, F-42000 Saint-Étienne, France and Université de Saint-Étienne, Jean-Monnet ...
Venue:
Pattern Recognition
Year:
2010

Abstract

In recent years, several approaches have been proposed to derive string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden Markov model) and compares two strings according to how they are generated by M. Marginalized kernels, on the other hand, compute the joint similarity between two instances by summing conditional probabilities. In this paper, we adapt this approach to edit distance-based conditional distributions and present a way to learn a new string edit kernel. We show that the practical computation of such a kernel between two strings x and x' built from an alphabet Σ requires (i) learning edit probabilities in the form of the parameters of a stochastic state machine and (ii) computing an infinite sum over Σ* by resorting to the intersection of probabilistic automata, as done for rational kernels. We show on a handwritten character recognition task that our new kernel outperforms not only state-of-the-art string kernels and string edit kernels but also the standard edit distance used by a neighborhood-based classifier.
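
As a rough illustration of the two ingredients named in the abstract, the sketch below builds a toy edit kernel in Python. It makes several assumptions that are not in the abstract: the kernel is read as K(x, x') = Σ_y p(x|y) p(x'|y) with y ranging over Σ*; the conditional probability p(x|y) comes from a memoryless stochastic edit model whose parameters (`P_INSERT`, `P_DELETE`, `P_MATCH`, `P_SUBST`, `P_STOP`) are hand-picked rather than learned; and the infinite sum is simply truncated at a hypothetical `max_len` instead of being computed exactly through an intersection of probabilistic automata.

```python
from itertools import product

ALPHABET = "ab"  # hypothetical two-symbol alphabet

# Hypothetical, hand-picked edit probabilities (NOT learned from data).
# For every input symbol, the outgoing mass sums to 1:
#   insertions (2 * 0.05) + deletion (0.1) + match (0.7) + substitution (0.1)
P_INSERT = 0.05  # insert one specific output symbol
P_DELETE = 0.10  # delete the current input symbol
P_MATCH = 0.70   # copy the current input symbol unchanged
P_SUBST = 0.10   # replace the current input symbol by the other symbol
P_STOP = 1.0 - len(ALPHABET) * P_INSERT  # stop once the input is consumed


def cond_prob(x, y):
    """p(x | y): probability that input y is edited into output x.

    Forward dynamic program: alpha[i][j] is the total probability of all
    edit scripts that consume y[:i] and emit x[:j].
    """
    n, m = len(y), len(x)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if j > 0:  # last operation inserted x[j-1]
                alpha[i][j] += alpha[i][j - 1] * P_INSERT
            if i > 0:  # last operation deleted y[i-1]
                alpha[i][j] += alpha[i - 1][j] * P_DELETE
            if i > 0 and j > 0:  # last operation matched or substituted
                p = P_MATCH if x[j - 1] == y[i - 1] else P_SUBST
                alpha[i][j] += alpha[i - 1][j - 1] * p
    return alpha[n][m] * P_STOP


def edit_kernel(x1, x2, max_len=4):
    """Truncated approximation of sum over y of p(x1 | y) * p(x2 | y).

    Only strings y of length <= max_len are enumerated; this cutoff is an
    assumption of the sketch, not the exact automata-based computation.
    """
    total = 0.0
    for length in range(max_len + 1):
        for y in product(ALPHABET, repeat=length):
            total += cond_prob(x1, y) * cond_prob(x2, y)
    return total


if __name__ == "__main__":
    for pair in [("ab", "ab"), ("ab", "bb"), ("ab", "ba")]:
        print(pair, edit_kernel(*pair))
```

Because the truncated sum is an inner product of the vectors (p(x|y))_y and (p(x'|y))_y, the resulting function is symmetric and positive semi-definite by construction. The brute-force enumeration grows exponentially with `max_len`, which is one reason an exact automata-based computation is preferable in practice.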