A discriminative model of stochastic edit distance in the form of a conditional transducer

Authors:
Marc Bernard; Jean-Christophe Janodet; Marc Sebban
Affiliations:
EURISE, Université Jean Monnet de Saint-Etienne, Saint-Etienne, France (all three authors)
Venue:
ICGI'06 Proceedings of the 8th international conference on Grammatical Inference: algorithms and applications
Year:
2006

Abstract

Many real-world applications, such as spell-checking or DNA analysis, use the Levenshtein edit distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilistic transducer, inferred with grammatical inference techniques, which allows us to learn both the structure and the probabilities of the model. The learned transducers are neither deterministic nor stochastic in the standard terminology; instead, they are conditional, and thus independent of the distribution of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context in which the edit operations are applied. In other words, we obtain a form of context-sensitive edit distance.
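The abstract relies on two ideas: an edit distance parameterized by a cost for each primitive operation, and a probabilistic model that is conditional on the input string. The Python sketch below illustrates both under stated assumptions. `weighted_edit_distance` is the standard dynamic programme with a pluggable cost function, which corresponds to the hand-tuned setting the abstract starts from. `conditional_edit_probability` computes p(y | x) by summing over all edit sequences for a hypothetical single-state (memoryless) conditional edit model. Its parameters `sub`, `ins` and `end`, and the toy values `SUB`, `INS` and `END`, are made up for illustration. `is_conditional` checks the per-input-symbol normalization that makes such a model conditional, so that no distribution over input strings is involved. This is not the paper's learned transducer. It has no states, so it cannot express context-sensitive costs, and it learns nothing.

```python
import math
from typing import Callable, Dict, Tuple

LAMBDA = ""  # empty symbol: (a, LAMBDA) is a deletion, (LAMBDA, b) an insertion


def weighted_edit_distance(x: str, y: str, cost: Callable[[str, str], float]) -> float:
    """Wagner-Fischer dynamic programme with an arbitrary (e.g. hand-tuned) cost function."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost(x[i - 1], LAMBDA)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost(LAMBDA, y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + cost(x[i - 1], LAMBDA),        # delete x_i
                d[i][j - 1] + cost(LAMBDA, y[j - 1]),        # insert y_j
                d[i - 1][j - 1] + cost(x[i - 1], y[j - 1]),  # substitute (or match)
            )
    return d[n][m]


def levenshtein_cost(a: str, b: str) -> float:
    """Unit costs, giving the classical Levenshtein distance."""
    return 0.0 if a == b else 1.0


def conditional_edit_probability(
    x: str,
    y: str,
    sub: Dict[Tuple[str, str], float],  # sub[(a, b)] = p(b | a); b == LAMBDA means deletion
    ins: Dict[str, float],              # ins[b] = probability of inserting b
    end: float,                         # probability of stopping once x is consumed
) -> float:
    """p(y | x) summed over all edit sequences (forward algorithm) for a
    hypothetical memoryless conditional edit model."""
    n, m = len(x), len(y)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0:  # deletion of x_i
                total += alpha[i - 1][j] * sub.get((x[i - 1], LAMBDA), 0.0)
            if j > 0:  # insertion of y_j
                total += alpha[i][j - 1] * ins.get(y[j - 1], 0.0)
            if i > 0 and j > 0:  # substitution x_i -> y_j
                total += alpha[i - 1][j - 1] * sub.get((x[i - 1], y[j - 1]), 0.0)
            alpha[i][j] = total
    return alpha[n][m] * end


def is_conditional(sub, ins, end, alphabet: str, tol: float = 1e-9) -> bool:
    """For every input symbol a, substitutions/deletion of a plus insertions must
    sum to 1; once x is consumed, insertions plus stopping must sum to 1."""
    p_ins = sum(ins.values())
    ok_end = abs(p_ins + end - 1.0) < tol
    ok_symbols = all(
        abs(sum(sub.get((a, b), 0.0) for b in list(alphabet) + [LAMBDA]) + p_ins - 1.0) < tol
        for a in alphabet
    )
    return ok_end and ok_symbols


def learned_distance(x: str, y: str, sub, ins, end) -> float:
    """Turn the model into a dissimilarity score: -log p(y | x)."""
    return -math.log(conditional_edit_probability(x, y, sub, ins, end))


# Toy parameters over the alphabet {a, b} (illustrative values, not learned).
ALPHABET = "ab"
SUB = {("a", "a"): 0.8, ("a", "b"): 0.05, ("a", LAMBDA): 0.05,
       ("b", "b"): 0.8, ("b", "a"): 0.05, ("b", LAMBDA): 0.05}
INS = {"a": 0.05, "b": 0.05}
END = 0.9

if __name__ == "__main__":
    print(weighted_edit_distance("kitten", "sitting", levenshtein_cost))
    print(is_conditional(SUB, INS, END, ALPHABET))
    print(conditional_edit_probability("a", "a", SUB, INS, END))
    print(learned_distance("ab", "ab", SUB, INS, END), learned_distance("ab", "ba", SUB, INS, END))
```

With unit costs, the first function returns 3.0 for `"kitten"` and `"sitting"`. For the toy model, p("a" | "a") is 0.8 × 0.9 plus two delete-then-insert orderings (each 0.05 × 0.05 × 0.9), i.e. 0.7245. Note that -log p(y | x) is generally not symmetric in x and y, so it is a dissimilarity score rather than a metric.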