IEEE Transactions on Pattern Analysis and Machine Intelligence
The String-to-String Correction Problem
Journal of the ACM (JACM)
Probabilistic DFA Inference using Kullback-Leibler Divergence and Minimality
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Learning Stochastic Regular Grammars by Means of a State Merging Method
ICGI '94 Proceedings of the Second International Colloquium on Grammatical Inference and Applications
Adaptive duplicate detection using learnable string similarity measures
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Parameter estimation for probabilistic finite-state transducers
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Melody Recognition with Learned Edit Distances
SSPR & SPR '08 Proceedings of the 2008 Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition
Learning state machine-based string edit kernels
Pattern Recognition
A spectral learning algorithm for finite state transducers
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I
Hi-index | 0.00 |
Many real-world applications such as spell-checking or DNA analysis use the Levenshtein edit-distance to compute similarities between strings. In practice, the costs of the primitive edit operations (insertion, deletion and substitution of symbols) are generally hand-tuned. In this paper, we propose an algorithm to learn these costs. The underlying model is a probabilitic transducer, computed by using grammatical inference techniques, that allows us to learn both the structure and the probabilities of the model. Beyond the fact that the learned transducers are neither deterministic nor stochastic in the standard terminology, they are conditional, thus independant from the distributions of the input strings. Finally, we show through experiments that our method allows us to design cost functions that depend on the string context where the edit operations are used. In other words, we get kinds of context-sensitive edit distances.