Many pattern recognition algorithms are based on nearest-neighbour search and use the well-known edit distance, for which the primitive edit costs are usually fixed in advance. In this article, we aim to learn an unbiased stochastic edit distance in the form of a finite-state transducer from a corpus of (input, output) pairs of strings. Unlike other standard methods, which generally use the Expectation Maximisation algorithm, our algorithm learns a transducer independently of the marginal probability distribution of the input strings. Proceeding in this unbiased way requires optimising the parameters of a conditional transducer instead of a joint one. We apply our new model to handwritten digit recognition. A large series of experiments shows that it always outperforms the standard edit distance.
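For readers unfamiliar with the baseline, the standard edit distance with primitive costs fixed in advance can be sketched as follows. This is a minimal illustration of the classical dynamic-programming recurrence, not the learned model described in this article; the function name `edit_distance` and the cost parameters `sub_cost`, `ins_cost` and `del_cost` are hypothetical names chosen for this example, and the unit default costs are an assumption.

```python
from typing import Callable


def edit_distance(
    x: str,
    y: str,
    sub_cost: Callable[[str, str], float] = lambda a, b: 0.0 if a == b else 1.0,
    ins_cost: Callable[[str], float] = lambda b: 1.0,
    del_cost: Callable[[str], float] = lambda a: 1.0,
) -> float:
    """Classical edit distance between x and y with fixed primitive costs.

    The cost functions are illustrative assumptions (unit costs by default).
    """
    n, m = len(x), len(y)
    # d[i][j] holds the cheapest cost of turning x[:i] into y[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]

    # Turning a prefix of x into the empty string: deletions only.
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost(x[i - 1])

    # Turning the empty string into a prefix of y: insertions only.
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost(y[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(x[i - 1]),                # delete x[i-1]
                d[i][j - 1] + ins_cost(y[j - 1]),                # insert y[j-1]
                d[i - 1][j - 1] + sub_cost(x[i - 1], y[j - 1]),  # substitute or match
            )
    return d[n][m]
```

Because the costs are passed in as functions, the same recurrence accepts any fixed cost table, for example `edit_distance("a", "b", sub_cost=lambda a, b: 0.0 if a == b else 0.5)`. A learned stochastic distance would replace these hand-set numbers with parameters estimated from (input, output) pairs.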