Learning probabilistic models of tree edit distance

Authors:
Marc Bernard;Laurent Boyer;Amaury Habrard;Marc Sebban
Affiliations:
Laboratoire Hubert Curien, Université de Saint-Etienne, 18 rue du Professeur Lauras, 42000 Saint-Etienne, France;Laboratoire Hubert Curien, Université de Saint-Etienne, 18 rue du Professeur Lauras, 42000 Saint-Etienne, France;Laboratoire d'Informatique Fondamentale, Université de Provence, 39 rue Frédéric Joliot Curie, 13453 Marseille cedex 13, France;Laboratoire Hubert Curien, Université de Saint-Etienne, 18 rue du Professeur Lauras, 42000 Saint-Etienne, France
Venue:
Pattern Recognition
Year:
2008

Abstract

Nowadays, there is a growing interest in machine learning and pattern recognition for tree-structured data. Trees provide a suitable structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, computer music, or conversion of semi-structured data (e.g. XML documents). Many applications in these domains require the calculation of similarities over pairs of trees. In this context, the tree edit distance (ED) has been the subject of investigation for many years, with the aim of improving its computational efficiency. However, in its classical form, the tree ED requires edit costs fixed a priori, which are often difficult to tune; this leaves little room for tackling complex problems. In this paper, to overcome this drawback, we focus on the automatic learning of a non-parametric stochastic tree ED. More precisely, we are interested in two kinds of probabilistic approaches. The first builds a generative model of the tree ED from a joint distribution over the edit operations, while the second works from a conditional distribution, thereby providing a discriminative model. To tackle these tasks, we present an adaptation of the expectation-maximization algorithm for learning these distributions over the primitive edit costs. Two experiments are conducted. The first is carried out on artificial data and confirms the benefit of learning a tree ED rather than imposing edit costs a priori; the second addresses a pattern recognition task, the classification of handwritten digits.
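
To make the role of the edit costs concrete, here is a minimal Python sketch of a tree ED whose costs are passed in as a parameter. It is an illustration under stated assumptions, not the method of the paper: for brevity it uses a restricted, Selkow-style top-down variant (insertions and deletions apply to whole subtrees only), and the names `Node`, `subtree_cost`, `selkow_distance`, `unit_cost`, and `tuned_cost` are hypothetical. The cost functions stand in for the primitive edit costs that the abstract proposes to learn rather than fix by hand.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


# cost(a, b): substitution a -> b; a is None for an insertion of b,
# b is None for a deletion of a. (Hypothetical interface for this sketch.)
Cost = Callable[[Optional[str], Optional[str]], float]


def subtree_cost(t: Node, cost: Cost, insert: bool) -> float:
    """Cost of inserting (or deleting) a whole subtree, node by node."""
    own = cost(None, t.label) if insert else cost(t.label, None)
    return own + sum(subtree_cost(c, cost, insert) for c in t.children)


def selkow_distance(a: Node, b: Node, cost: Cost) -> float:
    """Restricted top-down tree ED: roots are always matched, and the
    ordered child lists are aligned like strings, where removing or adding
    a child means removing or adding its entire subtree."""
    m, n = len(a.children), len(b.children)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + subtree_cost(a.children[i - 1], cost, False)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + subtree_cost(b.children[j - 1], cost, True)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + subtree_cost(a.children[i - 1], cost, False),
                d[i][j - 1] + subtree_cost(b.children[j - 1], cost, True),
                d[i - 1][j - 1]
                + selkow_distance(a.children[i - 1], b.children[j - 1], cost),
            )
    return cost(a.label, b.label) + d[m][n]


def unit_cost(x: Optional[str], y: Optional[str]) -> float:
    """Classical fixed costs: every non-identity operation costs 1."""
    return 0.0 if x == y else 1.0


def tuned_cost(x: Optional[str], y: Optional[str]) -> float:
    """Hand-picked costs (assumed values) that make c -> d cheap."""
    if x == y:
        return 0.0
    return 0.1 if (x, y) == ("c", "d") else 1.0


t1 = Node("a", [Node("b"), Node("c")])
t2 = Node("a", [Node("b"), Node("d")])
t3 = Node("a", [Node("b")])

print(selkow_distance(t1, t2, unit_cost))   # one substitution at unit cost
print(selkow_distance(t1, t2, tuned_cost))  # same edit, cheaper under tuned costs
print(selkow_distance(t1, t3, unit_cost))   # one subtree deletion
```

The same pair of trees receives a different distance under `unit_cost` and `tuned_cost`, which is why the choice of costs matters for any classifier built on the distance. In a stochastic setting, one common convention (assumed here, not taken from the abstract) is to derive each cost as the negative logarithm of the probability of the corresponding edit operation.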