DTD based costs for tree-edit distance in structured information retrieval

Authors:
Cyril Laitang; Karen Pinel-Sauvagnat; Mohand Boughanem
Affiliations:
IRIT-SIG, Toulouse Cedex 9, France (all authors)
Venue:
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Year:
2013


Abstract

In this paper we present a Structured Information Retrieval (SIR) model based on graph matching. Our approach combines content propagation, which handles sibling relationships, with a document-query structure matching process. The latter is based on Tree-Edit Distance (TED), defined by the minimum-cost set of insert, delete, and replace operations that turns one tree into another. To our knowledge, this algorithm has never been used in ad-hoc SIR. As the effectiveness of TED relies on both the input tree and the edit costs, we first present a focused subtree extraction technique that selects the most representative elements of the document with respect to the query. We then describe how we set the TED costs based on the Document Type Definition (DTD). Finally, we discuss our results according to the type of collection (data-oriented or text-oriented). Experiments are conducted on two INEX test sets: the 2010 Datacentric collection and the 2005 Ad-hoc one.
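
To make the notion of Tree-Edit Distance concrete, the following is a minimal sketch of a memoized ordered-tree edit distance with pluggable insert, delete, and replace costs. It is not the authors' implementation: the tuple-based tree representation, the helper names `tree` and `tree_edit_distance`, and the default unit costs are assumptions made for illustration. The cost functions are placeholders where schema-derived (for example DTD-based) costs could be supplied. This simple recursion is only practical for small trees.

```python
from functools import lru_cache


def tree(label, *children):
    """Build a tree as (label, children); children is a tuple of trees."""
    return (label, tuple(children))


def tree_edit_distance(
    t1,
    t2,
    ins_cost=lambda label: 1,                   # assumed unit cost
    del_cost=lambda label: 1,                   # assumed unit cost
    rep_cost=lambda a, b: 0 if a == b else 1,   # assumed unit cost
):
    """Minimum total cost of edits turning ordered tree t1 into t2."""

    @lru_cache(maxsize=None)
    def dist(f1, f2):
        # f1 and f2 are forests: tuples of trees, compared right to left.
        if not f1 and not f2:
            return 0
        if not f2:
            # Delete the rightmost root of f1; its children take its place.
            label, kids = f1[-1]
            return dist(f1[:-1] + kids, f2) + del_cost(label)
        if not f1:
            # Insert the rightmost root of f2.
            label, kids = f2[-1]
            return dist(f1, f2[:-1] + kids) + ins_cost(label)
        (l1, k1), (l2, k2) = f1[-1], f2[-1]
        return min(
            dist(f1[:-1] + k1, f2) + del_cost(l1),
            dist(f1, f2[:-1] + k2) + ins_cost(l2),
            # Match the two rightmost roots, then solve their children
            # and the remaining forests independently.
            dist(k1, k2) + dist(f1[:-1], f2[:-1]) + rep_cost(l1, l2),
        )

    return dist((t1,), (t2,))


# Hypothetical document and query structures.
doc = tree("article", tree("title"), tree("section", tree("p")))
query = tree("article", tree("section", tree("p")))
print(tree_edit_distance(doc, query))
```

In this example the document subtree differs from the query only by the extra `title` element, so a single deletion suffices. Passing a different `del_cost` function changes how strongly such a mismatch is penalized, which is the role the edit costs play in the matching process described above.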