Flexible document-query matching based on a probabilistic content and structure score combination

Authors:
Mohamed Ben Aouicha;Mohamed Tmar;Mohand Boughanem
Affiliations:
Université Paul Sabatier, Toulouse, France;Institut Supérieur d'Informatique et du Multimédia de Sfax, Sfax, Tunisia;Université Paul Sabatier, Toulouse, France
Venue:
Proceedings of the 2010 ACM Symposium on Applied Computing
Year:
2010

Abstract

The goal of an XML retrieval system is to select, from a set of XML documents, all elements (nodes) that fit the user's information need, which is usually expressed as a set of keywords with some structural conditions. Structural conditions are given as an ordered list of tag names that designates the target element in which relevant content should be sought. Consequently, a potentially relevant node should not only contain text similar to the query; its location path should also fit the structural conditions. In this paper we describe a new approach for ranking XML content-and-structure queries based on a probabilistic combination of two independent scores assigned to each XML element: a content score and a structural score. The content score measures the content similarity between an element and a query; the structural score measures the similarity between the element's path and the structural conditions of the query. We show experimentally that both scores follow well-known distributions, and we then propose a probabilistic combination of these distributions to assign a final score to each node. Experiments on a dataset provided by INEX show the effectiveness of our approach. Our experiments focus on the VVCAS task, which is well suited to our model.
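The abstract describes the scoring model only at a high level, so the following is a hypothetical illustration rather than the authors' method. It assumes that the structural score is the length of the longest common subsequence (LCS) between the element's tag path and the query's ordered tag list, normalised by the query length. It also assumes that each score's distribution is approximated by a normal fit (the abstract does not name the distributions) and that, because the two scores are treated as independent, they are combined as the joint probability P(C <= c) * P(S <= s). The names `structural_score`, `normal_cdf` and `combine`, the sample tag paths and the content scores are all illustrative.

```python
import math
from statistics import mean, pstdev


def structural_score(element_path, query_path):
    """Hypothetical path similarity: LCS of the two tag sequences / query length."""
    m, n = len(element_path), len(query_path)
    if n == 0:
        return 0.0
    # dp[i][j] = LCS length of element_path[:i] and query_path[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if element_path[i - 1] == query_path[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n] / n


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution; degenerate step function if sigma == 0."""
    if sigma == 0:
        return 1.0 if x >= mu else 0.0
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def combine(content_scores, structural_scores):
    """Assumed combination: fit a normal to each score list, then multiply the
    two CDF values, i.e. P(C <= c) * P(S <= s) under independence."""
    mu_c, sd_c = mean(content_scores), pstdev(content_scores)
    mu_s, sd_s = mean(structural_scores), pstdev(structural_scores)
    return [normal_cdf(c, mu_c, sd_c) * normal_cdf(s, mu_s, sd_s)
            for c, s in zip(content_scores, structural_scores)]


# Illustrative data: three candidate elements and a query path //article//sec.
paths = [["article", "bdy", "sec"],
         ["article", "fm", "title"],
         ["article", "bdy", "sec", "p"]]
query = ["article", "sec"]
struct = [structural_score(p, query) for p in paths]
content = [0.8, 0.9, 0.3]  # made-up content similarities (e.g. from a text model)
final = combine(content, struct)
ranking = sorted(range(len(paths)), key=lambda i: final[i], reverse=True)
print(struct, [round(f, 3) for f in final], ranking)
```

The product of CDF values is one plausible way to merge two independent scores. It keeps each final score in [0, 1] and is non-decreasing in both the content and the structural score, so an element whose path matches and whose content is strong (element 0 here) ranks first. Other fitted distributions or combination rules would slot into `combine` in the same way.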