Flexible document-query matching based on a probabilistic content and structure score combination

  • Authors:
  • Mohamed Ben Aouicha;Mohamed Tmar;Mohand Boughanem

  • Affiliations:
  • Université Paul Sabatier, Toulouse, France;Institut Supérieur d'Informatique et du Multimédia de Sfax, Sfax, Tunisia;Université Paul Sabatier, Toulouse, France

  • Venue:
  • Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The goal of an XML retrieval system is to select from a set of XML documents all elements (nodes) that fit the user information need, usually expressed by a set of keywords with some structural conditions. Structural conditions are simply given by an ordered list of tag names that gives the target element where to search for relevant content. Consequently a potential relevant node should not only contain similar text to the query but also its localization path should fit the structural conditions. We describe in this paper a new approach for ranking XML content-and-structure queries based on a probabilistic combination of two independent scores assigned to each XML element: content score and structural score. Content score measures the content similarity between an element and a query, the structural score measures the path similarity between an element path and the structural conditions of a query. We showed experimentally that both scores follow well-known distributions. We then proposed a probabilistic combination of these distributions in order to assign a final score to each node. Some experiments have been undertaken on a dataset provided by INEX to show the effectiveness of our approach. We emphasize our experiments on the VVCAS task which is appropriate to our model.