A survey on XML focussed component retrieval

  • Authors: Karen Pinel-Sauvagnat; Mohand Boughanem

  • Affiliations: IRIT-SIG/RFI, Toulouse Cedex; IRIT-SIG/RFI, Toulouse Cedex

  • Venue: Large Scale Semantic Access to Content (Text, Image, Video, and Sound)

  • Year: 2007

Abstract

Focussed XML component retrieval is one of the most important challenges in the XML IR field. The aim of the focussed retrieval strategy is to find the most exhaustive and specific element in a path, i.e. to retrieve elements that focus on the user need without also returning nested elements. In this paper, we introduce a relevance propagation method for focussed XML component retrieval. Many experiments are carried out with the INEX 2005 test suite to identify the main characteristics of relevant elements in focussed retrieval and to compare them with those of relevant elements in thorough retrieval (where the aim is to find all relevant elements in the collection). Our main findings are the following. First, a term weighting scheme that takes into account the importance of terms in elements, in the collection of elements, and in the collection of documents is useful. Second, using component length, either as a threshold on results or in a weighted propagation function, significantly improves the results. Third, contextual relevance does not appear to be useful, which contradicts results obtained by state-of-the-art methods for non-focussed retrieval. Finally, the use of structural hints improves performance by up to 50% compared with queries composed only of simple keyword terms.
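
To make the "without nested elements" constraint concrete, the following is a minimal sketch of one simple way to turn a scored list of elements into a non-overlapping one: take elements in decreasing score order and skip any element that is an ancestor or descendant of one already kept. This is an illustrative assumption, not the relevance propagation method of the paper; the function names, the XPath-like path strings, and the scores are all hypothetical.

```python
from typing import List, Tuple

def is_ancestor(a: str, b: str) -> bool:
    """Return True if element path `a` is a proper ancestor of path `b`."""
    # Appending "/" avoids treating "/article[1]" as an ancestor of "/article[10]".
    return b.startswith(a.rstrip("/") + "/")

def remove_overlap(scored: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Greedy overlap removal: keep the best-scoring element on each path.

    `scored` holds (element_path, score) pairs. The result is ordered by
    decreasing score and contains no two elements nested in one another.
    """
    kept: List[Tuple[str, float]] = []
    for path, score in sorted(scored, key=lambda ps: ps[1], reverse=True):
        overlaps = any(
            path == k or is_ancestor(path, k) or is_ancestor(k, path)
            for k, _ in kept
        )
        if not overlaps:
            kept.append((path, score))
    return kept

# Hypothetical scores for four elements of one document.
results = [
    ("/article[1]", 0.40),
    ("/article[1]/sec[2]", 0.75),
    ("/article[1]/sec[2]/p[1]", 0.60),
    ("/article[1]/sec[3]/p[4]", 0.55),
]
focussed = remove_overlap(results)
print(focussed)
```

In this toy input, the section `/article[1]/sec[2]` is kept, so both its paragraph and the enclosing article are dropped, while the paragraph in `sec[3]` survives because it lies on a different path. A thorough retrieval run, by contrast, would return all four elements.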