Next-generation information retrieval: integrating document and data retrieval based on XML

  • Authors:
  • Michael Gertz;Jan-Marco Bremer

  • Venue:
  • Next-generation information retrieval: integrating document and data retrieval based on XML
  • Year:
  • 2003

Abstract

Data retrieval comprises exact queries that allow a user to specify a precisely defined subset of a data source. Document retrieval ranks the elements of a given document collection by their relevance to a set of query terms. For querying structured and semistructured data, the two techniques are valuable and complementary, yet they have never been fully integrated. In this dissertation, we introduce Integrated Information Retrieval (IIR), a conceptually new retrieval approach that closes this gap. We present the syntax and semantics of XQuery/IR, an extension of the XQuery language. The extended language realizes IIR on top of the Extensible Markup Language (XML) and allows users to formulate new kinds of valuable queries by nesting ranked document retrieval and precise data retrieval sub-queries. Furthermore, we detail index structures and efficient query processing approaches for implementing XQuery/IR. The index structures build on a new identification scheme for nodes in a node-labeled tree structure, such as the one underlying XML, and require only a fraction of the space of comparable existing index structures that support data retrieval alone. For semistructured data such as XML, we also present a first distribution design approach, whose realization confirms the value of the new node identification and indexing scheme for applications beyond Integrated Information Retrieval.
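
The idea of nesting a precise data retrieval sub-query inside a ranked document retrieval query can be illustrated with a minimal Python sketch. This is not XQuery/IR syntax, which the abstract does not show, and it does not use the dissertation's index structures. The sample XML, the element names, and the functions `score` and `integrated_query` are all hypothetical, and plain term counting stands in for a real relevance measure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample collection; element and attribute names are illustrative only.
SAMPLE = """
<library>
  <book year="2001"><title>XML indexing</title><abstract>index structures for XML query processing</abstract></book>
  <book year="2003"><title>Ranked retrieval</title><abstract>ranking XML documents by relevance with XML ranking functions</abstract></book>
  <book year="1998"><title>Old XML notes</title><abstract>XML XML XML</abstract></book>
</library>
"""


def score(element, terms):
    """Crude relevance: count occurrences of the query terms in the element's text."""
    words = " ".join(element.itertext()).lower().split()
    return sum(words.count(term.lower()) for term in terms)


def integrated_query(root, min_year, terms):
    """Exact selection first, then relevance ranking of the selected subset."""
    # Data retrieval step: an exact predicate selects a precisely defined subset.
    exact = [b for b in root.iter("book") if int(b.get("year")) >= min_year]
    # Document retrieval step: order that subset by relevance to the query terms.
    ranked = sorted(exact, key=lambda b: score(b, terms), reverse=True)
    return [(b.findtext("title"), score(b, terms)) for b in ranked]


root = ET.fromstring(SAMPLE)
results = integrated_query(root, 2000, ["xml", "ranking"])
for title, relevance in results:
    print(title, relevance)
```

In this sketch the 1998 entry is excluded by the exact year predicate even though it contains the query terms often, while the remaining entries are returned in relevance order. That combination is what neither retrieval style provides on its own.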