Information Retrieval and Structured Documents

  • Authors:
  • Yves Chiaramella

  • Affiliations:
  • -

  • Venue:
  • ESSIR '00 Proceedings of the Third European Summer-School on Lectures on Information Retrieval-Revised Lectures
  • Year:
  • 2000

Abstract

Standard Information Retrieval considers documents as atomic units of information that are indexed and retrieved as a whole. The evolution of document design and storage has long since introduced more elaborate representations of documents; standards such as SGML, then HTML, and now XML are major contributions in this domain. These standards underlie today's evolution towards modern electronic documents. In this context, retrieving structured documents refers to indexing and retrieving information according to a given document structure. This means that documents are no longer considered as atomic entities, but as aggregates of interrelated objects that can be retrieved separately: given a retrieval query, one may retrieve the set of document components that are most relevant to this query.

In this chapter we shall first emphasise some aspects which, in our opinion, relate the explicit use of document structure to interactive retrieval performance, such as efficiency while browsing or querying information. In a second step we shall investigate two classes of implementation approaches dealing with indexing and retrieving structured documents: passage retrieval and the explicit use of the hierarchical structure of documents.