Searching structured documents

  • Authors:
  • Andrew Trotman

  • Affiliations:
  • Department of Computer Science, University of Otago, Dunedin, New Zealand

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Structured document interchange formats such as XML and SGML are ubiquitous, however, information retrieval systems supporting structured searching are not. Structured searching can result in increased precision. A search for the author "Smith" in an unstructured corpus of documents specializing in iron-working could have a lower precision than a structured search for "Smith as author" in the same corpus.Analysis of XML retrieval languages identifies additional functionality that must be supported including searching at, and broken across multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.