A relational model for unstructured documents

  • Authors: A. Salminen
  • Affiliations: University of Jyväskylä, Department of Computer Science, Seminaarinkatu 15, SF-40100 Jyväskylä, Finland
  • Venue: SIGIR '87: Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year: 1987

Abstract

The logical structure of a document is usually a tree in which the order of the nodes matters at least at some level. We call a document unstructured if its structure is a single-level ordered tree. This paper presents a many-sorted algebra for handling unstructured documents, in which documents are represented by relations. An algebra for documents of one type can be extended to an algebra for documents of several types. It can further be combined with the relational algebra, so that documents and relations are handled in a common algebra. The model can be regarded as part of a general document model; at the same time, unstructured documents are an important group of documents in their own right. We show by examples that this simple model covers a wide range of document handling and information retrieval problems.
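
As a rough illustration of what representing a single-level ordered tree by a relation could look like, the sketch below encodes a document as a set of (position, text) rows. This is not the algebra defined in the paper: the encoding and the helper names to_relation, to_document, select and concat are hypothetical, chosen only to show that order can be carried as an attribute of an otherwise unordered relation.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical encoding (not taken from the paper): each row pairs a
# 1-based position with the text of one component of the document.
Row = Tuple[int, str]
Relation = FrozenSet[Row]


def to_relation(components: List[str]) -> Relation:
    """Encode an ordered list of components as a set of (position, text) rows."""
    return frozenset((i, text) for i, text in enumerate(components, start=1))


def to_document(rel: Relation) -> List[str]:
    """Recover the ordered components by sorting the rows on position."""
    return [text for _, text in sorted(rel)]


def select(rel: Relation, pred: Callable[[str], bool]) -> Relation:
    """Keep the rows whose text satisfies pred, renumbering the survivors."""
    kept = [text for _, text in sorted(rel) if pred(text)]
    return to_relation(kept)


def concat(a: Relation, b: Relation) -> Relation:
    """Append document b after document a by shifting b's positions."""
    offset = len(a)  # positions in a are assumed to be dense: 1..len(a)
    return a | frozenset((i + offset, text) for i, text in b)


if __name__ == "__main__":
    letter = to_relation(["Dear reader,", "This is the body.", "Regards"])
    shorter = select(letter, lambda text: text != "This is the body.")
    print(to_document(concat(shorter, to_relation(["P.S. See you soon."]))))
```

Because the order lives in the position attribute, ordinary set-based operations still apply to the rows, while select and concat take care of keeping the positions consecutive.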