A textual object management system

  • Authors: Scott C. Deerwester; Keith Waclena; Michelle LaMar

  • Affiliations: The Hong Kong University of Science and Technology, Department of Computer Science and University of Chicago, Center for Information and Language Studies; University of Chicago, Center for Information and Language Studies; University of Chicago, Center for Information and Language Studies

  • Venue: SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval

  • Year: 1992

Abstract

Computer programs that access significant amounts of text usually include code that manipulates the textual objects that make up that text. Such programs include electronic mail readers, typesetters and, in particular, full-text information retrieval systems. This code is often unsatisfying because access to textual objects is either efficient or flexible, but not both. A programming language like Awk or Perl provides very general facilities for describing textual objects, but at the cost of rescanning the text for every textual object. At the other extreme, full-text information retrieval systems usually offer access to only a very limited number of kinds of textual objects, but this access is very efficient. The system described in this paper is a programming tool for managing textual objects. It provides a great deal of flexibility, giving access to very complex document structure with a large number of constituent kinds of textual objects. Further, it provides access to these objects very efficiently, in terms of both time and auxiliary space, by accessing secondary storage only when absolutely necessary.
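
The trade-off the abstract describes can be illustrated with a small sketch. It is not the system presented in the paper, whose design is not given here. It only assumes, for illustration, that textual objects are described by regular expressions and located through a table of byte offsets built in one scan, so that later access seeks directly to an object instead of rescanning the text. The names `PATTERNS`, `build_index` and `fetch`, and the two object kinds, are hypothetical.

```python
import io
import re
from collections import defaultdict

# Hypothetical object kinds and patterns, chosen only for illustration;
# they are not taken from the paper.
PATTERNS = {
    "word": re.compile(rb"\w+"),
    "sentence": re.compile(rb"[^.!?\s][^.!?]*[.!?]"),
}


def build_index(stream):
    """Scan the text once and record (start, end) byte offsets per object kind."""
    stream.seek(0)
    data = stream.read()
    index = defaultdict(list)
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(data):
            index[kind].append((match.start(), match.end()))
    return index


def fetch(stream, index, kind, n):
    """Return the n-th object of a kind by seeking to it, without rescanning."""
    start, end = index[kind][n]
    stream.seek(start)
    return stream.read(end - start).decode("utf-8")


# An in-memory stream stands in for a file on secondary storage.
text = io.BytesIO(
    b"Text is stored once. Objects are located by offset. Access is direct."
)
index = build_index(text)
second_sentence = fetch(text, index, "sentence", 1)
word_count = len(index["word"])
```

Adding a new kind of object in this sketch means adding one pattern, which is the flexibility side of the trade-off; the offset table is the efficiency side, at the cost of auxiliary space proportional to the number of objects indexed.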