A textual object management system

Authors:
Scott C. Deerwester; Keith Waclena; Michelle LaMar
Affiliations:
Scott C. Deerwester: The Hong Kong University of Science and Technology, Department of Computer Science, and University of Chicago, Center for Information and Language Studies
Keith Waclena: University of Chicago, Center for Information and Language Studies
Michelle LaMar: University of Chicago, Center for Information and Language Studies
Venue:
SIGIR '92: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Year:
1992

Abstract

Computer programs that access significant amounts of text usually include code that manipulates the textual objects that make up that text. Such programs include electronic mail readers, typesetters and, in particular, full-text information retrieval systems. This code is often unsatisfying because access to textual objects is either efficient or flexible, but not both. A programming language such as Awk or Perl provides very general facilities for describing textual objects, but at the cost of rescanning the text for every textual object. At the other extreme, full-text information retrieval systems usually offer very efficient access, but only to a small number of kinds of textual objects. The system described in this paper is a programming tool for managing textual objects. It provides a great deal of flexibility, giving access to very complex document structure with a large number of constituent kinds of textual objects. Further, it provides access to these objects efficiently, in terms of both time and auxiliary space, by accessing secondary storage only when absolutely necessary.
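To make the flexibility/efficiency trade-off concrete, here is a minimal Python sketch under stated assumptions. It is not the system described in the paper, whose data structures the abstract does not describe. `scan_spans` stands for the Awk/Perl style: a regular-expression pass over the whole text every time objects are requested. The hypothetical `SpanIndex` records the byte offsets of one kind of textual object once, and afterwards reads only the bytes of the requested object from a seekable store standing in for secondary storage. The class, its methods, and the choice of "paragraph" (a run of non-empty lines) as the object kind are all illustrative assumptions.

```python
import io
import re
from bisect import bisect_right
from typing import BinaryIO, List, Optional, Tuple

# Hypothetical object kind: a run of non-empty lines, i.e. a paragraph.
PARA = rb"[^\n]+(?:\n[^\n]+)*"

Span = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def scan_spans(data: bytes, pattern: bytes) -> List[Span]:
    """Awk/Perl style: rescan the whole text each time objects are needed."""
    return [m.span() for m in re.finditer(pattern, data)]


class SpanIndex:
    """Hypothetical offset index for one kind of textual object."""

    def __init__(self, store: BinaryIO, spans: List[Span]):
        self.store = store                 # seekable file standing in for secondary storage
        self.spans = sorted(spans)         # kept sorted by start offset
        self.starts = [s for s, _ in self.spans]

    @classmethod
    def build(cls, store: BinaryIO, pattern: bytes) -> "SpanIndex":
        store.seek(0)
        data = store.read()                # the only full pass over the text
        return cls(store, scan_spans(data, pattern))

    def get(self, i: int) -> bytes:
        start, end = self.spans[i]
        self.store.seek(start)             # read just this object's bytes
        return self.store.read(end - start)

    def containing(self, offset: int) -> Optional[int]:
        """Number of the object whose span contains offset, or None."""
        i = bisect_right(self.starts, offset) - 1
        if i >= 0 and offset < self.spans[i][1]:
            return i
        return None


if __name__ == "__main__":
    doc = b"First paragraph.\n\nSecond one here.\n\nThird."
    paras = SpanIndex.build(io.BytesIO(doc), PARA)
    print(paras.get(1))          # b'Second one here.'
    print(paras.containing(20))  # 1
    print(paras.containing(16))  # None (offset 16 is a newline between paragraphs)
```

The index gains speed by giving up generality: after one build pass it answers lookups without rescanning, but only for the object kinds that were indexed. A new kind of object would require another scan. This mirrors, in miniature, the tension the abstract describes between language-level flexibility and retrieval-system efficiency.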