Computer programs that access significant amounts of text usually include code that manipulates the textual objects of which that text is composed. Such programs include electronic mail readers, typesetters and, in particular, full-text information retrieval systems. This code is often unsatisfying in that access to textual objects is either efficient or flexible, but not both. A programming language like Awk or Perl provides very general facilities for describing textual objects, but at the cost of rescanning the text for every textual object. At the other extreme, full-text information retrieval systems usually offer access to only a very limited number of kinds of textual objects, but this access is very efficient. The system described in this paper is a programming tool for managing textual objects. It is flexible, giving access to very complex document structure with a large number of constituent kinds of textual objects. It is also efficient in both time and auxiliary space, because it accesses secondary storage only when absolutely necessary.