A search algorithm and data structure for an efficient information system

Authors:
Shou-chuan Yang
Affiliations:
University of Wisconsin, Madison, Wisconsin
Venue:
COLING '69 Proceedings of the 1969 conference on Computational linguistics
Year:
1969

Abstract

This paper describes a system for information storage, retrieval, and updating, with special attention to the search algorithm and data structure required for maximum program efficiency. Program efficiency is especially important when a natural language or a symbolic language is involved in the searching process.

The system is a basic framework for an efficient information system. It can be implemented for text processing and document retrieval; for numerical data retrieval; and for handling large files such as dictionaries, catalogs, and personnel records, as well as graphic information. Currently, eight commands are implemented and operational in batch mode on a CDC 3600: STORE, RETRIEVE, ADD, DELETE, REPLACE, PRINT, COMPRESS, and LIST. Further development will focus on the use of a teletype console, CRT terminal, and plotter in a time-sharing environment to produce immediate responses.

The maximum program efficiency is obtained through a unique search algorithm and data structure. Instead of examining the recall ratio and the precision ratio at a higher level, this efficiency is measured in the most basic terms: the "average number of searches" required to look up an item. To identify an item, at least one search is necessary, even if the item is found on the first attempt. However, through the use of the hash address of a key or keyword, in conjunction with an indirect-chaining, list-structured table and a large available space list, the average number of searches required to retrieve an item is 1.25, regardless of the size of the file in question. This compares with 15.6 searches for the binary search technique in a 50,000-item file, and 5.8 searches for the letter-table method, regardless of file size.
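The abstract names three ingredients: a hash address computed from the key, a table that chains indirectly into a list area, and an available space list. The sketch below is one way these pieces could fit together, not the paper's actual implementation. The class `ChainedTable`, the helper `average_probes`, the CRC-32 hash, and the head-of-chain insertion policy are all assumptions made for illustration. Under the standard textbook analysis of chained hashing, a successful lookup examines about 1 + α/2 entries at load factor α, which gives 1.25 at α = 0.5; whether the paper reaches its 1.25 figure this way is not stated in the abstract. Similarly, log2(50,000) ≈ 15.6 matches the binary search figure quoted.

```python
import math
import zlib

NIL = -1  # marks "no node" in both the head table and the list area


class ChainedTable:
    """Head table indexed by hash address, chaining into a node pool.

    Hypothetical sketch: unused nodes are linked into an available space
    list, so STORE takes a node from it and DELETE returns a node to it.
    """

    def __init__(self, n_buckets, capacity):
        # capacity must be at least 1
        self.heads = [NIL] * n_buckets                  # hash address -> first node
        self.keys = [None] * capacity
        self.vals = [None] * capacity
        self.next = list(range(1, capacity)) + [NIL]    # all nodes start out free
        self.free = 0                                   # head of available space list

    def _address(self, key):
        # Assumed hash function; the abstract does not specify one.
        return zlib.crc32(key.encode("utf-8")) % len(self.heads)

    def _find(self, key):
        """Return (node index or NIL, number of entries examined)."""
        probes = 0
        node = self.heads[self._address(key)]
        while node != NIL:
            probes += 1
            if self.keys[node] == key:
                return node, probes
            node = self.next[node]
        return NIL, max(probes, 1)  # a miss still costs at least one search

    def store(self, key, value):
        node, _ = self._find(key)
        if node != NIL:                 # key already present: overwrite
            self.vals[node] = value
            return
        if self.free == NIL:
            raise MemoryError("available space list is empty")
        node = self.free                # pop a node from the free list
        self.free = self.next[node]
        self.keys[node], self.vals[node] = key, value
        addr = self._address(key)
        self.next[node] = self.heads[addr]   # link at the head of the chain
        self.heads[addr] = node

    def retrieve(self, key):
        node, probes = self._find(key)
        return (self.vals[node] if node != NIL else None), probes

    def delete(self, key):
        addr = self._address(key)
        prev, node = NIL, self.heads[addr]
        while node != NIL and self.keys[node] != key:
            prev, node = node, self.next[node]
        if node == NIL:
            return False
        if prev == NIL:                 # unlink from the chain
            self.heads[addr] = self.next[node]
        else:
            self.next[prev] = self.next[node]
        self.keys[node] = self.vals[node] = None
        self.next[node] = self.free     # push the node back on the free list
        self.free = node
        return True


def average_probes(table, keys):
    """Mean number of entries examined when looking up each stored key."""
    return sum(table.retrieve(k)[1] for k in keys) / len(keys)


def binary_search_bound(n):
    """log2(n), the comparison count scale for binary search on n items."""
    return math.log2(n)
```

Filling a table of about 1,000 hash addresses with 500 keys and calling `average_probes` gives a figure near 1.25 in this sketch, and the figure depends on the load factor, not on the absolute number of items. `binary_search_bound(50000)` evaluates to roughly 15.6.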