Using direct-access computer files of bibliographic information, an attempt is made to overcome a problem often associated with information retrieval: the maintenance and use of large dictionaries, the greater part of which is used only infrequently. A novel method is presented that maps the hyperbolic frequency distribution of text characteristics onto a rectangular distribution, which is better suited to implementation on storage devices.

The method treats text as a string of characters rather than as words bounded by spaces, and chooses a subset of character strings whose frequencies of occurrence are more even than those of word types. The members of this subset are then used as index keys for retrieval. The rectangular distribution of key frequencies results in a much simplified file organization and promises considerable cost advantages.
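To make the idea of more evenly distributed character-string keys concrete, the following is a minimal sketch and not the procedure of the paper itself. It assumes a simple greedy rule: start from single characters and replace any string that occurs more often than a chosen threshold with its one-character extensions. The names `select_keys`, `ngram_counts`, `threshold`, and `max_len` are hypothetical and introduced only for illustration.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every character string of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def select_keys(text, threshold, max_len=4):
    """Greedy sketch (an assumption, not the published algorithm).

    Strings occurring more than `threshold` times are split into their
    one-character extensions, up to `max_len` characters. Returns a dict
    mapping each selected key to the number of text positions it covers.
    """
    keys = {}
    frontier = ngram_counts(text, 1)  # candidate keys of the current length
    for n in range(1, max_len + 1):
        longer = ngram_counts(text, n + 1) if n < max_len else Counter()
        next_frontier = Counter()
        for s, count in frontier.items():
            if count <= threshold or n == max_len:
                keys[s] = count  # rare enough (or cannot grow): keep as a key
                continue
            # Too frequent: hand the occurrences over to the longer strings.
            extended = 0
            for t, c in longer.items():
                if t.startswith(s):
                    next_frontier[t] = c
                    extended += c
            if count > extended:
                # An occurrence at the very end of the text has no extension.
                keys[s] = count - extended
        frontier = next_frontier
    return keys


if __name__ == "__main__":
    sample = "the theory of the thing then there they thought"
    selected = select_keys(sample, threshold=3)
    for key, freq in sorted(selected.items(), key=lambda kv: -kv[1]):
        print(repr(key), freq)
```

In this sketch every character position in the text is covered by exactly one key, so the key counts sum to the length of the text. Frequent short strings are pushed toward longer, rarer ones, which flattens the distribution of key frequencies compared with counting single characters or words.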