Parallel free-text search on the connection machine system
Communications of the ACM - Special issue on parallelism
Description and performance analysis of signature file methods for office filing
ACM Transactions on Information Systems (TOIS)
Multikey access methods based on superimposed coding techniques
ACM Transactions on Database Systems (TODS)
Partitioned signature files: design issues and performance evaluation
ACM Transactions on Information Systems (TOIS)
Signature-based text retrieval methods: a survey
Data Engineering
A signature access method for the Starburst database system
VLDB '89 Proceedings of the 15th international conference on Very large data bases
Optimal signature extraction and information loss
ACM Transactions on Database Systems (TODS)
S-tree: a dynamic balanced signature index for office retrieval
Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
Signature files: an access method for documents and its analytical performance evaluation
ACM Transactions on Information Systems (TOIS)
Partial-match retrieval using indexed descriptor files
Communications of the ACM
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
A Partitioned Signature File Structure for Multiattribute and Text Retrieval
Proceedings of the Sixth International Conference on Data Engineering
A Word-Parallel, Bit-Serial Signature Processor for Superimposed Coding
Proceedings of the Second International Conference on Data Engineering
Analysis of multiterm queries in a dynamic signature file organization
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient Signature File Methods for Text Retrieval
IEEE Transactions on Knowledge and Data Engineering
Hamming Filters: A Dynamic Signature File Organization for Parallel Stores
VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
Context-aware nearest neighbor query on social networks
SocInfo'11 Proceedings of the Third international conference on Social informatics
The optimal size of a signature
Mathematical and Computer Modelling: An International Journal
Previous work on superimposed coding has been characterized by two aspects. First, it is generally assumed that signatures are generated from logical text blocks of the same size; that is, each block contains the same number of unique terms after stopword and duplicate removal. We call this approach the fixed-size block (FSB) method, since each text block has the same size, as measured by the number of unique terms it contains. Second, with only a few exceptions [6,7,8,9,17], previous work has assumed that each term in the text contributes the same number of ones to the signature (i.e., the weight of the term signatures is fixed). The main objective of this paper is to derive an optimal weight assignment that assigns weights to document terms according to their occurrence and query frequencies in order to minimize the false-drop probability. The optimal scheme can account for both uniform and nonuniform occurrence and query frequencies, and signature generation is still based on hashing rather than on table lookup. Furthermore, a new way of generating signatures, the fixed-weight block (FWB) method, is introduced. FWB holds the weight of every signature to a constant, whereas in FSB only the expected signature weight is constant. We show that FWB has a lower false-drop probability than the FSB method, at the cost of slightly higher storage overhead. Other advantages of FWB are that the optimal weight assignment can be obtained analytically without making unrealistic assumptions and that the formula for computing the term signature weights is simple and efficient.
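
To make the terminology above concrete, the following is a minimal sketch of hashing-based superimposed coding with a per-term weight. It is an illustration under stated assumptions, not the paper's optimal weight assignment or its FWB construction: the function names, the use of SHA-256 as the hash, the signature length of 64 bits, and the placeholder weights are all hypothetical choices made for this example.

```python
import hashlib


def term_signature(term: str, m: int, weight: int) -> int:
    """Return an m-bit signature (as an int) with exactly `weight` ones.

    Bit positions are chosen by repeated hashing, so no lookup table is needed.
    SHA-256 is an arbitrary choice for this sketch.
    """
    if not 0 < weight <= m:
        raise ValueError("weight must be between 1 and m")
    positions = set()
    counter = 0
    while len(positions) < weight:
        digest = hashlib.sha256(f"{term}:{counter}".encode()).digest()
        positions.add(int.from_bytes(digest[:8], "big") % m)
        counter += 1
    sig = 0
    for p in positions:
        sig |= 1 << p
    return sig


def block_signature(terms, m: int, weight_of) -> int:
    """Superimpose (bitwise OR) the signatures of the unique terms in a block.

    `weight_of` maps a term to its number of ones; a constant function gives
    the fixed-weight case, a term-dependent one gives variable weights.
    """
    sig = 0
    for t in set(terms):
        sig |= term_signature(t, m, weight_of(t))
    return sig


def may_contain(block_sig: int, query_sig: int) -> bool:
    """True if every one in the query signature is also set in the block.

    A True result for a term that is not in the block is a false drop.
    """
    return block_sig & query_sig == query_sig


if __name__ == "__main__":
    M = 64  # signature length in bits (assumed for illustration)
    # Placeholder weights only; not derived from any frequency model.
    weights = {"signature": 3, "file": 5, "retrieval": 4}
    block = ["signature", "file", "retrieval", "file"]
    bsig = block_signature(block, M, lambda t: weights.get(t, 4))
    print(bin(bsig).count("1"), "ones in the block signature")
    print(may_contain(bsig, term_signature("file", M, weights["file"])))
```

A term that is present in a block always passes `may_contain`, so the scheme has no false negatives; the cost is that an absent term can pass as well, and how the ones are distributed across terms is the quantity the weight assignment in the abstract is meant to optimize.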