A signature file organization for supporting document ranking, called the weight-partitioned signature file, is proposed. It employs multiple signature files, one per term frequency, to represent terms with different term frequencies. Words that occur with the same frequency in a document are grouped together and hashed into the signature file corresponding to that frequency, which eliminates the need to record the term frequency of each word explicitly. We investigate how false drops affect retrieval effectiveness when they are not eliminated during the search process, and show that they cause insignificant degradation in precision and recall when the false-drop probability is below a certain threshold. This result is important because false-drop elimination could become the bottleneck in systems that use fast signature file search techniques. We also analyze the performance of the weight-partitioned signature file under different search strategies and configurations, and derive a formula that, for a fixed total storage overhead, gives the optimal storage to allocate to each partition so as to minimize the effect of false drops on document ranks. Experiments on a document collection support the analytical results.
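The grouping-and-hashing scheme described above can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the paper's implementation. The following are all hypothetical choices: the signature width `F`, the bits per word `M`, the SHA-256-based bit selection, the rule that a word matching several partitions is assigned the largest matching term frequency, and the score (a plain sum of estimated term frequencies). Every partition uses the same width here, whereas the paper allocates storage to each partition separately. Consistent with the abstract, false drops are not eliminated, so an estimated frequency can exceed the true one. The `false_drop_probability` helper uses the standard superimposed-coding approximation (1 - e^(-mD/F))^m for a single-word query against a signature of D words.

```python
import hashlib
import math
from collections import Counter, defaultdict

F = 64  # hypothetical signature width in bits
M = 3   # hypothetical number of bits set per word


def word_signature(word, f=F, m=M):
    """Superimposed-coding word signature: up to m bits chosen from a SHA-256 digest."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    sig = 0
    for i in range(m):
        pos = int.from_bytes(digest[4 * i:4 * i + 4], "big") % f
        sig |= 1 << pos  # positions may collide, so fewer than m bits can be set
    return sig


class WeightPartitionedSignatureFile:
    """One signature file per term frequency; no per-word tf is stored."""

    def __init__(self, f=F, m=M):
        self.f = f
        self.m = m
        # partitions[tf][doc_id] = superimposed signature of the doc's words with that tf
        self.partitions = defaultdict(dict)

    def add_document(self, doc_id, words):
        groups = defaultdict(int)
        for word, tf in Counter(words).items():
            groups[tf] |= word_signature(word, self.f, self.m)
        for tf, sig in groups.items():
            self.partitions[tf][doc_id] = sig

    def estimated_tf(self, doc_id, word):
        """Largest tf whose partition signature for doc_id covers the word's bits, else 0.

        False drops are not eliminated, so the estimate can exceed the true tf.
        """
        qsig = word_signature(word, self.f, self.m)
        best = 0
        for tf, docs in self.partitions.items():
            sig = docs.get(doc_id)
            if sig is not None and (sig & qsig) == qsig:
                best = max(best, tf)
        return best

    def rank(self, query_words):
        """Hypothetical score: sum of estimated tfs; highest score first."""
        doc_ids = {d for docs in self.partitions.values() for d in docs}
        scores = {}
        for d in doc_ids:
            s = sum(self.estimated_tf(d, w) for w in query_words)
            if s > 0:
                scores[d] = s
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def false_drop_probability(f, m, d):
    """Approximate false-drop probability of a one-word query vs. a d-word signature."""
    return (1.0 - math.exp(-m * d / f)) ** m


wpsf = WeightPartitionedSignatureFile()
wpsf.add_document("d1", "signature file signature ranking".split())
wpsf.add_document("d2", "inverted file ranking ranking ranking".split())
print(wpsf.rank(["ranking"]))
print(false_drop_probability(F, M, 4))
```

For the query `["ranking"]`, `d2` ranks first with an estimated frequency of 3, because "ranking" is hashed only into its frequency-3 partition. The score for `d1` is 1, or 2 if a false drop matches its frequency-2 partition. Because each word is checked against every partition, the per-partition storage allocation the paper optimizes would, in this sketch, correspond to choosing a different `f` for each frequency.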