Signature files: design and performance comparison of some signature extraction methods
SIGMOD '85 Proceedings of the 1985 ACM SIGMOD international conference on Management of data
In this paper we study a variation of the signature file access method for text and attribute retrieval. According to this method, the documents (or records) are stored sequentially in the "text file". Abstractions ("signatures") of the documents (or records) are stored in the "signature file". The latter serves as a filter on retrieval: it helps discard a large number of nonqualifying documents. We propose a signature extraction method that takes into account the query and occurrence frequencies, thus achieving better performance. The model we present is general enough that the results can be applied not only to text retrieval but also to files with formatted data.
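The filtering role of the signature file can be illustrated with a minimal sketch. It assumes plain superimposed coding, in which each word sets a few hashed bit positions and a document signature is the OR of its word signatures. It does not reproduce the frequency-aware extraction method proposed in the paper. The names `SIG_BITS`, `BITS_PER_WORD`, `word_signature`, `document_signature`, and `search`, as well as the sample documents, are hypothetical and chosen only for illustration.

```python
import hashlib

SIG_BITS = 64       # signature width in bits (illustrative assumption)
BITS_PER_WORD = 3   # hash positions set per word (illustrative assumption)


def word_signature(word: str) -> int:
    """Map a word to a bit pattern with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        position = int.from_bytes(digest[:4], "big") % SIG_BITS
        sig |= 1 << position
    return sig


def document_signature(text: str) -> int:
    """Superimpose (OR together) the signatures of all words in a document."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def search(query_word: str, text_file: list[str], signature_file: list[int]) -> list[str]:
    """Scan the signature file first; read only the documents that pass the filter."""
    query_sig = word_signature(query_word)
    hits = []
    for doc, sig in zip(text_file, signature_file):
        if sig & query_sig != query_sig:
            continue  # some query bit is missing: the document cannot qualify
        # The signature matched, but it may be a false drop, so check the text.
        if query_word.lower() in doc.lower().split():
            hits.append(doc)
    return hits


# The "text file" holds the documents; the "signature file" holds their abstractions.
text_file = [
    "signature files filter documents",
    "red black trees support updates",
]
signature_file = [document_signature(doc) for doc in text_file]
```

In this sketch, calling `search("filter", text_file, signature_file)` returns only the first document. A document whose signature lacks any bit of the query signature is discarded without being read, while a signature match still has to be confirmed against the text because superimposed bits can collide.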