This work introduces a new approach to record clustering: a hybrid algorithm that clusters records based on threshold values and on the query patterns issued against a particular database. The Hamming Distance of a file is used as a measure of its space density. The objective of the algorithm is to minimize the Hamming Distance of the file while giving greater weight to the most frequently asked queries. Simulation experiments show that restructuring a file substantially reduces response time. We study the space density properties of a file and how they affect retrieval time before and after clustering, as a means of predicting file performance and choosing appropriate parameters. We also study the parameters that affect clustering and response time, such as block size, threshold value, and the percentage of records satisfying a given set of queries.
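To make the idea of reducing a file's Hamming Distance concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that records are equal-length binary attribute vectors and that the Hamming Distance of a file is the sum of the Hamming distances between adjacent records; the abstract does not give the exact definition. The names `file_hamming_distance`, `threshold_cluster`, and `restructure` are hypothetical, and the query-frequency weighting described above is not modeled here.

```python
from typing import List, Sequence

Record = Sequence[int]  # assumed representation: binary attribute vector


def hamming(a: Record, b: Record) -> int:
    """Number of positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(x != y for x, y in zip(a, b))


def file_hamming_distance(records: List[Record]) -> int:
    """ASSUMED definition: sum of distances between adjacent records."""
    return sum(hamming(r, s) for r, s in zip(records, records[1:]))


def threshold_cluster(records: List[Record], threshold: int) -> List[List[Record]]:
    """Greedy single-pass clustering (illustrative only).

    A record joins the first cluster whose seed (first member) is within
    `threshold` bits of it; otherwise it starts a new cluster.
    """
    clusters: List[List[Record]] = []
    for rec in records:
        for cluster in clusters:
            if hamming(cluster[0], rec) <= threshold:
                cluster.append(rec)
                break
        else:
            # No existing cluster is close enough, so open a new one.
            clusters.append([rec])
    return clusters


def restructure(records: List[Record], threshold: int) -> List[Record]:
    """Rewrite the file so that members of each cluster are stored contiguously."""
    return [rec for cluster in threshold_cluster(records, threshold) for rec in cluster]


if __name__ == "__main__":
    original = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 0, 1), (0, 0, 1, 0)]
    clustered = restructure(original, threshold=1)
    print(file_hamming_distance(original), file_hamming_distance(clustered))
```

Under these assumptions, the four-record file in the usage example has a Hamming Distance of 11 before restructuring and 5 afterwards, because similar records end up next to each other. A larger threshold produces fewer, looser clusters, which is one reason the threshold value appears among the parameters studied.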