A New Approach to Clustering Records in Information Retrieval Systems

Authors:
I. A. R. Moghrabi;R. A. Makholian
Affiliations:
Natural Science Division, Lebanese American University, P.O. Box 13-5053, Beirut, Lebanon. imoghrbi@lau.edu.lb;Natural Science Division, Lebanese American University, P.O. Box 13-5053, Beirut, Lebanon
Venue:
Information Retrieval
Year:
2000

Abstract

This work introduces a new approach to record clustering: a hybrid algorithm that clusters records according to threshold values and the query patterns observed on a particular database. The Hamming distance of a file is used as a measure of space density. The objective of the algorithm is to minimize the Hamming distance of the file while giving greater weight to the most frequently asked queries. Simulation experiments show that restructuring a file yields a substantial reduction in response time. We study the space density properties of a file and how they affect retrieval time before and after clustering, as a means of predicting file performance and choosing appropriate parameters. Parameters that affect clustering and response time, such as block size, threshold value, and the percentage of records satisfying a given set of queries, are also studied.
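
The abstract does not define the Hamming distance of a file, so the following is only a minimal sketch under stated assumptions: records are taken to be equal-length binary attribute vectors, and the file's Hamming distance is taken to be the sum of Hamming distances between physically adjacent records. The names `hamming`, `file_hamming_distance`, and `greedy_reorder` are hypothetical, and the greedy nearest-neighbour reordering is a stand-in for illustration, not the hybrid algorithm of the paper (it ignores thresholds and query frequencies).

```python
from typing import List, Sequence

# Assumption: a record is an equal-length binary attribute vector.
Record = Sequence[int]


def hamming(a: Record, b: Record) -> int:
    """Number of positions at which two records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(x != y for x, y in zip(a, b))


def file_hamming_distance(records: List[Record]) -> int:
    """Assumed density measure: sum of distances between adjacent records."""
    return sum(hamming(r, s) for r, s in zip(records, records[1:]))


def greedy_reorder(records: List[Record]) -> List[Record]:
    """Illustrative stand-in: place each record next to its nearest
    remaining neighbour, starting from the first record in the file."""
    if not records:
        return []
    remaining = list(records)
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        nearest = min(range(len(remaining)),
                      key=lambda i: hamming(last, remaining[i]))
        ordered.append(remaining.pop(nearest))
    return ordered


records = [(1, 0, 1, 0), (0, 1, 0, 1), (1, 0, 1, 1), (0, 1, 0, 0)]
print(file_hamming_distance(records))                  # 11
print(file_hamming_distance(greedy_reorder(records)))  # 5
```

In this toy file, reordering lowers the assumed density measure from 11 to 5. Similar records then sit closer together, which is the intuition behind expecting fewer block accesses per query after restructuring.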