A New Approach to Clustering Records in Information Retrieval Systems

  • Authors:
  • I. A. R. Moghrabi;R. A. Makholian

  • Affiliations:
  • Natural Science Division, Lebanese American University, P.O. Box 13-5053, Beirut, Lebanon. imoghrbi@lau.edu.lb;Natural Science Division, Lebanese American University, P.O. Box 13-5053, Beirut, Lebanon

  • Venue:
  • Information Retrieval
  • Year:
  • 2000

Abstract

This work introduces a new approach to record clustering: a hybrid algorithm that clusters records based on threshold values and on the query patterns made to a particular database. The Hamming Distance of a file is used as a measure of space density. The objective of the algorithm is to minimize the Hamming Distance of the file while attaching significance to the most frequently asked queries. Simulation experiments showed a great reduction in response time after a file is restructured. We study the space density properties of a file and how they affect retrieval time before and after clustering, as a means of predicting file performance and making appropriate choices of parameters. Criteria that affect clustering and response time, such as block size, threshold value, and the percentage of records satisfying a given set of queries, are also studied.
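
The abstract does not define the Hamming Distance of a file, so the following is only an illustrative sketch under stated assumptions: each record is taken to be a fixed-length bit vector, and the file-level measure is assumed to be the sum of Hamming distances between physically adjacent records. The function names and the sample data are hypothetical and are not taken from the paper. The sketch shows how reordering the same records so that similar ones sit next to each other lowers this measure.

```python
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def file_hamming_distance(records: Sequence[Sequence[int]]) -> int:
    """Assumed file-level measure (not the paper's stated definition):
    the sum of Hamming distances between consecutive records."""
    return sum(hamming(r, s) for r, s in zip(records, records[1:]))


# Hypothetical four-attribute records, stored in two different orders.
unclustered = [(1, 0, 1, 0), (0, 1, 0, 1), (1, 0, 1, 1), (0, 1, 0, 0)]
clustered = [(1, 0, 1, 0), (1, 0, 1, 1), (0, 1, 0, 1), (0, 1, 0, 0)]

print(file_hamming_distance(unclustered))  # 11
print(file_hamming_distance(clustered))    # 5
```

Under this assumed measure, a lower value means that neighbouring records share more attribute values, which is the sense in which a clustered file is denser. The query-frequency weighting and the threshold values mentioned in the abstract are not modelled here.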