On the performance of object clustering techniques

  • Authors:
  • Manolis M. Tsangaris;Jeffrey F. Naughton

  • Affiliations:
  • Department of Computer Sciences, University of Wisconsin Madison;Department of Computer Sciences, University of Wisconsin Madison

  • Venue:
  • SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
  • Year:
  • 1992

Abstract

We investigate the performance of some of the best-known object clustering algorithms on four different workloads based upon the Tektronix benchmark. For all four workloads, stochastic clustering gave the best performance for a variety of performance metrics. Since stochastic clustering is computationally expensive, it is interesting that for every workload there was at least one cheaper clustering algorithm that matched or almost matched stochastic clustering. Unfortunately, the algorithm that approximated stochastic clustering was different for each workload. Our experiments also demonstrated that even when the workload and object graph are fixed, the choice of clustering algorithm depends upon the goals of the system. For example, if the goal is to perform well on traversals of small portions of the database starting with a cold cache, the important metric is the per-traversal expansion factor, and a well-chosen placement tree will be nearly optimal; if the goal is to achieve high steady-state performance with a reasonably large cache, the appropriate metric is the number of pages to which the clustering algorithm maps the active portion of the database. For this metric, the PRP clustering algorithm, which uses only access probabilities, achieves nearly optimal performance.
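
The two metrics contrasted in the abstract can be illustrated with a small sketch. The abstract does not define them formally, so the definitions below are assumptions: the expansion factor is taken to be the number of pages one traversal touches divided by the fewest pages that could hold the objects it visits, and the steady-state metric is taken to be the number of distinct pages holding any object that appears in the workload. The function names, the fixed `objects_per_page` capacity, and the sample placements are hypothetical and are not taken from the paper.

```python
from math import ceil


def pages_touched(placement, objects):
    """Count the distinct pages that hold the given objects."""
    return len({placement[obj] for obj in objects})


def expansion_factor(placement, traversal, objects_per_page):
    """Assumed definition: pages touched by one traversal divided by the
    minimum number of pages that could hold the objects it visits."""
    distinct = set(traversal)
    ideal = ceil(len(distinct) / objects_per_page)
    return pages_touched(placement, distinct) / ideal


def active_pages(placement, traversals):
    """Assumed definition: pages holding any object used by the workload."""
    active = set().union(*traversals)
    return pages_touched(placement, active)


# Hypothetical placements mapping object id -> page number.
scattered = {"a": 0, "b": 1, "c": 2, "d": 3}
clustered = {"a": 0, "b": 0, "c": 1, "d": 1}
traversals = [["a", "b"], ["a", "b", "c"]]

for name, placement in (("scattered", scattered), ("clustered", clustered)):
    ef = expansion_factor(placement, traversals[0], objects_per_page=2)
    print(name, "expansion factor:", ef,
          "active pages:", active_pages(placement, traversals))
```

Under these assumptions, a placement can score well on one metric without being the best on the other, which is the distinction the abstract draws between cold-cache traversals and steady-state behavior.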