Online Hierarchical Clustering in a Data Warehouse Environment

  • Authors:
  • Elke Achtert;Christian Bohm;Hans-Peter Kriegel;Peer Kroger

  • Affiliations:
  • University of Munich;University of Munich;University of Munich;University of Munich

  • Venue:
  • ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many important industrial applications rely on data mining methods to uncover patterns and trends in large data warehouse environments. Since a data warehouse is typically updated periodically in a batch mode, the mined patterns have to be updated as well. This requires not only accuracy from data mining methods but also fast availability of up-to-date knowledge, particularly in the presence of a heavy update load. To cope with this problem, we propose the use of online data mining algorithms which permanently store the discovered knowledge in suitable data structures and enable an efficient adaptation of these structures after insertions and deletions on the raw data. In this paper, we demonstrate how hierarchical clustering methods can be reformulated as online algorithms based on the hierarchical clustering method OPTICS, using a density estimator for data grouping. We also discuss how this algorithmic schema can be specialized for efficient online single-link clustering. A broad experimental evaluation demonstrates that the efficiency is superior with significant speed-up factors even for large bulk insertions and deletions.