SOStream: Self Organizing Density-Based Clustering over Data Stream

  • Authors:
  • Charlie Isaksson;Margaret H. Dunham;Michael Hahsler

  • Affiliations:
  • Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas;Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas;Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas

  • Venue:
  • MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2012

Abstract

In this paper we propose a data stream clustering algorithm called Self Organizing density-based clustering over data Stream (SOStream). The algorithm has several novel features. Instead of using a fixed, user-defined similarity threshold or a static grid, SOStream detects structure within fast-evolving data streams by automatically adapting the threshold for density-based clustering. It also employs a novel cluster-updating strategy inspired by the competitive learning techniques developed for Self Organizing Maps (SOMs). In addition, SOStream has built-in online functionality to support advanced stream clustering operations, including merging and fading. This makes SOStream completely online, with no separate offline component. Experiments on the KDD Cup'99 dataset and on artificial datasets indicate that SOStream produces clusters of higher purity while requiring less space and time than previous stream clustering algorithms.
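
The abstract names four mechanisms: an adaptive threshold, a SOM-inspired update, merging, and fading. The sketch below shows one way these could fit together in a single online pass. It is an illustration under stated assumptions, not the authors' implementation: the class StreamSketch, the parameters alpha, min_pts, decay, fade_threshold and merge_threshold, and the specific update and decay formulas are all hypothetical choices made for this example, since the abstract does not give them.

```python
import math
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality, so list removal is unambiguous
class MicroCluster:
    centroid: list
    weight: float = 1.0
    last_seen: int = 0


class StreamSketch:
    """Illustrative single-pass clusterer; NOT the published SOStream code."""

    def __init__(self, alpha=0.1, min_pts=1, decay=0.01,
                 fade_threshold=0.1, merge_threshold=0.5):
        # All parameter names and defaults are assumptions; min_pts must be >= 1.
        self.alpha, self.min_pts, self.decay = alpha, min_pts, decay
        self.fade_threshold = fade_threshold
        self.merge_threshold = merge_threshold
        self.clusters, self.t = [], 0

    def _faded(self, c):
        # Exponentially decayed weight of cluster c at the current time step.
        return c.weight * 2 ** (-self.decay * (self.t - c.last_seen))

    def update(self, point):
        self.t += 1
        point = list(point)
        # Fading: drop clusters whose decayed weight fell below the threshold.
        self.clusters = [c for c in self.clusters
                         if self._faded(c) >= self.fade_threshold]
        if len(self.clusters) <= self.min_pts:
            # Too few clusters to estimate a neighbourhood: start a new one.
            self.clusters.append(MicroCluster(point, last_seen=self.t))
            return
        winner = min(self.clusters, key=lambda c: math.dist(c.centroid, point))
        # Adaptive threshold: distance from the winner to its min_pts-th
        # nearest neighbouring cluster, recomputed for every arriving point.
        gaps = sorted(math.dist(winner.centroid, c.centroid)
                      for c in self.clusters if c is not winner)
        radius = gaps[self.min_pts - 1]
        if math.dist(winner.centroid, point) > radius:
            self.clusters.append(MicroCluster(point, last_seen=self.t))
            return
        # Absorb the point into the winner (weighted running mean).
        w = self._faded(winner)
        winner.centroid = [(ci * w + pi) / (w + 1)
                           for ci, pi in zip(winner.centroid, point)]
        winner.weight, winner.last_seen = w + 1, self.t
        # SOM-style step: pull neighbours inside the radius toward the winner,
        # then merge any that end up closer than merge_threshold.
        for c in list(self.clusters):
            if c is winner or math.dist(c.centroid, winner.centroid) > radius:
                continue
            c.centroid = [ci + self.alpha * (wi - ci)
                          for ci, wi in zip(c.centroid, winner.centroid)]
            if math.dist(c.centroid, winner.centroid) < self.merge_threshold:
                wc = self._faded(c)
                total = winner.weight + wc
                winner.centroid = [(wi * winner.weight + ci * wc) / total
                                   for wi, ci in zip(winner.centroid, c.centroid)]
                winner.weight = total
                self.clusters.remove(c)


model = StreamSketch(min_pts=1, decay=0.0)
for p in [(0, 0), (10, 0), (1, 0), (30, 0)]:
    model.update(p)
print(len(model.clusters), [c.centroid for c in model.clusters])
```

Everything happens inside update, so the sketch has no offline phase, which mirrors the "completely online" property the abstract claims. Note that in this simplified version the neighbourhood radius can be large while only a few clusters exist, so early points are absorbed generously; a real implementation would likely treat that start-up phase with more care.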