Online clustering of parallel data streams

  • Authors:
  • Jürgen Beringer; Eyke Hüllermeier

  • Affiliations:
  • Fakultät für Informatik, Otto-von-Guericke-Universität, Magdeburg, Germany; Fakultät für Informatik, Otto-von-Guericke-Universität, Magdeburg, Germany

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2006

Abstract

In recent years, the management and processing of so-called data streams has become an active research topic in several fields of computer science, including distributed systems, database systems, and data mining. A data stream can roughly be thought of as a transient, continuously growing sequence of time-stamped data. In this paper, we consider the problem of clustering parallel streams of real-valued data, that is, continuously evolving time series. In other words, we are interested in grouping data streams whose evolution over time is similar in a specific sense. To maintain an up-to-date clustering structure, the incoming data must be analyzed online, tolerating no more than a constant time delay. For this purpose, we develop an efficient online version of the classical K-means clustering algorithm. The method's efficiency is mainly due to a scalable online transformation of the original data, which allows approximate distances between streams to be computed quickly.
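
The idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say which transformation is used, so the sketch assumes one common choice: each stream keeps a sliding window, the window is summarized by its first few discrete Fourier coefficients, and K-means runs on those short coefficient vectors instead of the raw windows. All names here (`StreamClusterer`, `dft_prefix`, `approx_distance`, `kmeans_step`) are hypothetical, and the coefficients are recomputed from scratch for clarity, whereas an efficient version would presumably update them incrementally.

```python
import cmath
import math
from collections import deque


def dft_prefix(window, n_coeffs):
    """First n_coeffs coefficients of the orthonormal DFT of a real window."""
    w = len(window)
    scale = 1.0 / math.sqrt(w)
    return [
        scale * sum(x * cmath.exp(-2j * cmath.pi * f * t / w)
                    for t, x in enumerate(window))
        for f in range(n_coeffs)
    ]


def approx_distance(a, b):
    """Euclidean distance between truncated coefficient vectors.

    With the orthonormal DFT this never exceeds the Euclidean distance
    between the underlying windows (Parseval), so it is an approximation
    from below.
    """
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def kmeans_step(points, centres):
    """One assignment and update step of standard K-means."""
    labels = [
        min(range(len(centres)), key=lambda k: approx_distance(p, centres[k]))
        for p in points
    ]
    new_centres = []
    for k, old in enumerate(centres):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centres.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centres.append(old)  # keep an empty cluster's centre unchanged
    return labels, new_centres


class StreamClusterer:
    """Hypothetical sliding-window clusterer for parallel streams (a sketch)."""

    def __init__(self, n_streams, window, n_coeffs, k):
        self.windows = [deque(maxlen=window) for _ in range(n_streams)]
        self.n_coeffs = n_coeffs
        self.k = k
        self.centres = None  # reused between calls as a warm start

    def push(self, values):
        """Append one new value per stream; the oldest value drops out."""
        for win, v in zip(self.windows, values):
            win.append(v)

    def recluster(self, iterations=5):
        """Re-run a few K-means steps on the current window summaries."""
        if any(len(win) < win.maxlen for win in self.windows):
            raise ValueError("windows are not full yet")
        points = [dft_prefix(list(win), self.n_coeffs) for win in self.windows]
        if self.centres is None:
            # Naive initialization (an assumption): the first k streams.
            self.centres = [list(p) for p in points[: self.k]]
        labels = []
        for _ in range(iterations):
            labels, self.centres = kmeans_step(points, self.centres)
        return labels
```

Because the centres from the previous call are kept, each call to `recluster` starts from the last clustering and typically needs only a few steps, which is one plausible way to keep the per-update delay bounded.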