Clustering data stream: A survey of algorithms

  • Authors: Alireza Rezaei Mahdiraji

  • Affiliations: Multimedia University, Cyberjaya, Malaysia. E-mail: alireza.rezaei.mah07@mmu.edu.my

  • Venue: International Journal of Knowledge-based and Intelligent Engineering Systems
  • Year: 2009

Abstract

A data stream is a massive, continuous, and rapid sequence of data elements. The data stream model requires algorithms to make a single pass over the data with bounded memory and limited processing time, even though the stream may be highly dynamic and evolve over time. Mining data streams is the real-time process of extracting interesting patterns from high-speed data streams. It raises new problems for the data mining community, chiefly how to mine continuous, high-speed data items that can be examined only once. Clustering, a useful tool in data mining, is the process of finding groups of similar data elements, where similarity is defined by a given similarity measure. Over the past few years, a number of clustering algorithms for data streams have been put forth. In this paper, we survey five algorithms for clustering data streams: divide and conquer, doubling, statistical grid-based, STREAM, and CluStream. We compare these algorithms on several characteristics.
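
To make the single-pass, bounded-memory constraint concrete, the sketch below clusters a stream with sequential (online) k-means. This is an illustration only: it is not one of the five surveyed algorithms, and the function name `sequential_kmeans`, its parameters, and the choice of squared Euclidean distance are assumptions made for this example. It keeps only k centroids and k counters in memory, and each point is read once and then discarded.

```python
from typing import Iterable, List, Sequence


def sequential_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """Hypothetical helper: one pass over `stream`, storing only k centroids."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    for point in stream:
        if len(centroids) < k:
            # Seed the centroids with the first k points seen.
            centroids.append([float(x) for x in point])
            counts.append(1)
            continue
        # Find the nearest centroid by squared Euclidean distance (assumed measure).
        nearest = min(
            range(k),
            key=lambda i: sum((c - x) ** 2 for c, x in zip(centroids[i], point)),
        )
        counts[nearest] += 1
        rate = 1.0 / counts[nearest]
        # Running-mean update: move the centroid toward the new point.
        centroids[nearest] = [
            c + rate * (x - c) for c, x in zip(centroids[nearest], point)
        ]
    return centroids


if __name__ == "__main__":
    # A generator stands in for a stream: each point can be looked at only once.
    points = (p for p in [(0, 0), (10, 10), (0, 2), (10, 12)])
    print(sequential_kmeans(points, k=2))
```

Memory use is independent of the stream length, which is the property the data stream model demands. The result depends on arrival order and on the first k points, a sensitivity that more elaborate stream clustering methods try to reduce.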