Clustering data stream: A survey of algorithms

Authors:
Alireza Rezaei Mahdiraji
Affiliations:
Multimedia University, Cyberjaya, Malaysia. E-mail: alireza.rezaei.mah07@mmu.edu.my
Venue:
International Journal of Knowledge-based and Intelligent Engineering Systems
Year:
2009

Abstract

A data stream is a massive, continuous, and rapid sequence of data elements. The data stream model requires algorithms to make a single pass over the data with bounded memory and limited processing time, even though the stream may be highly dynamic and evolve over time. Mining data streams is the real-time process of extracting interesting patterns from high-speed data streams. It raises new problems for the data mining community: how to mine continuous, high-speed data items that can be examined only once. Clustering, a useful tool in data mining, is the process of finding groups of similar data elements, where similarity is defined by a given measure. Over the past few years, a number of clustering algorithms for data streams have been proposed. In this paper, we survey five algorithms for clustering data streams: divide and conquer, doubling, statistical grid-based, STREAM, and CluStream. We compare these algorithms on several characteristics.
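
To make the single-pass, bounded-memory constraint concrete, here is a minimal sketch of sequential (MacQueen-style) k-means in Python. It is not one of the five surveyed algorithms and is not taken from the paper; it only illustrates the stream model the abstract describes. The names `one_pass_kmeans` and `sq_dist` are hypothetical, and seeding the centroids with the first k points is an assumption made for simplicity.

```python
from typing import Iterable, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_pass_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """Sequential k-means: each point is read once and then discarded.

    Only k centroids and k counters are stored, so memory is O(k * d)
    regardless of how long the stream is.
    """
    centroids: List[List[float]] = []
    counts: List[int] = []
    for point in stream:
        if len(centroids) < k:
            # Assumption: seed the centroids with the first k points.
            centroids.append(list(point))
            counts.append(1)
            continue
        # Assign the point to its nearest centroid.
        j = min(range(k), key=lambda i: sq_dist(centroids[i], point))
        counts[j] += 1
        # Move that centroid toward the point (running-mean update).
        rate = 1.0 / counts[j]
        centroids[j] = [c + rate * (x - c) for c, x in zip(centroids[j], point)]
    return centroids


# Usage: the stream can be any iterator; it is consumed exactly once.
points = iter([(0.0,), (10.0,), (1.0,), (11.0,), (2.0,), (12.0,)])
result = one_pass_kmeans(points, k=2)
```

Because the centroids are running means, this sketch weights old and new points equally and does not adapt to an evolving stream. The abstract names that dynamic, evolving behavior as part of the difficulty of the stream setting.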