Data stream clustering: A survey

Authors:
Jonathan A. Silva;Elaine R. Faria;Rodrigo C. Barros;Eduardo R. Hruschka;André C. P. L. F. de Carvalho;João Gama
Affiliations:
University of São Paulo, São Paulo, Brazil;University of São Paulo and Federal University of Uberlândia, Brazil;University of São Paulo, São Paulo, Brazil;University of São Paulo, São Paulo, Brazil;University of São Paulo, Brazil;University of Porto, Portugal
Venue:
ACM Computing Surveys (CSUR)
Year:
2013

Abstract

Data stream mining is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. In this context, several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering poses several challenges, such as dealing with nonstationary, unbounded data that arrive in an online fashion. The intrinsic nature of stream data requires the development of algorithms capable of performing fast and incremental processing of data objects while respecting time and memory limitations. In this article, we present a survey of data stream clustering algorithms, providing a thorough discussion of the main design components of state-of-the-art algorithms. In addition, this work addresses the temporal aspects involved in data stream clustering and presents an overview of the experimental methodologies usually employed. A number of references are provided that describe applications of data stream clustering in different domains, such as network intrusion detection, sensor networks, and stock market analysis. Information regarding software packages and data repositories is also provided to help researchers and practitioners. Finally, some important issues and open questions that can be the subject of future research are discussed.