Density-based clustering of data streams at multiple resolutions

Authors:
Li Wan;Wee Keong Ng;Xuan Hong Dang;Philip S. Yu;Kuan Zhang
Affiliations:
Nanyang Technological Unviserity;Nanyang Technological Unviserity;Institute of Infocomm Research, Singapore;University of Illinios at Chicago;Singapore Management University
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2009

Citing 13
Cited 5

STING: A Statistical Information Grid Approach to Spatial Data Mining

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Maintaining variance and k-medians over data stream windows

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Clustering Data Streams: Theory and Practice

IEEE Transactions on Knowledge and Data Engineering
Better streaming algorithms for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Streaming-Data Algorithms for High-Quality Clustering

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Clustering on Demand for Multiple Data Streams

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
An Efficient Density Based Clustering Algorithm for Large Databases

ICTAI '04 Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence
Density-based clustering of uncertain data

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Suppressing model overfitting in mining concept-drifting data streams

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
A local-density based spatial clustering algorithm with noise

Information Systems
Density-based clustering for real-time stream data

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
A framework for clustering evolving data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Online and offline trend cluster discovery in spatially distributed data streams

MSM'10/MUSE'10 Proceedings of the 2010 international conference on Analysis of social media and ubiquitous data
SIC-means: a semi-fuzzy approach for clustering data streams using c-means

ANNPR'10 Proceedings of the 4th IAPR TC3 conference on Artificial Neural Networks in Pattern Recognition
SOStream: self organizing density-based clustering over data stream

MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
A weightless neural network-based approach for stream data clustering

IDEAL'12 Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning
A single pass algorithm for clustering evolving data streams based on swarm intelligence

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

In data stream clustering, it is desirable to have algorithms that are able to detect clusters of arbitrary shape, clusters that evolve over time, and clusters with noise. Existing stream data clustering algorithms are generally based on an online-offline approach: The online component captures synopsis information from the data stream (thus, overcoming real-time and memory constraints) and the offline component generates clusters using the stored synopsis. The online-offline approach affects the overall performance of stream data clustering in various ways: the ease of deriving synopsis from streaming data; the complexity of data structure for storing and managing synopsis; and the frequency at which the offline component is used to generate clusters. In this article, we propose an algorithm that (1) computes and updates synopsis information in constant time; (2) allows users to discover clusters at multiple resolutions; (3) determines the right time for users to generate clusters from the synopsis information; (4) generates clusters of higher purity than existing algorithms; and (5) determines the right threshold function for density-based clustering based on the fading model of stream data. To the best of our knowledge, no existing data stream algorithms has all of these features. Experimental results show that our algorithm is able to detect arbitrarily shaped, evolving clusters with high quality.