Stream data clustering based on grid density and attraction

Authors:
Li Tu;Yixin Chen
Affiliations:
Nanjing University of Aeronautics and Astronautics, Nanjing, China;Washington University in St. Louis, St. Louis, MO
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2009

Abstract

Clustering real-time stream data is an important and challenging problem. Existing algorithms such as CluStream are based on the k-means algorithm. These clustering algorithms have difficulty finding clusters of arbitrary shapes and handling outliers. Further, they require knowledge of k and a user-specified time window. To address these issues, this article proposes D-Stream, a framework for clustering stream data using a density-based approach. Our algorithm uses an online component that maps each input data record into a grid and an offline component that computes the grid density and clusters the grids based on the density. The algorithm adopts a density decaying technique to capture the dynamic changes of a data stream and an attraction-based mechanism to accurately generate cluster boundaries. Exploiting the intricate relationships among the decay factor, attraction, data density, and cluster structure, our algorithm can efficiently and effectively generate and adjust the clusters in real time. Further, a theoretically sound technique is developed to detect and remove sporadic grids mapped by outliers in order to dramatically improve the space and time efficiency of the system. The technique makes high-speed data stream clustering feasible without degrading the clustering quality. The experimental results show that our algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behaviors of real-time data streams.
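
The online/offline split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `GridDensitySketch`, the parameters `cell_width`, `decay`, and `dense_threshold`, and the exponential decay rule (each grid's density is multiplied by `decay` per time step, and each new record adds 1) are assumptions made for illustration. The attraction-based boundary refinement and the removal of sporadic grids are omitted; dense grids are simply merged when they share a face.

```python
class GridDensitySketch:
    """Illustrative grid-density stream clustering (assumed details, not the paper's code)."""

    def __init__(self, cell_width=1.0, decay=0.998, dense_threshold=3.0):
        self.cell_width = cell_width            # assumed: uniform grid width per dimension
        self.decay = decay                      # assumed: decay factor in (0, 1)
        self.dense_threshold = dense_threshold  # assumed: fixed density cutoff
        self.density = {}                       # grid -> (density, time of last update)

    def cell_of(self, point):
        # Online step: map a record to the grid that contains it.
        return tuple(int(x // self.cell_width) for x in point)

    def current_density(self, cell, now):
        # Decay the stored density from its last update time to `now`.
        stored, last = self.density.get(cell, (0.0, now))
        return stored * self.decay ** (now - last)

    def insert(self, point, now):
        # Online step: decay the grid's density, then add the new record's weight of 1.
        cell = self.cell_of(point)
        self.density[cell] = (self.current_density(cell, now) + 1.0, now)

    def clusters(self, now):
        # Offline step: group dense grids that are face-adjacent.
        dense = {c for c in self.density
                 if self.current_density(c, now) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, group = [start], set()
            while stack:
                cell = stack.pop()
                group.add(cell)
                for i in range(len(cell)):
                    for step in (-1, 1):
                        neighbor = cell[:i] + (cell[i] + step,) + cell[i + 1:]
                        if neighbor in dense and neighbor not in seen:
                            seen.add(neighbor)
                            stack.append(neighbor)
            result.append(group)
        return result
```

Because densities are decayed lazily at read time, a grid that stops receiving records drops below the threshold on its own, which is one simple way to let clusters follow an evolving stream without a user-specified time window.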