Adaptive Clustering for Multiple Evolving Streams

Authors:
Bi-Ru Dai;Jen-Wei Huang;Mi-Yen Yeh;Ming-Syan Chen
Affiliations:
-;-;-;IEEE
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2006


Abstract

In a data stream environment, the patterns generated at different times differ because the data evolve. As time progresses, the behavior and membership of clusters usually change. Hence, clustering continuous data streams allows us to observe changes in group behavior. To support flexible clustering requirements, we devise in this paper a Clustering on Demand (COD) framework to dynamically cluster multiple data streams. While providing a general framework for clustering multiple data streams, the COD framework has two advantageous features: a single data scan for online statistics collection, and compact multiresolution approximations. These are designed to address, respectively, the time and space constraints of a data stream environment. The COD framework consists of two phases: the online maintenance phase and the offline clustering phase. The online maintenance phase provides an efficient mechanism to maintain summary hierarchies of data streams at multiple resolutions, in time linear in both the number of streams and the number of data points in each stream. For the offline phase, an adaptive clustering algorithm is devised to retrieve approximations of the desired substreams from the summary hierarchies according to clustering queries. We propose two summarization techniques, based on wavelet and regression analyses, to construct the summary hierarchies. The regression-based summary hierarchy approximates the data stream more precisely and provides better clustering results, at the cost of slightly more time and twice the storage space of the wavelet-based one. An adaptive version of the COD framework is designed to select between a wavelet-based model and a regression-based model when building the summary hierarchy. With the adaptive COD, we obtain clustering results of almost the same quality as the regression-based COD while using much less storage space for the summary hierarchy. As shown in the complexity analyses and validated by our empirical studies, the COD framework performs very efficiently in the data stream environment while producing clustering results of very high quality.
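
To make the idea of a wavelet-based summary hierarchy concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Haar-style hierarchy in which each node stores the mean of two child nodes, and it keeps every node rather than a compact, bounded summary. The names `HaarSummaryHierarchy`, `append`, `approximation`, and `distance` are hypothetical and chosen for illustration only.

```python
import math
from typing import List


class HaarSummaryHierarchy:
    """Multiresolution averages of one stream, built in a single pass.

    Assumption: a simplified Haar-style hierarchy that keeps every node.
    """

    def __init__(self) -> None:
        # levels[k][i] is the mean of raw points [i * 2**k, (i + 1) * 2**k).
        self.levels: List[List[float]] = [[]]

    def append(self, value: float) -> None:
        """Consume one data point; amortized constant work per point."""
        self.levels[0].append(float(value))
        k = 0
        # When a level holds an even number of nodes, its last two nodes
        # form a complete pair and are averaged into the next level.
        while len(self.levels[k]) % 2 == 0:
            left, right = self.levels[k][-2], self.levels[k][-1]
            if k + 1 == len(self.levels):
                self.levels.append([])
            self.levels[k + 1].append((left + right) / 2.0)
            k += 1

    def approximation(self, level: int) -> List[float]:
        """Return the stream approximated at the given resolution level."""
        return list(self.levels[level])


def distance(a: HaarSummaryHierarchy, b: HaarSummaryHierarchy, level: int) -> float:
    """Euclidean distance between two streams' approximations at one level."""
    xs, ys = a.approximation(level), b.approximation(level)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))
```

In this sketch, the online phase corresponds to calling `append` once per arriving point on each stream, and an offline clustering query would pick a `level` and feed the pairwise `distance` values to any standard clustering routine. A regression-based variant would store a fitted slope and intercept per node instead of a single mean, which is consistent with the doubled storage cost mentioned in the abstract.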