In a data stream environment, the patterns observed at different times differ because the data evolve: as time progresses, both the behavior and the membership of clusters usually change. Clustering continuous data streams therefore lets us observe how group behavior changes. To support flexible clustering requirements, this paper devises a Clustering on Demand (COD) framework that dynamically clusters multiple data streams. Besides providing a general framework for clustering multiple data streams, COD has two advantageous features: a single data scan for online statistics collection, and compact multiresolution approximations. These address the time constraint and the space constraint of a data stream environment, respectively. The framework consists of two phases, an online maintenance phase and an offline clustering phase. The online maintenance phase maintains multiresolution summary hierarchies of the data streams in time linear in both the number of streams and the number of data points per stream. The offline phase uses an adaptive clustering algorithm that, according to each clustering query, retrieves approximations of the desired substreams from the summary hierarchies. We propose two summarization techniques for constructing the summary hierarchies, one based on wavelet analysis and one based on regression analysis. The regression-based summary hierarchy approximates the data stream more precisely and yields better clustering results. The cost is a slightly longer running time and twice the storage space of the wavelet-based one. An adaptive version of the COD framework selects between the wavelet-based model and the regression-based model when building the summary hierarchy.
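The online maintenance phase can be illustrated with a small sketch. This is not the authors' implementation. The names `SummaryHierarchy`, `merge_lines` and `LineSeg` are hypothetical. The sketch assumes that the wavelet summary at each dyadic level is the unnormalized Haar scaling coefficient (the block average), and that the regression summary is an ordinary least-squares line per block. It shows two things:

- how a single scan over a stream maintains either hierarchy with amortized constant work per point;
- why a regression node costs two numbers where a wavelet node costs one.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class LineSeg:
    """Least-squares line over points [start, start + length): x ~ a + b * (t - start)."""
    start: int
    length: int
    a: float
    b: float


def _s1(n: int) -> float:
    return n * (n - 1) / 2  # sum of u for u = 0..n-1


def _s2(n: int) -> float:
    return (n - 1) * n * (2 * n - 1) / 6  # sum of u**2 for u = 0..n-1


def merge_lines(left: LineSeg, right: LineSeg) -> LineSeg:
    """Least-squares line for two adjacent, equal-length segments.

    Least-squares residuals sum to zero and are orthogonal to time, so each
    child's sums of x and u*x can be recovered from its two coefficients alone.
    The merged line therefore equals a direct fit over the raw points.
    """
    n = left.length
    sy = suy = 0.0
    for seg, offset in ((left, 0), (right, n)):
        seg_sum = n * seg.a + seg.b * _s1(n)
        sy += seg_sum
        suy += seg.a * _s1(n) + seg.b * _s2(n) + offset * seg_sum
    m = 2 * n
    su, suu = _s1(m), _s2(m)
    b = (m * suy - su * sy) / (m * suu - su * su)
    return LineSeg(left.start, m, (sy - b * su) / m, b)


Node = Union[float, LineSeg]


class SummaryHierarchy:
    """Hypothetical one-pass multiresolution summary of a single stream.

    Level j holds one node per complete block of 2**j points.
    mode="wavelet": Haar scaling coefficient (block average), 1 number per node.
    mode="regression": fitted line, 2 numbers per node.
    """

    def __init__(self, mode: str = "wavelet") -> None:
        assert mode in ("wavelet", "regression")
        self.mode = mode
        self.levels: List[List[Node]] = []
        self.count = 0

    def _merge(self, left: Node, right: Node) -> Node:
        if self.mode == "wavelet":
            return (left + right) / 2
        return merge_lines(left, right)

    def append(self, x: float) -> None:
        """Absorb one point; each node is merged at most once, so one scan is linear."""
        if self.mode == "wavelet":
            node: Node = float(x)
        else:
            node = LineSeg(self.count, 1, float(x), 0.0)
        self.count += 1
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append([])
            self.levels[level].append(node)
            if len(self.levels[level]) % 2:  # no sibling yet: stop climbing
                break
            node = self._merge(self.levels[level][-2], self.levels[level][-1])
            level += 1

    def approximate(self, level: int) -> List[float]:
        """Reconstruct the prefix covered by complete level-`level` blocks."""
        width = 2 ** level
        out: List[float] = []
        for node in self.levels[level]:
            if self.mode == "wavelet":
                out.extend([node] * width)
            else:
                out.extend(node.a + node.b * u for u in range(width))
        return out


def sse(xs: List[float], ys: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(xs, ys))


stream = [0.5 * t + (1.0 if t % 3 == 0 else 0.0) for t in range(64)]
wav, reg = SummaryHierarchy("wavelet"), SummaryHierarchy("regression")
for x in stream:
    wav.append(x)
    reg.append(x)
print(sse(stream, wav.approximate(3)), sse(stream, reg.approximate(3)))
```

A least-squares line fits a block at least as well as the block's average. At any level, the regression hierarchy's reconstruction error therefore never exceeds the wavelet hierarchy's, at twice the storage per node. The sketch does not attempt the adaptive COD's choice between the two models, because the abstract does not specify that rule.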
With the adaptive COD, we obtain clustering results of nearly the same quality as the regression-based COD while using much less storage space for the summary hierarchy. Our complexity analyses show, and our empirical studies confirm, that the COD framework runs efficiently in the data stream environment while producing high-quality clustering results.
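The offline clustering phase can be sketched as follows. A clustering query is assumed here to consist of a time window, a resolution level, and a number of clusters `k`. The phase answers it by retrieving a coarse approximation of each stream's substream and clustering those approximations. The sketch is hypothetical throughout:

- `block_averages` stands in for a lookup in a wavelet-based summary hierarchy;
- plain k-means with farthest-first seeding replaces the paper's adaptive clustering algorithm, whose details are not given here.

```python
from typing import Dict, List, Sequence


def block_averages(stream: Sequence[float], start: int, end: int, level: int) -> List[float]:
    """Stand-in for retrieving the level-`level` approximation of stream[start:end].

    Assumes start and end are multiples of 2**level, matching dyadic summary blocks.
    """
    width = 2 ** level
    assert start % width == 0 and end % width == 0
    return [sum(stream[i:i + width]) / width for i in range(start, end, width)]


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    def dist2(p: List[float], q: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_query(streams: Dict[str, Sequence[float]], start: int, end: int,
                  level: int, k: int) -> Dict[str, int]:
    """Hypothetical query: cluster the substreams in [start, end) at a given resolution."""
    names = sorted(streams)
    points = [block_averages(streams[name], start, end, level) for name in names]
    return dict(zip(names, kmeans(points, k)))


streams = {
    "a": [float(t) for t in range(16)],
    "b": [t + 0.5 for t in range(16)],
    "c": [15.0 - t for t in range(16)],
    "d": [15.5 - t for t in range(16)],
}
print(cluster_query(streams, 0, 16, 2, 2))
```

Calling `cluster_query` again with a different window or level re-clusters the streams on demand. In a real system, such a call would read stored summaries rather than the raw data. Here, `block_averages` recomputes from the raw lists only so that the sketch stays self-contained.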