Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Density-based clustering for real-time stream data
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
A framework for clustering evolving data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
A framework for projected clustering of high dimensional data streams
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Efficiently tracing clusters over high-dimensional on-line data streams
Data & Knowledge Engineering
On High Dimensional Projected Clustering of Uncertain Data Streams
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
A Weighted Subspace Clustering Algorithm in High-Dimensional Data Streams
ICICIC '09 Proceedings of the 2009 Fourth International Conference on Innovative Computing, Information and Control
Anomaly intrusion detection by clustering transactional audit streams in a host computer
Information Sciences: an International Journal
A grid-based clustering algorithm for high-dimensional data streams
ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications
Aggregating and disaggregating flexibility objects
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
Hi-index | 0.00 |
Clustering high-dimensional data stream is a difficult and important issue. In this paper, we propose MStream, a new clustering algorithm based on matrix over high dimensional data stream. MStream algorithm incorporates a synopsis structure, called GC (Grid Cell Structure), and grid matrix technique. The algorithm adopts the two-phased framework. In the online component, the GC is employed to monitor one-dimensional statistics data distribution of each dimension independently. Sparse GCs which need to be deleted are checked by predefined threshold. In the offline component, it is possible to tracing multi-dimensional clusters by dense GCs which are maintained in the online component. Grid matrix technique is introduced to generate the final multi-dimensional clusters in the whole data space. Experimental results show that our algorithm has the flexible scalability and higher clustering quality.