CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Maintaining stream statistics over sliding windows: (extended abstract)
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Incremental Clustering for Mining in a Data Warehousing Environment
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Streaming-Data Algorithms for High-Quality Clustering
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Cell trees: An adaptive synopsis structure for clustering multi-dimensional on-line data streams
Data & Knowledge Engineering
Grid-based subspace clustering over data streams
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Efficiently tracing clusters over high-dimensional on-line data streams
Data & Knowledge Engineering
Clustering data stream: A survey of algorithms
International Journal of Knowledge-based and Intelligent Engineering Systems
Anomaly intrusion detection by clustering transactional audit streams in a host computer
Information Sciences: an International Journal
Approximate trace of grid-based clusters over high dimensional data streams
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
A neighborhood density estimation clustering algorithm based on minimum spanning tree
RSKT'10 Proceedings of the 5th international conference on Rough set and knowledge technology
Clustering distributed sensor data streams using local processing and reduced communication
Intelligent Data Analysis - Ubiquitous Knowledge Discovery
A clustering algorithm for multiple data streams based on spectral component similarity
Information Sciences: an International Journal
Anomaly intrusion detection based on clustering a data stream
ISC'06 Proceedings of the 9th international conference on Information Security
SIC-means: a semi-fuzzy approach for clustering data streams using c-means
ANNPR'10 Proceedings of the 4th IAPR TC3 conference on Artificial Neural Networks in Pattern Recognition
An incremental data stream clustering algorithm based on dense units detection
PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
A grid-based subspace clustering algorithm for high-dimensional data streams
WISE'06 Proceedings of the 7th international conference on Web Information Systems
On pre-processing algorithms for data stream
ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part II
Aggregating and disaggregating flexibility objects
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
Hi-index | 0.00 |
A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Due to this reason, most algorithms for data streams sacrifice the correctness of their results for fast processing time. The processing time is greatly influenced by the amount of information that should be maintained. This paper proposes a statistical grid-based approach to clustering data elements of a data stream. Initially, the multidimensional data space of a data stream is partitioned into a set of mutually exclusive equal-size initial cells. When the support of a cell becomes high enough, the cell is dynamically divided into two mutually exclusive intermediate cells based on its distribution statistics. Three different ways of partitioning a dense cell are introduced. Eventually, a dense region of each initial cell is recursively partitioned until it becomes the smallest cell called a unit cell. A cluster of a data stream is a group of adjacent dense unit cells. In order to minimize the number of cells, a sparse intermediate or unit cell is pruned if its support becomes much less than a minimum support. Furthermore, in order to confine the usage of memory space, the size of a unit cell is dynamically minimized such that the result of clustering becomes as accurate as possible. The proposed algorithm is analyzed by a series of experiments to identify its various characteristics.