A grid-based clustering algorithm for high-dimensional data streams

Authors:
Yansheng Lu; Yufen Sun; Guiping Xu; Gang Liu
Affiliations:
College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, China (all four authors)
Venue:
ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications
Year:
2005

References (14):
BIRCH: an efficient data clustering method for very large databases. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data.
Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Fast algorithms for projected clustering. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data.
Data mining: concepts and techniques.
Requirements for clustering data streams. ACM SIGKDD Explorations Newsletter.
Robot Vision.
A Monte Carlo algorithm for fast projective clustering. Proceedings of the 2002 ACM SIGMOD international conference on Management of data.
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases. VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases.
WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. The VLDB Journal: The International Journal on Very Large Data Bases.
Clustering Data Streams: Theory and Practice. IEEE Transactions on Knowledge and Data Engineering.
Streaming-Data Algorithms for High-Quality Clustering. ICDE '02 Proceedings of the 18th International Conference on Data Engineering.
Subspace Selection for Clustering High-Dimensional Data. ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining.
A framework for clustering evolving data streams. VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29.
A framework for projected clustering of high dimensional data streams. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30.

Cited by (3):
A clustering algorithm based on matrix over high dimensional data stream. WISM'10 Proceedings of the 2010 international conference on Web information systems and mining.
A grid-based subspace clustering algorithm for high-dimensional data streams. WISE'06 Proceedings of the 7th international conference on Web Information Systems.
Exclusive and complete clustering of streams. DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications.

Abstract

The three main requirements for clustering data streams on-line are a single pass over the data, high processing speed, and a small memory footprint. We propose an algorithm that meets these requirements by introducing an incremental grid data structure that summarizes the data stream on-line. To handle high-dimensional data, the algorithm uses a simple heuristic to select a subset of dimensions, and all clustering operations are performed on that subset. A performance study on a real network intrusion detection stream data set demonstrates the efficiency and effectiveness of the proposed algorithm.
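The abstract does not specify the grid layout, the dimension-selection heuristic, or how clusters are formed from the grid, so the following is only a minimal Python sketch of the general idea under stated assumptions: each dimension has fixed bounds known in advance and is split into equal-width intervals; a sparse dictionary of occupied cells is updated once per arriving point; a hypothetical "least uniform 1-D histogram" rule stands in for the authors' heuristic; and dense cells that share a face are merged into clusters, in the style of CLIQUE-like grid methods. The names `GridSummary`, `select_dims`, `project`, `clusters`, `bins`, and `min_count` are illustrative and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Cell = Tuple[int, ...]


class GridSummary:
    """Illustrative one-pass grid summary of a numeric stream (not the paper's code).

    Each dimension's range [lo, hi] (assumed known in advance, hi > lo) is split
    into `bins` equal-width intervals. Only occupied cells are stored, so memory
    grows with the number of non-empty cells, not with bins ** dims.
    """

    def __init__(self, lo: Sequence[float], hi: Sequence[float], bins: int) -> None:
        self.lo, self.hi, self.bins = list(lo), list(hi), bins
        self.dims = len(self.lo)
        self.cells: Dict[Cell, int] = defaultdict(int)  # full-dimensional cell counts
        self.hist = [[0] * bins for _ in range(self.dims)]  # per-dimension 1-D counts

    def _interval(self, d: int, x: float) -> int:
        # Map a value to its interval index; out-of-range values are clamped.
        width = (self.hi[d] - self.lo[d]) / self.bins
        i = int((x - self.lo[d]) / width)
        return min(max(i, 0), self.bins - 1)

    def add(self, point: Sequence[float]) -> None:
        # Each point updates the summary exactly once and can then be discarded.
        cell = tuple(self._interval(d, x) for d, x in enumerate(point))
        self.cells[cell] += 1
        for d, i in enumerate(cell):
            self.hist[d][i] += 1

    def select_dims(self, m: int) -> List[int]:
        # Hypothetical heuristic: keep the m dimensions whose 1-D histogram is
        # least uniform (sum of squared deviations from the mean interval count).
        def spread(d: int) -> float:
            h = self.hist[d]
            mean = sum(h) / len(h)
            return sum((c - mean) ** 2 for c in h)

        ranked = sorted(range(self.dims), key=spread, reverse=True)
        return sorted(ranked[:m])

    def project(self, dims: Sequence[int]) -> Dict[Cell, int]:
        # Aggregate stored cells onto the selected dimensions; no second pass over the stream.
        proj: Dict[Cell, int] = defaultdict(int)
        for cell, n in self.cells.items():
            proj[tuple(cell[d] for d in dims)] += n
        return dict(proj)


def clusters(proj: Dict[Cell, int], min_count: int) -> List[List[Cell]]:
    # Cells holding at least min_count points are dense; dense cells that share a
    # face (differ by one interval in exactly one dimension) join the same cluster.
    dense = {c for c, n in proj.items() if n >= min_count}
    seen: Set[Cell] = set()
    result: List[List[Cell]] = []
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first search over face-adjacent dense cells
            c = stack.pop()
            comp.append(c)
            for k in range(len(c)):
                for step in (-1, 1):
                    nb = c[:k] + (c[k] + step,) + c[k + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        result.append(sorted(comp))
    return result


# Usage: two clusters live in dimensions 0 and 1; dimension 2 is uniform noise.
summary = GridSummary(lo=[0, 0, 0], hi=[10, 10, 10], bins=10)
for i in range(200):
    z = (i % 10) + 0.5
    if i % 2:
        summary.add([1.5 if i % 4 == 1 else 2.5, 1.5, z])
    else:
        summary.add([8.5, 8.5, z])

dims = summary.select_dims(2)
found = clusters(summary.project(dims), min_count=10)
print(dims, found)  # [0, 1] [[(1, 1), (2, 1)], [(8, 8)]]
```

In this sketch, keeping full-dimensional cell counts lets dimension selection happen after the points have been consumed without rereading the stream. The trade-off is that the number of occupied cells can still grow when many dimensions are noisy. The sketch also ignores evolution of the stream over time (no decay or windowing), which an on-line stream clusterer would probably have to address.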