Compression schemes for differential categorical stream clustering

Authors:
Weiyun Huang;Edward Omiecinski;Leo Mark
Affiliations:
Georgia Institute of Technology;Georgia Institute of Technology;Georgia Institute of Technology
Venue:
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Year:
2004

Abstract

Stream data analysis differs significantly from traditional data processing. To process data online, an algorithm must work in one pass, incorporating new data into a model maintained in main memory. Storing a model or synopsis of the processed data in memory, which we call "data compression", is an important technique in both incremental and differential stream mining. This paper proposes several data compression schemes for one-pass categorical data clustering and demonstrates their performance on synthetic and real data. Our compression schemes efficiently generate compact representations of the original data, enabling the algorithm to process streams at high speed and to detect changes in the underlying data. An example algorithm based on these compression schemes achieves good accuracy with short execution time.
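
The abstract does not spell out the compression schemes themselves, so the following is only an illustrative sketch of the general idea: each cluster is kept in memory as a compact synopsis (here, per-attribute value counts) instead of its raw records, and every record is read exactly once. The names `ClusterSynopsis`, `cluster_stream`, the similarity measure, and the `threshold` parameter are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List, Sequence


class ClusterSynopsis:
    """Hypothetical compact summary of one cluster of categorical records."""

    def __init__(self, n_attributes: int) -> None:
        self.size = 0
        # One value-frequency table per attribute stands in for the raw records.
        self.counts = [Counter() for _ in range(n_attributes)]

    def add(self, record: Sequence[str]) -> None:
        """Fold a record into the synopsis; the record itself is not stored."""
        self.size += 1
        for table, value in zip(self.counts, record):
            table[value] += 1

    def similarity(self, record: Sequence[str]) -> float:
        """Mean relative frequency of the record's values within this cluster."""
        if self.size == 0:
            return 0.0
        hits = sum(table[value] for table, value in zip(self.counts, record))
        return hits / (self.size * len(self.counts))


def cluster_stream(
    records: Iterable[Sequence[str]], n_attributes: int, threshold: float = 0.5
) -> List[ClusterSynopsis]:
    """One-pass clustering: each record is seen once and then discarded."""
    clusters: List[ClusterSynopsis] = []
    for record in records:
        best = max(clusters, key=lambda c: c.similarity(record), default=None)
        # Assumed rule: open a new cluster when no synopsis is similar enough.
        if best is None or best.similarity(record) < threshold:
            best = ClusterSynopsis(n_attributes)
            clusters.append(best)
        best.add(record)
    return clusters


stream = [
    ("red", "small"),
    ("red", "small"),
    ("blue", "large"),
    ("red", "medium"),
    ("blue", "large"),
]
clusters = cluster_stream(stream, n_attributes=2)
print([c.size for c in clusters])
```

Memory use in this sketch grows with the number of distinct attribute values per cluster rather than with the number of records, which is the property a synopsis of this kind is meant to provide.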