Compression schemes for differential categorical stream clustering

  • Authors: Weiyun Huang; Edward Omiecinski; Leo Mark

  • Affiliations: Georgia Institute of Technology (all authors)

  • Venue: Proceedings of the thirteenth ACM international conference on Information and knowledge management
  • Year: 2004

Abstract

Stream data analysis differs significantly from traditional data processing. To process data online, an algorithm has to work in one pass, incorporating new data into a model maintained in main memory. Storing a model or synopsis of the processed data in memory, which we call "data compression", is an important technique in both incremental and differential stream mining. This paper proposes several data compression schemes for one-pass categorical data clustering and demonstrates their performance on synthetic and real data. Our compression schemes efficiently generate compact representations of the original data, enabling the algorithm to process streams at high speed and to detect changes in the underlying data. The example algorithm based on these compression schemes achieves good accuracy with short execution time.
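
To make the one-pass idea concrete, the following is a minimal sketch of clustering categorical records while keeping only a compact per-cluster synopsis in memory. It is not one of the paper's compression schemes, which the abstract does not specify. The synopsis (per-attribute value counts), the similarity measure, and the names `ClusterSummary`, `cluster_stream`, and `threshold` are all assumptions made for illustration.

```python
from collections import Counter


class ClusterSummary:
    """Hypothetical synopsis: per-attribute value counts instead of raw records."""

    def __init__(self, num_attrs):
        self.size = 0
        self.counts = [Counter() for _ in range(num_attrs)]

    def add(self, record):
        # Fold one record into the summary; the record itself is not stored.
        self.size += 1
        for counter, value in zip(self.counts, record):
            counter[value] += 1

    def similarity(self, record):
        # Assumed measure: mean relative frequency of the record's values
        # among the records already absorbed by this cluster.
        matches = sum(c[v] for c, v in zip(self.counts, record))
        return matches / (self.size * len(record))


def cluster_stream(records, threshold=0.5):
    """Single pass: each record is seen once and then discarded."""
    summaries = []
    labels = []
    for record in records:
        best, best_sim = None, -1.0
        for i, summary in enumerate(summaries):
            sim = summary.similarity(record)
            if sim > best_sim:
                best, best_sim = i, sim
        # Open a new cluster when no existing summary is similar enough.
        if best is None or best_sim < threshold:
            summaries.append(ClusterSummary(len(record)))
            best = len(summaries) - 1
        summaries[best].add(record)
        labels.append(best)
    return labels, summaries


stream = [
    ("red", "small"),
    ("red", "small"),
    ("blue", "large"),
    ("red", "medium"),
    ("blue", "large"),
]
labels, summaries = cluster_stream(stream)
print(labels)
```

Memory use in this sketch grows with the number of clusters and distinct attribute values rather than with the length of the stream, which is the property a synopsis-based approach relies on.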