Cost-efficient mining techniques for data streams

  • Authors:
  • Mohamed Medhat Gaber;Shonali Krishnaswamy;Arkady Zaslavsky

  • Affiliations:
  • Monash University, Caulfield East, VIC, Australia;Monash University, Caulfield East, VIC, Australia;Monash University, Caulfield East, VIC, Australia

  • Venue:
  • ACSW Frontiers '04 Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32
  • Year:
  • 2004

Abstract

A data stream is a continuous, high-speed flow of data items. Here, high speed means that the data rate is high relative to the available computational power. The increasing focus on applications that generate and receive data streams stimulates the need for online data stream analysis tools. Mining data streams is a real-time process of extracting interesting patterns from high-speed data streams. It raises new problems for the data mining community, chiefly how to mine continuous, high-speed data items that can be examined only once. In this paper, we propose algorithm output granularity as a solution for mining data streams. Algorithm output granularity is the amount of mining results that fits in main memory before any incremental integration. We show how the proposed strategy can be applied to build efficient clustering, frequent-items and classification techniques. We present and discuss empirical results for our clustering algorithm, which demonstrate acceptable accuracy coupled with efficient running time.
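To make the idea of bounding the amount of mining output held in memory more concrete, the following is a minimal illustrative sketch of a one-pass, threshold-based clustering loop. It is not the authors' algorithm: the names `Cluster`, `one_pass_cluster` and `merge_closest`, the distance `threshold`, the `max_clusters` memory budget, the one-dimensional data, and the choice to merge the two closest clusters when the budget is exceeded are all assumptions made here for illustration. Each item is examined exactly once, and the number of stored results never exceeds the budget.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    center: float   # running mean of the items assigned to this cluster
    weight: int = 1  # number of items assigned so far


def merge_closest(clusters):
    """Merge the two clusters with the closest centers (hypothetical integration step)."""
    clusters.sort(key=lambda c: c.center)
    # Index of the adjacent pair with the smallest gap between centers.
    i = min(range(len(clusters) - 1),
            key=lambda k: clusters[k + 1].center - clusters[k].center)
    a, b = clusters[i], clusters[i + 1]
    total = a.weight + b.weight
    merged = Cluster((a.center * a.weight + b.center * b.weight) / total, total)
    clusters[i:i + 2] = [merged]


def one_pass_cluster(stream, threshold, max_clusters):
    """Cluster a stream in one pass, keeping at most max_clusters results in memory."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    clusters = []
    for x in stream:  # each item is looked at exactly once
        nearest = min(clusters, key=lambda c: abs(c.center - x), default=None)
        if nearest is not None and abs(nearest.center - x) <= threshold:
            # Absorb the item: update the running mean incrementally.
            nearest.weight += 1
            nearest.center += (x - nearest.center) / nearest.weight
        else:
            # Start a new cluster; if the assumed memory budget is exceeded,
            # integrate existing results by merging the closest pair.
            clusters.append(Cluster(float(x)))
            if len(clusters) > max_clusters:
                merge_closest(clusters)
    return clusters


if __name__ == "__main__":
    for c in one_pass_cluster([1.0, 1.2, 5.0, 5.1, 20.0], threshold=0.5, max_clusters=2):
        print(c)
```

In this sketch, a larger `threshold` or a smaller `max_clusters` yields coarser output and less memory use, which is the kind of trade-off between result granularity and resources that the abstract describes.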