DCF: an efficient data stream clustering framework for streaming applications

  • Authors:
  • Kyungmin Cho;Sungjae Jo;Hyukjae Jang;Su Myeon Kim;Junehwa Song

  • Affiliations:
  • Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST);Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST);Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST);Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST);Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST)

  • Venue:
  • DEXA'06 Proceedings of the 17th international conference on Database and Expert Systems Applications
  • Year:
  • 2006

Abstract

Streaming applications, such as environment monitoring and vehicle location tracking, must handle high volumes of continuously arriving data and sudden fluctuations in those volumes while efficiently supporting multi-dimensional historical queries. Traditional database management systems are ill-suited to this workload because they require an excessive number of disk I/Os to continuously update massive data streams. In this paper, we propose DCF (Data Stream Clustering Framework), a novel framework that supports efficient data stream archiving for streaming applications. DCF substantially reduces disk I/O in the storage system by grouping incoming data into clusters and storing the clusters instead of raw data elements. In addition, even when the amount of incoming data fluctuates temporarily, DCF can still store all incoming raw data reliably by controlling the cluster size. Our experimental results show that our approach significantly reduces the number of disk accesses for both inserting and retrieving data.
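The abstract's two ideas, storing clusters rather than raw elements and adjusting cluster size under load, can be illustrated with a minimal sketch. This is not the DCF algorithm from the paper: the class name `ClusterBuffer`, the `(timestamp, x, y)` record layout, the size bounds, and the rule that sets cluster size from the current batch size are all assumptions made for illustration. Disk I/O is simulated by a counter that increases once per stored cluster.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record layout: (timestamp, x, y). Not taken from the paper.
Point = Tuple[float, float, float]


@dataclass
class ClusterBuffer:
    base_size: int = 4    # assumed cluster size under normal load
    max_size: int = 64    # assumed upper bound used during bursts
    writes: int = 0       # simulated disk writes, one per stored cluster
    stored: List[List[Point]] = field(default_factory=list)
    _pending: List[Point] = field(default_factory=list)

    def cluster_size(self, arrivals: int) -> int:
        # Assumed policy: grow the cluster with the arrival count,
        # clamped to [base_size, max_size], so a burst produces
        # fewer, larger writes instead of many small ones.
        return max(self.base_size, min(self.max_size, arrivals))

    def ingest(self, batch: List[Point]) -> None:
        size = self.cluster_size(len(batch))
        self._pending.extend(batch)
        # Emit full clusters; leftover elements wait for the next batch.
        while len(self._pending) >= size:
            cluster = self._pending[:size]
            self._pending = self._pending[size:]
            self.stored.append(cluster)
            self.writes += 1

    def flush(self) -> None:
        # Store whatever remains so that no raw element is dropped.
        if self._pending:
            self.stored.append(self._pending)
            self._pending = []
            self.writes += 1


if __name__ == "__main__":
    buf = ClusterBuffer()
    buf.ingest([(float(t), 0.0, 0.0) for t in range(8)])    # normal load
    buf.ingest([(float(t), 0.0, 0.0) for t in range(100)])  # burst
    buf.flush()
    total = sum(len(c) for c in buf.stored)
    print(f"{total} elements stored in {buf.writes} simulated writes")
```

In this toy model every raw element is still retained, but the number of simulated writes depends on the number of clusters rather than the number of elements, which is the effect the abstract attributes to clustering.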