Efficiently tracing clusters over high-dimensional on-line data streams

  • Authors:
  • Jae Woo Lee;Nam Hun Park;Won Suk Lee

  • Affiliations:
  • Department of Computer Science, Yonsei University, 134 Shinchondong Seodaemungu, Seoul 120-749, Republic of Korea;Department of Computer Science, Yonsei University, 134 Shinchondong Seodaemungu, Seoul 120-749, Republic of Korea;Department of Computer Science, Yonsei University, 134 Shinchondong Seodaemungu, Seoul 120-749, Republic of Korea

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2009

Abstract

A good clustering method should scale flexibly with both the number of dimensions and the size of a data set. This paper proposes a method for efficiently tracing the clusters of a high-dimensional on-line data stream. While the one-dimensional clusters of each dimension are traced independently, a technique similar to frequent itemset mining is employed to find the set of multi-dimensional clusters. By finding a frequently co-occurring set of one-dimensional clusters, it is possible to trace a multi-dimensional rectangular space whose range is defined collectively by those one-dimensional clusters. To trace such candidates over a multi-dimensional on-line data stream, the paper proposes a cluster-statistics tree (CS-Tree). A k-depth node(k=
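
The co-occurrence idea in the abstract can be illustrated with a small Python sketch. It is not the CS-Tree and not the authors' algorithm: it assumes the one-dimensional clusters are already available as fixed, non-overlapping intervals, and it counts every combination by brute force, which is only practical for a handful of dimensions. All names (`INTERVALS`, `to_items`, `frequent_cluster_sets`, `to_rectangle`) and the `min_support` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: per-dimension 1-D clusters given as (low, high) intervals.
# How such intervals are traced over a stream is outside this sketch.
INTERVALS = {
    0: [(0.0, 1.0), (5.0, 6.0)],
    1: [(10.0, 12.0)],
    2: [(-1.0, 0.0), (3.0, 4.0)],
}


def to_items(point, intervals):
    """Map a point to the (dimension, cluster index) pairs it falls into."""
    items = []
    for dim, value in enumerate(point):
        for idx, (low, high) in enumerate(intervals.get(dim, [])):
            if low <= value <= high:
                items.append((dim, idx))
                break  # assumption: clusters within one dimension do not overlap
    return items


def frequent_cluster_sets(points, intervals, min_support):
    """Count co-occurring 1-D clusters and keep the sets seen often enough.

    Brute-force enumeration of all combinations; exponential in the number
    of dimensions, so this is for illustration only.
    """
    counts = Counter()
    total = 0
    for point in points:
        total += 1
        items = sorted(to_items(point, intervals))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    threshold = min_support * total
    return {combo: n for combo, n in counts.items() if n >= threshold}


def to_rectangle(combo, intervals):
    """Turn a set of 1-D clusters into a per-dimension range (a rectangle)."""
    return {dim: intervals[dim][idx] for dim, idx in combo}


points = [
    (0.5, 11.0, 3.5),
    (0.2, 10.5, 3.9),
    (5.5, 11.5, -0.5),
    (0.7, 20.0, 3.1),  # falls in no 1-D cluster on dimension 1
]
frequent = frequent_cluster_sets(points, INTERVALS, min_support=0.5)
for combo in sorted(frequent, key=len, reverse=True):
    print(frequent[combo], to_rectangle(combo, INTERVALS))
```

Each frequent set that spans several dimensions corresponds to one rectangular region whose sides are the ranges of its one-dimensional clusters. A tree-based structure such as the one the abstract proposes would presumably avoid enumerating every combination, but that detail is not recoverable from the text above.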