Adaptive non-linear clustering in data streams

  • Authors: Ankur Jain; Zhihua Zhang; Edward Y. Chang

  • Affiliations: UC Santa Barbara, CA; UC Santa Barbara, CA; UC Santa Barbara, CA

  • Venue: CIKM '06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management
  • Year: 2006

Abstract

Data stream clustering has emerged as a challenging and interesting problem over the past few years. Because streams evolve over time and the data stream model permits only a single pass over the data, traditional clustering algorithms are inapplicable to stream clustering. The problem becomes even more challenging when the data is high-dimensional and the clusters are not linearly separable in the input space. In this paper, we propose a nonlinear stream clustering algorithm that adapts to the stream's evolutionary changes. Using kernel methods to handle the non-linearity of data separation, we propose a novel 2-tier stream clustering architecture. Tier-1 captures the temporal locality in the stream by partitioning it into segments using a kernel-based novelty detection approach. Tier-2 exploits this segment structure to continuously project the streaming data nonlinearly onto a low-dimensional space (LDS) before assigning each point to a cluster. We demonstrate the effectiveness of our approach through extensive experimental evaluation on various real-world datasets.
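
The abstract does not give the details of the Tier-1 novelty detector, so the following is only a minimal sketch of what kernel-based segmentation of a stream could look like, not the authors' algorithm. It assumes a Gaussian (RBF) kernel and scores each incoming point by its squared feature-space distance to the kernel mean of the current segment; a new segment is opened when that score exceeds a threshold. The names `rbf`, `novelty_score`, and `segment_stream`, and the parameters `gamma`, `threshold`, and `min_size`, are hypothetical choices made for illustration. Tier-2 (the nonlinear projection onto the LDS) is not covered.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def novelty_score(x, segment, gamma=1.0):
    """Squared feature-space distance from x to the kernel mean of the segment.

    ||phi(x) - mean||^2 = k(x, x) - 2 * mean_i k(x, s_i) + mean_ij k(s_i, s_j),
    where k(x, x) = 1 for the RBF kernel.
    """
    n = len(segment)
    k_xs = sum(rbf(x, s, gamma) for s in segment) / n
    k_ss = sum(rbf(s, t, gamma) for s in segment for t in segment) / (n * n)
    return 1.0 - 2.0 * k_xs + k_ss

def segment_stream(stream, threshold=0.5, gamma=1.0, min_size=5):
    """Label each point with a segment id in a single pass over the stream.

    Assumption: a point is 'novel' when its score exceeds `threshold`, and a
    segment must hold at least `min_size` points before novelty is tested.
    """
    segment, labels, seg_id = [], [], 0
    for x in stream:
        if len(segment) >= min_size and novelty_score(x, segment, gamma) > threshold:
            seg_id += 1      # novelty detected: start a new segment
            segment = []
        segment.append(x)
        labels.append(seg_id)
    return labels

# Toy stream: ten points near the origin followed by ten points near (5, 5).
demo_stream = [(0.01 * i, 0.0) for i in range(10)] + \
              [(5.0 + 0.01 * i, 5.0) for i in range(10)]
demo_labels = segment_stream(demo_stream)
```

On this toy stream the abrupt jump between the two groups triggers a single segment boundary. The sketch keeps every point of the current segment and recomputes kernel sums from scratch, which is quadratic in segment length; a practical stream algorithm would need a bounded-memory summary instead.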