A density-based clustering structure mining algorithm for data streams

  • Authors:
  • Huan Wang;Yanwei Yu;Qin Wang;Yadong Wan

  • Affiliations:
  • University of Science and Technology Beijing, Haidian District, Beijing, China;University of Science and Technology Beijing, Haidian District, Beijing, China;University of Science and Technology Beijing, Haidian District, Beijing, China;University of Science and Technology Beijing, Haidian District, Beijing, China

  • Venue:
  • Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
  • Year:
  • 2012

Abstract

Advances in hardware and storage technology have created a demand for automatic data mining on data streams. Cluster analysis is an important tool for data stream mining. Although density-based clustering algorithms for data streams can now discover clusters of arbitrary shape, their effectiveness depends on parameter settings. Moreover, the global parameters used in these algorithms limit their ability to discover overlapping clusters. In this paper, we propose OPCluStream, a novel density-based clustering structure mining algorithm for data streams. It adaptively discovers clusters of arbitrary shape as well as overlapping clusters. While satisfying the one-pass constraint, OPCluStream indexes points with a tree topology in which each point links to related points through directed pointers. This tree topology records the relationships among points, which represent the clustering results over a broad range of Eps settings, and clusters can be extracted by transforming the topology into a clustering structure. The clustering structure is equivalent to the index structure and convenient to use. In addition, OPCluStream clusters efficiently because it indexes points with the tree topology and restricts computation to a limited area when new points are added to the data stream. Experiments on synthetic and real data sets illustrate the effectiveness, efficiency and insights provided by our method.
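
The idea that a single tree of directed links can stand for clusterings over a range of Eps values can be illustrated with a toy sketch. This is not OPCluStream itself, whose index and update rules are not given in the abstract. It assumes a simplified rule in which each arriving point stores one pointer to its nearest earlier point, and it scans all earlier points by brute force instead of using a limited computing area. The names `LinkTree`, `insert`, and `clusters` are hypothetical, and density thresholds such as MinPts are omitted.

```python
import math


class LinkTree:
    """Toy one-pass index (an illustrative assumption, not the paper's structure).

    Each new point keeps a directed pointer to its nearest earlier point,
    together with the length of that link.
    """

    def __init__(self):
        self.points = []   # points in arrival order
        self.parent = []   # index of the nearest earlier point, None for the first point
        self.dist = []     # length of the link to the parent

    def insert(self, p):
        # Brute-force nearest-neighbour search over points seen so far.
        best_i, best_d = None, math.inf
        for i, q in enumerate(self.points):
            d = math.dist(p, q)
            if d < best_d:
                best_i, best_d = i, d
        self.points.append(p)
        self.parent.append(best_i)
        self.dist.append(best_d)

    def clusters(self, eps):
        """Label points by cutting every link longer than eps.

        Parents always arrive before their children, so a single
        left-to-right pass is enough to propagate labels.
        """
        labels = []
        next_label = 0
        for i in range(len(self.points)):
            if self.parent[i] is not None and self.dist[i] <= eps:
                labels.append(labels[self.parent[i]])
            else:
                labels.append(next_label)
                next_label += 1
        return labels


tree = LinkTree()
for point in [(0, 0), (0, 1), (5, 5), (5, 6), (0, 2), (20, 20)]:
    tree.insert(point)

# The same tree answers queries for different Eps values without re-reading the stream.
for eps in (0.5, 1.5, 7.0):
    print(eps, tree.clusters(eps))
```

In this sketch the tree is built once, and a smaller `eps` only cuts more links, so tighter settings split the stream into more groups while looser ones merge them.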