Clustering data stream by a sub-window approach using DCA

  • Authors:
  • Minh Thuy Ta;Hoai An Le Thi;Lydia Boudjeloud-Assala

  • Affiliations:
  • Laboratory of Theoretical and Applied Computer Science (LITA), UFR MIM, University of Lorraine, Metz, France (all three authors)

  • Venue:
  • MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2012

Abstract

Data stream mining is an emerging topic in data mining. It concerns many applications involving large, temporal data sets, such as telephone records, banking data, multimedia data, etc. For mining such data, one crucial strategy is the analysis of packet data. In this paper, we present an exploratory analysis of strategies for clustering data streams based on a sub-window approach and an efficient clustering algorithm called DCA (Difference of Convex functions Algorithm). Our approach consists of separating the data into different sub-windows and then applying a DCA clustering algorithm to each sub-window. Two clustering strategies are investigated: global clustering (on the whole data set) and independent local clustering (i.e., clustering independently on each sub-window). Our aims are to study (1) the efficiency of independent local clustering and (2) the agreement between local clustering and global clustering based on the same DCA clustering algorithm. Comparative experiments with data stream clustering using K-Means, a standard clustering method, on different data sets are presented.
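
The sub-window strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the DCA clustering algorithm, so the sketch below substitutes a plain Lloyd-style K-Means (the baseline named in the abstract) as a stand-in for the clustering step. The function names, the fixed non-overlapping window size, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def split_into_subwindows(stream: Sequence[Point], window_size: int) -> List[List[Point]]:
    """Cut the stream into consecutive, non-overlapping sub-windows (assumed scheme)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [list(stream[i:i + window_size]) for i in range(0, len(stream), window_size)]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20, seed: int = 0):
    """Plain Lloyd's K-Means. Stand-in for the DCA clustering step, NOT the paper's algorithm."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)  # initial centers drawn from the data
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign(points, centers)


def cluster_locally(stream: Sequence[Point], window_size: int, k: int):
    """Independent local clustering: cluster each sub-window on its own."""
    results = []
    for window in split_into_subwindows(stream, window_size):
        k_eff = min(k, len(window))  # a short final window may hold fewer than k points
        results.append(kmeans(window, k_eff))
    return results


# Hypothetical toy stream: two well-separated groups of 2-D points.
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1), (5.0, 5.1)]
local_results = cluster_locally(stream, window_size=4, k=2)   # one result per sub-window
global_centers, global_labels = kmeans(stream, k=2)           # global clustering
```

Comparing `local_results` with `global_labels` is one simple way to examine how well independent local clustering matches global clustering; the evaluation measures actually used in the paper are not stated in the abstract.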