A clustering algorithm based on matrix over high dimensional data stream

  • Authors:
  • Guibin Hou;Ruixia Yao;Jiadong Ren;Changzhen Hu

  • Affiliations:
  • College of Information Science and Engineering, Yanshan University, Qinhuangdao, China; College of Information Science and Engineering, Yanshan University, Qinhuangdao, China; College of Information Science and Engineering, Yanshan University, Qinhuangdao, China and School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Technology Beijing Institute of Technology, Beijing, China

  • Venue:
  • WISM'10 Proceedings of the 2010 international conference on Web information systems and mining
  • Year:
  • 2010

Abstract

Clustering high-dimensional data streams is a difficult and important problem. In this paper, we propose MStream, a new matrix-based clustering algorithm for high-dimensional data streams. MStream combines a synopsis structure, called GC (Grid Cell Structure), with a grid matrix technique. The algorithm adopts a two-phase framework. In the online component, GCs monitor the one-dimensional statistical data distribution of each dimension independently. Sparse GCs are identified by a predefined threshold and deleted. In the offline component, multi-dimensional clusters are traced from the dense GCs maintained by the online component. The grid matrix technique is then used to generate the final multi-dimensional clusters in the whole data space. Experimental results show that our algorithm has flexible scalability and higher clustering quality.
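
The online phase described above can be illustrated with a minimal sketch. The abstract does not specify the GC layout, the statistics it stores, or how the threshold is defined, so everything here is an assumption made for illustration: the class name `OnlineGridCells`, uniform cell width `cell_width`, a plain per-cell point count as the only statistic, and a sparsity threshold `min_support` expressed as a fraction of the points seen so far. The offline grid matrix step is not sketched, since the abstract gives no detail on it.

```python
import math
from collections import defaultdict


class OnlineGridCells:
    """Per-dimension 1-D grid-cell counts (a hypothetical stand-in for GCs)."""

    def __init__(self, n_dims, cell_width, min_support):
        self.cell_width = cell_width    # assumed: uniform cell width in every dimension
        self.min_support = min_support  # assumed: sparsity threshold as a fraction of points
        self.n_points = 0
        # One mapping per dimension: cell index -> number of points seen in that cell.
        self.cells = [defaultdict(int) for _ in range(n_dims)]

    def _limit(self):
        # Minimum count a cell needs to be treated as dense.
        return self.min_support * self.n_points

    def update(self, point):
        """Absorb one stream point, updating each dimension independently."""
        self.n_points += 1
        for d, x in enumerate(point):
            self.cells[d][math.floor(x / self.cell_width)] += 1

    def prune_sparse(self):
        """Delete cells whose count is below the threshold."""
        limit = self._limit()
        for d, counts in enumerate(self.cells):
            kept = {c: n for c, n in counts.items() if n >= limit}
            self.cells[d] = defaultdict(int, kept)

    def dense_cells(self, d):
        """Return the sorted indices of dense cells in dimension d."""
        limit = self._limit()
        return sorted(c for c, n in self.cells[d].items() if n >= limit)


grid = OnlineGridCells(n_dims=2, cell_width=1.0, min_support=0.3)
for p in [(0.1, 5.2), (0.4, 5.9), (0.7, 9.1), (3.5, 5.5)]:
    grid.update(p)
grid.prune_sparse()
print(grid.dense_cells(0), grid.dense_cells(1))
```

Because each dimension is summarized separately, memory in this sketch grows with the number of occupied one-dimensional cells per dimension rather than with the number of occupied cells in the full multi-dimensional grid. An offline step would then combine the surviving dense cells across dimensions into candidate multi-dimensional clusters.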