A grid-based clustering algorithm for high-dimensional data streams

  • Authors:
  • Yansheng Lu; Yufen Sun; Guiping Xu; Gang Liu

  • Affiliations:
  • College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, China (all authors)

  • Venue:
  • ADMA'05: Proceedings of the First International Conference on Advanced Data Mining and Applications
  • Year:
  • 2005

Abstract

The three main requirements for clustering data streams online are a single pass over the data, high processing speed, and a small memory footprint. We propose an algorithm that meets these requirements by introducing an incremental grid data structure that summarizes the data streams online. To handle high-dimensional data, the algorithm uses a simple heuristic to select a subset of dimensions, and all clustering operations are performed on that subset. Our performance study on a real network intrusion detection stream data set demonstrates the efficiency and effectiveness of the proposed algorithm.
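The abstract does not specify the grid structure or the dimension-selection heuristic, so what follows is only a minimal Python sketch of the general idea under stated assumptions. It assumes fixed, known per-dimension bounds and equal-width bins. It uses a hypothetical `select_dims` heuristic that keeps the dimensions whose 1-D histograms are most uneven; this is a stand-in, not the authors' method. Clusters are formed as connected groups of dense, face-adjacent cells. All names (`bin_index`, `select_dims`, `IncrementalGrid`, `min_count`) are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Cell = Tuple[int, ...]


def bin_index(x: float, low: float, high: float, bins: int) -> int:
    """Map a value to one of `bins` equal-width intervals on [low, high], clamping outliers."""
    b = int((x - low) / (high - low) * bins)
    return min(max(b, 0), bins - 1)


def select_dims(sample: Sequence[Sequence[float]], lows: Sequence[float],
                highs: Sequence[float], bins: int, k: int) -> List[int]:
    """Hypothetical heuristic (assumption, not the paper's): keep the k dimensions whose
    1-D histograms are most uneven, scored by the variance of their bin counts."""
    scores = []
    for d in range(len(lows)):
        hist = [0] * bins
        for p in sample:
            hist[bin_index(p[d], lows[d], highs[d], bins)] += 1
        mean = len(sample) / bins
        scores.append((sum((h - mean) ** 2 for h in hist), d))
    return sorted(d for _, d in sorted(scores, reverse=True)[:k])


class IncrementalGrid:
    """One-pass grid summary over a fixed subset of dimensions.
    Memory grows with the number of non-empty cells, not with the stream length."""

    def __init__(self, dims: Sequence[int], lows: Sequence[float],
                 highs: Sequence[float], bins: int) -> None:
        self.dims, self.lows, self.highs, self.bins = list(dims), lows, highs, bins
        self.counts: Dict[Cell, int] = defaultdict(int)

    def add(self, point: Sequence[float]) -> None:
        # Each arriving point costs one cell lookup and one counter increment.
        cell = tuple(bin_index(point[d], self.lows[d], self.highs[d], self.bins)
                     for d in self.dims)
        self.counts[cell] += 1

    def clusters(self, min_count: int) -> List[Set[Cell]]:
        """Group dense cells (count >= min_count) into face-adjacent connected components."""
        dense = {c for c, n in self.counts.items() if n >= min_count}
        seen: Set[Cell] = set()
        result = []
        for start in dense:
            if start in seen:
                continue
            component, stack = set(), [start]
            seen.add(start)
            while stack:  # depth-first search over neighbouring dense cells
                cell = stack.pop()
                component.add(cell)
                for i in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:i] + (cell[i] + step,) + cell[i + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            result.append(component)
        return result


if __name__ == "__main__":
    lows, highs, bins = [0.0] * 3, [10.0] * 3, 5
    stream = []
    for i in range(50):
        z = (i % 10) + 0.5                                   # dim 2: evenly spread filler
        stream.append((1.0 if i % 2 == 0 else 3.0, 1.0, z))  # cluster A spans two cells
        stream.append((9.0, 9.0, z))                         # cluster B
    stream.append((5.0, 9.0, 5.0))                           # isolated noise point

    dims = select_dims(stream[:40], lows, highs, bins, k=2)  # pick dims from a warm-up prefix
    grid = IncrementalGrid(dims, lows, highs, bins)
    for p in stream:
        grid.add(p)
    print(dims, sorted(sorted(c) for c in grid.clusters(min_count=10)))
    # → [0, 1] [[(0, 0), (1, 0)], [(4, 4)]]
```

In this toy run the structureless third dimension is dropped, the two adjacent dense cells of cluster A are merged, and the single noise point falls below `min_count`. The design choice is that per-point work is one hash-map update and memory scales with the number of occupied cells on the selected dimensions. That is one way a grid summary can address the one-pass, speed, and memory requirements named above.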