Squeezer: an efficient algorithm for clustering categorical data

  • Authors:
  • He Zengyou; Xu Xiaofei; Deng Shengchun

  • Affiliations:
  • Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, P.R. China; Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, P.R. China; Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, P.R. China

  • Venue:
  • Journal of Computer Science and Technology
  • Year:
  • 2002

Abstract

This paper presents Squeezer, a new efficient algorithm for clustering categorical data that produces high-quality clustering results while scaling well. Squeezer reads the tuples in sequence; for each tuple t, it either assigns t to an existing cluster (initially there are none) or creates a new cluster containing t, depending on the similarities between t and the existing clusters. Because it processes the data in a single sequential pass, the algorithm is well suited to clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence seen so far using a small amount of memory and time. Squeezer can also handle outliers efficiently and directly. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
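
The one-pass procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the similarity measure or how the assignment decision is made, so the sketch assumes a support-based similarity (for each attribute, the fraction of cluster members sharing the tuple's value, summed over attributes) and a user-chosen threshold. The names squeezer, similarity, and threshold are hypothetical.

```python
from collections import Counter


def similarity(summary, size, t):
    # Assumed measure: for each attribute, the fraction of cluster members
    # that share t's value, summed over all attributes.
    return sum(summary[i][v] / size for i, v in enumerate(t))


def squeezer(tuples, threshold):
    """One-pass clustering sketch; returns a cluster label per tuple."""
    summaries = []  # per cluster: one Counter of value counts per attribute
    sizes = []      # per cluster: number of tuples assigned so far
    labels = []
    for t in tuples:
        # Find the existing cluster most similar to t (none on the first tuple).
        best, best_sim = -1, -1.0
        for c, (summary, size) in enumerate(zip(summaries, sizes)):
            sim = similarity(summary, size, t)
            if sim > best_sim:
                best, best_sim = c, sim
        # Assumed decision rule: open a new cluster when no existing
        # cluster reaches the threshold.
        if best == -1 or best_sim < threshold:
            summaries.append([Counter() for _ in t])
            sizes.append(0)
            best = len(summaries) - 1
        # Update only the cluster summary; the tuple itself is not stored.
        for i, v in enumerate(t):
            summaries[best][i][v] += 1
        sizes[best] += 1
        labels.append(best)
    return labels


data = [("a", "x"), ("a", "x"), ("b", "y"), ("a", "y"), ("b", "y")]
labels = squeezer(data, threshold=1.5)
print(labels)  # [0, 0, 1, 2, 1]
```

Only per-cluster value counts are kept in memory, which is what makes a scheme of this kind attractive for streams. Under these assumptions, clusters that stay very small after the pass (such as cluster 2 above) are natural candidates for outliers.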