Performances of parallel clustering algorithm for categorical and mixed data

  • Authors:
  • Nguyen Thi Minh Hai; Horiguchi Susumu

  • Affiliations:
  • Japan Advanced Institute of Science and Technology, Japan; Graduate School of Inf. Science, Tohoku University, Sendai, Japan

  • Venue:
  • PDCAT'04 Proceedings of the 5th International Conference on Parallel and Distributed Computing: Applications and Technologies
  • Year:
  • 2004

Abstract

Clustering is a fundamental technique in image processing, pattern recognition, data compression, and related fields. However, most recent clustering algorithms cannot handle large, complex databases and do not always achieve high clustering accuracy. This paper proposes a parallel clustering algorithm for categorical and mixed data that overcomes both problems. Our contributions are: (1) improving the k-sets algorithm [3] to produce more accurate clustering results; and (2) applying parallel techniques to the improved approach to obtain a parallel algorithm. Experiments on a CRAY T3E show that the proposed algorithm achieves higher accuracy than previous approaches and reduces processing time; thus, it is practical for use with very large and complex databases.