Parallel Spectral Clustering

  • Authors:
  • Yangqiu Song;Wen-Yen Chen;Hongjie Bai;Chih-Jen Lin;Edward Y. Chang

  • Affiliations:
  • Department of Automation, Tsinghua University, Beijing, China and Google Research, , USA/China;Department of Computer Science, University of California, Santa Barbara, USA and Google Research, , USA/China;Google Research, , USA/China;Department of Computer Science, National Taiwan University, Taipie, Taiwan and Google Research, , USA/China;Google Research, , USA/China

  • Venue:
  • ECML PKDD '08 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Part II
  • Year:
  • 2008

Abstract

Spectral clustering algorithms have been shown to be more effective at finding clusters than most traditional algorithms. However, spectral clustering scales poorly in both memory use and computation time when the dataset is large. To cluster large datasets, we propose parallelizing both memory use and computation across distributed computers. Through an empirical study on a large document dataset of 193,844 instances and a large photo dataset of 637,137 instances, we demonstrate that our parallel algorithm can effectively alleviate the scalability problem.
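
To give a rough sense of the memory side of the scalability problem, the sketch below estimates the size of a similarity matrix for the two dataset sizes quoted in the abstract. It rests on assumptions that the abstract does not state: a fully dense n x n matrix, 8-byte floating-point entries, and an even split of rows across machines. The function names and the machine count of 64 are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def dense_similarity_bytes(n, bytes_per_entry=8):
    """Bytes needed for a dense n x n similarity matrix (assumed layout)."""
    return n * n * bytes_per_entry


def per_machine_bytes(n, machines, bytes_per_entry=8):
    """Bytes held by the most loaded machine when rows are split evenly.

    The row-wise split is an assumption made for illustration.
    """
    rows = -(-n // machines)  # ceiling division: rows on the fullest machine
    return rows * n * bytes_per_entry


if __name__ == "__main__":
    # Dataset sizes are taken from the abstract; 64 machines is hypothetical.
    for name, n in [("documents", 193_844), ("photos", 637_137)]:
        total = dense_similarity_bytes(n)
        share = per_machine_bytes(n, machines=64)
        print(f"{name}: {total / 1e9:.1f} GB dense in total, "
              f"{share / 1e9:.1f} GB per machine across 64 machines")
```

Under these assumptions the dense matrix grows quadratically with the number of instances, which is why a single machine runs out of memory long before the dataset itself becomes large on disk. Splitting rows across machines divides that cost roughly by the machine count.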