Cluster analysis of massive datasets in astronomy

  • Authors: Woncheol Jang; Martin Hendry
  • Affiliations: Department of Epidemiology and Biostatistics, University of Georgia, Athens, USA 30602; Department of Physics and Astronomy, University of Glasgow, Glasgow, UK G12 8QQ
  • Venue: Statistics and Computing
  • Year: 2007

Abstract

Clusters of galaxies are a useful proxy to trace the distribution of mass in the universe. By measuring the mass of clusters of galaxies on different scales, one can follow the evolution of the mass distribution (Martínez and Saar, Statistics of the Galaxy Distribution, 2002). It can be shown that finding galaxy clusters is equivalent to finding density contour clusters (Hartigan, Clustering Algorithms, 1975): connected components of the level set S_c ≡ {f ≥ c}, where f is a probability density function. Cuevas et al. (Can. J. Stat. 28, 367-382, 2000; Comput. Stat. Data Anal. 36, 441-459, 2001) proposed a nonparametric method for density contour clusters, attempting to find density contour clusters by the minimal spanning tree. While their algorithm is conceptually simple, it requires intensive computations for large datasets. We propose a more efficient clustering method based on their algorithm with the Fast Fourier Transform (FFT). The method is applied to a study of galaxy clustering on large astronomical sky survey data.
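
The idea of density contour clusters can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give, and it does not use the minimal spanning tree of Cuevas et al. Instead it assumes a simpler grid-based stand-in: a binned Gaussian kernel density estimate computed with the FFT, thresholded at a level c, with the connected components of the resulting level set found by a flood fill. The function names, the grid size, the bandwidth, the level and the synthetic two-blob data are all hypothetical choices made for illustration.

```python
import numpy as np


def fft_kde_grid(points, grid_size=64, bandwidth=0.03, extent=(0.0, 1.0)):
    """Binned Gaussian kernel density estimate on a square 2-D grid via FFT.

    Illustrative stand-in only; parameters are assumptions, not from the paper.
    """
    lo, hi = extent
    edges = np.linspace(lo, hi, grid_size + 1)
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=[edges, edges])
    cell = (hi - lo) / grid_size

    # Zero-pad to twice the grid size so the circular FFT convolution
    # does not wrap mass around the edges of the grid.
    m = 2 * grid_size
    offsets = np.fft.fftfreq(m, d=1.0 / m) * cell  # 0, 1, ..., -2, -1 (in cells)
    k1 = np.exp(-0.5 * (offsets / bandwidth) ** 2) / (np.sqrt(2 * np.pi) * bandwidth)
    kernel = np.outer(k1, k1)

    conv = np.fft.irfft2(
        np.fft.rfft2(counts, s=(m, m)) * np.fft.rfft2(kernel), s=(m, m)
    )
    density = conv[:grid_size, :grid_size] / counts.sum()
    return density, cell


def label_components(mask):
    """Label 4-connected components of a boolean grid with a flood fill."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    rows, cols = mask.shape
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        current += 1
        labels[start] = current
        stack = [start]
        while stack:
            i, j = stack.pop()
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 0 <= ni < rows and 0 <= nj < cols and mask[ni, nj] and not labels[ni, nj]:
                    labels[ni, nj] = current
                    stack.append((ni, nj))
    return labels, current


# Synthetic example: two well-separated blobs in the unit square.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.25, 0.25), scale=0.03, size=(500, 2))
blob_b = rng.normal(loc=(0.75, 0.75), scale=0.03, size=(500, 2))
points = np.vstack([blob_a, blob_b])

density, cell = fft_kde_grid(points)
level = 10.0                   # hypothetical density level c
level_set = density >= level   # grid approximation of {f >= c}
labels, n_clusters = label_components(level_set)
print("density contour clusters found:", n_clusters)
```

The FFT replaces the direct sum over all pairs of data points and grid nodes with a convolution whose cost depends on the grid size rather than on the number of observations, which is the general reason FFT-based density estimates scale to large surveys. The price in this sketch is that the result depends on the grid resolution as well as on the bandwidth and the level c.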