Clustering billions of data points using GPUs

  • Authors: Ren Wu; Bin Zhang; Meichun Hsu

  • Affiliations: Hewlett Packard Company, Palo Alto, CA, USA (all authors)

  • Venue: Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop
  • Year: 2009

Abstract

In this paper, we report our research on using GPUs to accelerate clustering of very large data sets, which are common in today's real-world applications. While many published works have shown that GPUs can accelerate various general-purpose applications with respectable performance gains, few attempts have been made to tackle very large problems. Our goal here is to investigate whether GPUs can be useful accelerators even for very large data sets that cannot fit into the GPU's onboard memory. Using a popular clustering algorithm, K-Means, as an example, we obtained very positive results. On a data set with a billion data points, our GPU-accelerated implementation achieved an order-of-magnitude performance gain over a highly optimized CPU-only version running on 8 cores, and a gain of more than two orders of magnitude over a popular benchmark, MineBench, running on a single core.
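
To illustrate how K-Means can run over data that does not fit in memory at once, the following is a minimal CPU-only sketch, not the authors' GPU implementation. It assumes the data arrives in chunks and keeps only per-cluster running sums and counts between chunks, so each chunk stands in for a block that would be copied to the GPU. The names `kmeans_chunked`, `read_chunks`, `nearest`, and `demo_chunks` are hypothetical and chosen for illustration only.

```python
from typing import Callable, Iterable, List, Sequence

Point = Sequence[float]


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean)."""
    best_idx, best_dist = 0, float("inf")
    for idx, c in enumerate(centroids):
        dist = sum((p - q) ** 2 for p, q in zip(point, c))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def kmeans_chunked(
    read_chunks: Callable[[], Iterable[List[Point]]],
    centroids: List[List[float]],
    iterations: int = 10,
) -> List[List[float]]:
    """Lloyd-style K-Means over data delivered one chunk at a time.

    read_chunks is a hypothetical callable that re-reads the data set and
    yields it chunk by chunk on every call (one full pass per iteration).
    """
    k, dim = len(centroids), len(centroids[0])
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # Only one chunk is held at a time; in a GPU setting this would be
        # the block currently resident in device memory (an assumption here).
        for chunk in read_chunks():
            for point in chunk:
                j = nearest(point, centroids)
                counts[j] += 1
                for d in range(dim):
                    sums[j][d] += point[d]
        # Update step: an empty cluster keeps its previous centroid.
        for j in range(k):
            if counts[j]:
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids


def demo_chunks():
    """Toy data set split into two chunks."""
    yield [(0.0, 0.0), (0.0, 2.0)]
    yield [(10.0, 10.0), (10.0, 12.0)]


result = kmeans_chunked(demo_chunks, [[1.0, 1.0], [9.0, 9.0]], iterations=5)
print(result)  # [[0.0, 1.0], [10.0, 11.0]]
```

Because the assignment step only needs the current centroids and the update step only needs the accumulated sums and counts, the chunk size can be chosen to match whatever memory is available without changing the result.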