Accelerating radio astronomy cross-correlation with graphics processing units

  • Authors:
  • M.A. Clark; P.C. La Plante; L.J. Greenhill

  • Affiliations:
  • NVIDIA Corporation, CA, USA (work done while the author was at the Harvard-Smithsonian Center for Astrophysics, Cambridge, MA, USA); Loyola University Maryland, Baltimore, MD, USA; Harvard-Smithsonian Center for Astrophysics, Cambridge, MA, USA

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 2013

Abstract

We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for the processing of signals from 'large-N' arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implemented efficiently on NVIDIA's Fermi architecture, sustaining up to 79% of the peak single-precision floating-point throughput. We compare performance obtained for hardware- and software-managed caches, observing significantly better performance for the latter. The high performance reported involves use of a multi-level data tiling strategy in memory and use of a pipelined algorithm with simultaneous computation and transfer of data from host to device memory. The speed of code development, flexibility, and low cost of the GPU implementations compared with application-specific integrated circuit (ASIC) and field programmable gate array (FPGA) implementations have the potential to greatly shorten the cycle of correlator development and deployment, for cases where some power-consumption penalty can be tolerated.
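The X-engine's core operation can be illustrated with a minimal sketch. It assumes that an upstream stage (the "F" stage of a conventional FX correlator, not described in this abstract) has already channelized each antenna input into complex samples, and it looks at a single frequency channel. For every pair of inputs i >= j the correlator accumulates the visibility V_ij = Σ_t x_i(t) · conj(x_j(t)) over the integration time. The upper triangle follows from Hermitian symmetry, so only N(N+1)/2 products per time sample are needed. The function names and the tile size below are hypothetical, and this plain-Python reference is not the authors' CUDA kernel. The tiled variant only mimics the data-reuse idea behind the multi-level tiling strategy the abstract mentions.

```python
def xengine_reference(samples):
    """Direct cross-correlation for one frequency channel.

    samples[t][i] is the complex sample of input i at time t.
    Returns {(i, j): V_ij} for the lower triangle i >= j, where
    V_ij = sum over t of x_i(t) * conj(x_j(t)).
    """
    n_inputs = len(samples[0])
    vis = {(i, j): 0j for i in range(n_inputs) for j in range(i + 1)}
    for row in samples:  # integrate over time samples
        for i in range(n_inputs):
            for j in range(i + 1):
                vis[(i, j)] += row[i] * row[j].conjugate()
    return vis


def xengine_tiled(samples, tile=2):
    """Same result, computed block by block over tile x tile sub-blocks
    of the lower-triangular correlation matrix (tile size is a
    hypothetical parameter). Each block loads its two strips of inputs
    once per time sample and reuses every loaded value up to `tile`
    times, which is the reuse a GPU kernel exploits by keeping the
    strips in registers or shared memory."""
    n_inputs = len(samples[0])
    vis = {(i, j): 0j for i in range(n_inputs) for j in range(i + 1)}
    for bi in range(0, n_inputs, tile):
        for bj in range(0, bi + 1, tile):  # only blocks on/below the diagonal
            for row in samples:
                xi = row[bi:bi + tile]  # strip of "row" inputs
                xj = [v.conjugate() for v in row[bj:bj + tile]]  # conjugated "column" strip
                for di, a in enumerate(xi):
                    for dj, b in enumerate(xj):
                        i, j = bi + di, bj + dj
                        if j <= i:  # diagonal blocks: keep the lower triangle only
                            vis[(i, j)] += a * b
    return vis
```

Both functions should return identical visibilities for the same input, for example `xengine_tiled(samples, tile=2) == xengine_reference(samples)`. In pure Python, tiling changes only the loop order. Its performance benefit appears on a GPU, where each loaded sample is reused from fast on-chip storage instead of being re-read from device memory.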