A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will require on the order of exaflops of computation and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware, which increases flexibility and reduces development effort. Examples include e-VLBI and LOFAR. In this paper, we evaluate the correlator algorithm on multi-core CPUs and many-core architectures, such as NVIDIA and ATI GPUs, and the Cell/B.E. The correlator is a streaming, real-time application, and is much more I/O intensive than applications that are typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene/P supercomputer. We investigate performance, power efficiency, and programmability. We identify several important architectural problems that cause architectures to perform suboptimally. Our findings are applicable to data-intensive applications in general. The results show that the processing power and memory bandwidth of current GPUs are highly imbalanced for correlation purposes. While the production correlator on the Blue Gene/P achieves a superb 96% of the theoretical peak performance, the figure is only 14% on ATI GPUs and 26% on NVIDIA GPUs. The Cell/B.E. processor, in contrast, achieves an excellent 92%. We found that the Cell/B.E. is also the most energy-efficient solution: it runs the correlator 5-7 times more energy-efficiently than the Blue Gene/P.
The research presented is an important pathfinder for next-generation telescopes.
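To illustrate the quadratic growth in computational demand mentioned above, the following is a minimal sketch of a cross-correlator in Python. It is not the LOFAR production code or any of the implementations evaluated in the paper. The function names `baseline_count` and `correlate`, the data layout `samples[stream][time]`, and the choice to include each stream paired with itself are assumptions made for illustration. Frequency channels and polarizations are left out.

```python
from itertools import combinations_with_replacement


def baseline_count(num_streams):
    """Number of stream pairs, counting each stream paired with itself.

    Assumption: autocorrelations are included, giving n * (n + 1) / 2 pairs.
    """
    return num_streams * (num_streams + 1) // 2


def correlate(samples):
    """Cross-correlate every pair of streams over the integration window.

    samples[s][t] is the complex sample of stream s at time step t
    (a hypothetical layout chosen for this sketch).
    Returns a dict mapping (i, j) with i <= j to the accumulated product
    of stream i and the complex conjugate of stream j.
    """
    visibilities = {}
    for i, j in combinations_with_replacement(range(len(samples)), 2):
        acc = 0j
        for a, b in zip(samples[i], samples[j]):
            # One complex multiply-accumulate per pair per time step.
            acc += a * b.conjugate()
        visibilities[(i, j)] = acc
    return visibilities


if __name__ == "__main__":
    # Doubling the number of streams roughly quadruples the pair count.
    for n in (8, 16, 32, 64):
        print(n, baseline_count(n))
```

In this sketch every input sample is reused once per pair it participates in, so the arithmetic per loaded sample grows with the number of streams. How well an architecture can exploit that reuse in registers or local memory is the kind of balance between processing power and memory bandwidth that the abstract refers to.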