Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors

Authors:
Jongsoo Park;Ganesh Bikshandi;Karthikeyan Vaidyanathan;Ping Tak Peter Tang;Pradeep Dubey;Daehyun Kim
Affiliations:
Parallel Computing Lab;Parallel Computing Lab;Parallel Computing Lab;Software and Service Group, Intel Corporation;Parallel Computing Lab;Parallel Computing Lab
Venue:
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2013

Abstract

This paper demonstrates the first tera-scale performance of Intel® Xeon Phi™ coprocessors on 1D FFT computations. Applying a disciplined performance programming methodology of sound algorithm choice, a valid performance model, and well-executed optimizations, we break the teraflop mark on a mere 64 nodes of Xeon Phi and reach 6.7 TFLOPS with 512 nodes, which is 1.5x the performance achievable on the same number of Intel® Xeon® nodes. It is a challenge to fully utilize the compute capability of many-core, wide-vector processors for bandwidth-bound FFT computation. We leverage a new algorithm, Segment-of-Interest FFT, which has a low inter-node communication cost, and aggressively optimize data movement in node-local computations by exploiting caches. Our coordination of a low-communication algorithm and a massively parallel architecture for scalable performance is not limited to running FFT on Xeon Phi; it can serve as a reference for other bandwidth-bound computations and for emerging HPC systems that are increasingly communication-limited.
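
To put the quoted figures in perspective, the following is a minimal Python sketch of the arithmetic behind them. It uses only numbers stated in the abstract (64 nodes, 512 nodes, 6.7 TFLOPS, 1.5x). The helper `fft_flops` applies the common benchmarking convention of counting 5 N log2 N floating-point operations for a length-N complex FFT; that this is the convention used in the paper is an assumption, and all function and variable names are illustrative.

```python
import math


def fft_flops(n: int) -> float:
    """Nominal flop count for a length-n complex FFT.

    Assumption: the widely used 5 * n * log2(n) benchmarking convention.
    The abstract does not state which convention the paper uses.
    """
    return 5.0 * n * math.log2(n)


def tflops(n: int, seconds: float) -> float:
    """Nominal TFLOPS for one length-n FFT completed in `seconds`."""
    return fft_flops(n) / seconds / 1e12


# Figures quoted in the abstract.
TOTAL_TFLOPS_512 = 6.7    # reached with 512 Xeon Phi nodes
NODES_LARGE = 512
NODES_SMALL = 64          # the teraflop mark (> 1 TFLOPS) is broken here
SPEEDUP_OVER_XEON = 1.5   # relative to the same number of Xeon nodes

# Per-node throughput at 512 nodes, in GFLOPS.
per_node_gflops_512 = TOTAL_TFLOPS_512 * 1000 / NODES_LARGE

# Lower bound on per-node throughput at 64 nodes: the abstract only says
# the aggregate exceeds 1 TFLOPS, so the true value is at least this.
min_per_node_gflops_64 = 1.0 * 1000 / NODES_SMALL

# Implied aggregate for 512 Xeon nodes, derived from the 1.5x ratio.
implied_xeon_tflops = TOTAL_TFLOPS_512 / SPEEDUP_OVER_XEON

# Upper bound on scaling efficiency from 64 to 512 nodes. It is an upper
# bound because the 64-node figure is only known to exceed 1 TFLOPS.
max_scaling_efficiency = per_node_gflops_512 / min_per_node_gflops_64

if __name__ == "__main__":
    print(f"per node at 512 nodes: {per_node_gflops_512:.1f} GFLOPS")
    print(f"per node at 64 nodes:  at least {min_per_node_gflops_64:.1f} GFLOPS")
    print(f"implied Xeon total:    {implied_xeon_tflops:.2f} TFLOPS")
    print(f"scaling efficiency:    at most {max_scaling_efficiency:.1%}")
```

Under these assumptions, each node sustains roughly 13 GFLOPS at 512 nodes, and scaling efficiency from 64 to 512 nodes is at most about 84%. The gap between that per-node rate and the coprocessor's peak is consistent with the abstract's point that FFT is bandwidth-bound rather than compute-bound.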