Tackling computationally challenging problems efficiently often requires combining algorithmic innovation, advanced architecture, and thorough exploitation of parallelism. We demonstrate this synergy through synthetic aperture radar (SAR) imaging via backprojection, an image reconstruction method that can require hundreds of TFLOPS. Our new algorithm, approximate strength reduction, significantly reduces computation cost; software locality optimizations, facilitated by advanced architecture support, economize data movement; and parallelism is fully harnessed in various patterns and granularities. We deliver a throughput of over 35 billion backprojections per second per compute node on an Intel® Xeon® processor E5-2670-based cluster equipped with Intel® Xeon Phi™ coprocessors, which corresponds to processing a 3K×3K image within a second on a single node. Our study extends to other settings: backprojection is applicable elsewhere, including medical imaging; approximate strength reduction is a general code transformation technique; and many-core processors are emerging as a solution for energy-efficient computing.
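The abstract does not spell out how approximate strength reduction works, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the expensive per-pixel operation is a square-root range calculation along one image row, replaces it with a quadratic fitted through three exact samples, and evaluates that quadratic with two additions per pixel (forward differences). The function names `exact_ranges` and `approx_ranges`, the parameters, and the geometry are hypothetical and are not taken from the paper.

```python
import math


def exact_ranges(px, h, x0, dx, n):
    """Reference version: one square root per pixel along a row.

    px, h  : hypothetical sensor offset and stand-off distance
    x0, dx : first pixel coordinate and pixel spacing
    n      : number of pixels in the row block
    """
    return [math.sqrt((x0 + i * dx - px) ** 2 + h * h) for i in range(n)]


def approx_ranges(px, h, x0, dx, n):
    """Approximate version: three square roots per block, then additions only.

    Fits q(i) = a + b*i + c*i^2 through exact samples at i = 0, m, 2m
    and steps through it with forward differences.
    """
    if n < 3:
        raise ValueError("need at least three pixels to fit a quadratic")
    m = (n - 1) // 2

    def r(i):
        return math.sqrt((x0 + i * dx - px) ** 2 + h * h)

    f0, f1, f2 = r(0), r(m), r(2 * m)
    c = (f2 - 2.0 * f1 + f0) / (2.0 * m * m)
    b = (f1 - f0) / m - c * m

    value = f0          # q(0)
    step = b + c        # q(1) - q(0)
    step2 = 2.0 * c     # constant second difference
    out = []
    for _ in range(n):
        out.append(value)
        value += step   # one addition replaces the square root
        step += step2   # one addition updates the first difference
    return out


if __name__ == "__main__":
    # Hypothetical geometry chosen only for illustration.
    exact = exact_ranges(500.0, 10000.0, 0.0, 1.0, 65)
    approx = approx_ranges(500.0, 10000.0, 0.0, 1.0, 65)
    print(max(abs(a - e) for a, e in zip(approx, exact)))
```

The sketch trades a small, bounded approximation error for cheaper inner-loop operations. Whether that error is acceptable depends on the block length and geometry, which a real implementation would have to choose against its accuracy requirements.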