Improving performance of codes with large/irregular stride memory access patterns via high performance reconfigurable computers

Authors:
Khalid H. Abed; Gerald R. Morris
Affiliations:
Jackson State University, School of Engineering, Department of Computer Engineering, 1400 J.R. Lynch Street, Jackson, MS 39217, United States; Research and Development Center, Scientific Computing Research Center, 3909 Halls Ferry Road, Vicksburg, MS 39180, United States
Venue:
Journal of Parallel and Distributed Computing
Year:
2013


Abstract

Codes that have large-stride or irregular-stride (L/I) memory access patterns, e.g., sparse matrix and linked list codes, often perform poorly on mainstream clusters because of the general purpose processor (GPP) memory hierarchy. High performance reconfigurable computers (HPRCs) contain both GPPs and field programmable gate arrays (FPGAs) connected via a high-speed network. In this research, simple 64-bit floating-point codes are used to illustrate the runtime performance impact of L/I memory accesses in both software-only and FPGA-augmented codes and to assess the benefits of mapping L/I-type codes onto HPRCs. The experiments documented herein reveal that large-stride software-only codes experience severe performance degradation, whereas large-stride FPGA-augmented codes experience minimal performance degradation. For experiments with large data sizes, the unit-stride FPGA-augmented code ran about twice as slow as the software-only code. In contrast, the large-stride FPGA-augmented code ran faster than the software-only code for all the larger data sizes, and the largest data size showed a 17-fold runtime speedup.
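
To make the distinction between unit-stride and large-stride access concrete, the following minimal sketch sums an array of 64-bit floating-point values while visiting the elements in stride order. It is an illustration only and is not the benchmark code used in the experiments; the names `strided_indices` and `strided_sum` are hypothetical, and Python is used only to show the access pattern, not to reproduce the reported timings.

```python
from array import array


def strided_indices(n, stride):
    """Yield every index in range(n) exactly once, in stride order.

    Pass k visits k, k + stride, k + 2*stride, ... so consecutive
    accesses are `stride` elements apart in memory.
    """
    for start in range(min(stride, n)):
        for i in range(start, n, stride):
            yield i


def strided_sum(data, stride):
    """Sum the values in `data`, accessing them with the given stride."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    total = 0.0
    for i in strided_indices(len(data), stride):
        total += data[i]
    return total


if __name__ == "__main__":
    # Type code 'd' stores each element as a 64-bit double.
    values = array("d", (float(i) for i in range(16)))
    print(list(strided_indices(8, 1)))  # unit stride: 0, 1, 2, ...
    print(list(strided_indices(8, 4)))  # stride 4: 0, 4, 1, 5, ...
    print(strided_sum(values, 1), strided_sum(values, 4))
```

Both calls to `strided_sum` touch the same elements and do the same arithmetic; only the order of the memory accesses differs. With a unit stride, neighboring accesses fall in the same cache line, while a large stride tends to place each access in a different line, which is the kind of pattern the abstract reports as degrading software-only performance.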