Large, high-density FPGAs with high local distributed memory bandwidth surpass the peak floating-point performance of high-end, general-purpose processors. Microprocessors, moreover, do not come close to their peak floating-point performance on efficient algorithms built on the Sparse Matrix-Vector Multiply (SMVM) kernel; it is not uncommon for them to deliver only 10-20% of peak when computing SMVM. We develop and analyze a scalable SMVM implementation on modern FPGAs and show that it can sustain high-throughput floating-point performance near peak. For benchmark matrices from the Matrix Market Suite, we project 1.5 double-precision Gflops for a single Virtex II 6000-4 and 12 double-precision Gflops for 16 Virtex IIs (750 Mflops/FPGA).
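For readers unfamiliar with the kernel, the following is a minimal software sketch of SMVM, y = Ax, with A held in compressed sparse row (CSR) form. The CSR layout and the names `csr_smvm`, `flop_count`, `values`, `col_idx`, and `row_ptr` are assumptions made for illustration; the abstract does not state which storage format or data path the FPGA design uses. Each stored nonzero contributes one multiply and one add, which is the basis on which sustained flops are usually counted.

```python
from typing import List, Sequence


def csr_smvm(values: Sequence[float],
             col_idx: Sequence[int],
             row_ptr: Sequence[int],
             x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative only)."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx, so the access pattern depends on
            # the sparsity structure of A rather than on a fixed stride.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def flop_count(values: Sequence[float]) -> int:
    """One multiply and one add per stored nonzero."""
    return 2 * len(values)


# Hypothetical 3x3 example:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_smvm(values, col_idx, row_ptr, x)
print(y)                    # [7.0, 6.0, 17.0]
print(flop_count(values))   # 10
```

Dividing a flop count of this kind by the measured run time gives a sustained rate that can be compared against a device's peak. For the multi-FPGA projection, the per-device figure follows directly from the aggregate: 12 Gflops / 16 FPGAs = 0.75 Gflops, or 750 Mflops per FPGA.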