Sparse Matrix-Vector multiplication on FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
64-bit floating-point FPGA matrix multiplication
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Designing Scalable FPGA-Based Reduction Circuits Using Pipelined Floating-Point Cores
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
H-SIMD Machine: Configurable Parallel Computing for Matrix Multiplication
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
High Performance Linear Algebra Operations on Reconfigurable Systems
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Embedded floating-point units in FPGAs
Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Scalable Hybrid Designs for Linear Algebra on Reconfigurable Computing Systems
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Architectures and APIs: assessing requirements for delivering FPGA performance to applications
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Optimized high-order finite difference wave equations modeling on reconfigurable computing platform
Microprocessors & Microsystems
Automatic mapping of nested loops to FPGAS
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
IEEE Transactions on Parallel and Distributed Systems
Examining the viability of FPGA supercomputing
EURASIP Journal on Embedded Systems
High-Performance Reduction Circuits Using Deeply Pipelined Operators on FPGAs
IEEE Transactions on Parallel and Distributed Systems
Novel hardware-based approaches for intrusion detection
ICCOM'05 Proceedings of the 9th WSEAS International Conference on Communications
Architectural modifications to enhance the floating-point performance of FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 21st annual symposium on Integrated circuits and system design
FPGA Acceleration of RankBoost in Web Search Engines
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
From Silicon to Science: The Long Road to Production Reconfigurable Supercomputing
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Parallel implementation of Cholesky LLT-algorithm in FPGA-based processor
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Fast, Efficient Floating-Point Adders and Multipliers for FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Characterization of Fixed and Reconfigurable Multi-Core Devices for Application Acceleration
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Layout aware optimization of high speed fixed coefficient FIR filters for FPGAs
International Journal of Reconfigurable Computing
Parallel FPGA-based all-pairs shortest-paths in a directed graph
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploration of heterogeneous FPGA architectures
International Journal of Reconfigurable Computing - Special issue on selected papers from the international workshop on reconfigurable communication-centric systems on chips (ReCoSoC' 2010)
Genetic Programming and Evolvable Machines
FPGA implementation of the conjugate gradient method
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
An FPGA-Based parallel accelerator for matrix multiplications in the newton-raphson method
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
High-Speed Reconfigurable Parallel System to Design Good Error Correcting Codes in Communications
Journal of Signal Processing Systems
A fused hybrid floating-point and fixed-point dot-product for FPGAs
ARC'10 Proceedings of the 6th international conference on Reconfigurable Computing: architectures, Tools and Applications
A performance and energy comparison of FPGAs, GPUs, and multicores for sliding-window applications
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
A performance and energy comparison of convolution on GPUs, FPGAs, and multicore processors
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Hi-index | 0.00 |
Field programmable gate arrays (FPGAs) have long been an attractive alternative to microprocessors for computing tasks - as long as floating-point arithmetic is not required. Fueled by the advance of Moore's Law, FPGAs are rapidly reaching sufficient densities to enhance peak floating-point performance as well. The question, however, is how much of this peak performance can be sustained. This paper examines three of the basic linear algebra sub-routine (BLAS) functions: vector dot product, matrix-vector multiply, and matrix multiply. A comparison of microprocessors, FPGAs, and Reconfigurable Computing platforms is performed for each operation. The analysis highlights the amount of memory bandwidth and internal storage needed to sustain peak performance with FPGAs. This analysis considers the historical context of the last six years and is extrapolated for the next six years.