The horizon supercomputing system: architecture and software
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Exploiting heterogeneous parallelism on a multithreaded multiprocessor
ICS '92 Proceedings of the 6th international conference on Supercomputing
ICS '90 Proceedings of the 4th international conference on Supercomputing
Multi-processor performance on the Tera MTA
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Sparse Matrix-Vector multiplication on FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Floating-point sparse matrix-vector multiply for FPGAs
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Proceedings of the 2nd conference on Computing frontiers
Sparse Matrix-Vector Multiplication Design on FPGAs
FCCM '07 Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
DSD '09 Proceedings of the 2009 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools
Designing Modular Hardware Accelerators in C with ROCCC 2.0
FCCM '10 Proceedings of the 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines
FPGA and GPU implementation of large scale SpMV
SASP '10 Proceedings of the 2010 IEEE 8th Symposium on Application Specific Processors (SASP)
A Sparse Matrix Personality for the Convey HC-1
FCCM '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines
Exploring irregular memory accesses on FPGAs
Proceedings of the first workshop on Irregular applications: architectures and algorithm
Automatically tuning sparse matrix-vector multiplication for GPU architectures
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
Hardware supported multithreading can mask memory latency by switching the execution to ready threads, which is particularly effective on irregular applications. FPGAs provide an opportunity to have multithreaded data paths customized toeach individual application. In this paper we describe the compiler generation of these hardware structures from a C subset targeting a Convey HC-2ex machine. We describe how this compilation approach differs from other C to HDL compilers. We use the compiler to generate a multithreaded sparse matrix vector multiplication kernel and compare its performance to existing FPGA, and highly optimized software implementations.