FPGAs allow the implementation of very complex designs (on the order of one million gates), which makes them good candidates to host special-purpose systems designed to boost conventional computing architectures. Several computationally intensive algorithms are poorly supported by standard computing architectures, so dedicated devices implementing the intensive parts of such algorithms can significantly speed up overall performance. Reprogrammability, which allows the same chip to be reused for different applications and avoids the costly and cumbersome design of ASIC systems, is a key issue in the design of specialized computing architectures. Two further factors are crucial to the success of FPGA-based coprocessors: the possibility of achieving significantly higher performance than that attainable with conventional processors, and the ability to produce a working prototype in a very short time. This work presents the results achieved in the HADES (HArdware DEsign in Scientific applications) project, which aims to automatically extract parallelism from affine iterative algorithms and to generate synthesizable VHDL describing the parallelized version of the algorithm. Along with the global HADES design flow, we present two case studies, from the signal processing and proteomics domains, in which FPGA-based designs significantly increased overall system performance. Thanks to the nearly complete automation of all steps of the design flow, a working prototype was realized in one working week in both cases.
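
The abstract does not describe how HADES extracts parallelism, so the following is only an illustrative sketch of the general idea behind parallelizing an affine iterative algorithm, not the project's actual method. It assumes a FIR filter (a signal processing kernel chosen here as an example) written as a loop nest whose array indices are affine in the loop counters, a single accumulation dependence vector (0, 1), and a linear schedule vector `lam`. All function names are hypothetical. Iteration points that receive the same time step are mutually independent and could, in principle, be executed by parallel hardware units.

```python
from collections import defaultdict


def is_valid_schedule(lam, deps):
    """A linear schedule is valid if every dependence advances time by >= 1."""
    return all(sum(l * d for l, d in zip(lam, dep)) >= 1 for dep in deps)


def wavefronts(points, lam):
    """Group iteration points by their scheduled time step t = lam . point."""
    steps = defaultdict(list)
    for p in points:
        steps[sum(l * c for l, c in zip(lam, p))].append(p)
    return [steps[t] for t in sorted(steps)]


def fir_sequential(h, x):
    """Reference loop nest: y[i] = sum_j h[j] * x[i + K - 1 - j]."""
    k = len(h)
    n_out = len(x) - k + 1
    y = [0] * n_out
    for i in range(n_out):
        for j in range(k):
            y[i] += h[j] * x[i + k - 1 - j]  # affine index in (i, j)
    return y


def fir_scheduled(h, x, lam=(0, 1)):
    """Same computation, executed wavefront by wavefront."""
    k = len(h)
    n_out = len(x) - k + 1
    deps = [(0, 1)]  # assumed: (i, j) needs the partial sum from (i, j - 1)
    if not is_valid_schedule(lam, deps):
        raise ValueError("schedule violates a dependence")
    points = [(i, j) for i in range(n_out) for j in range(k)]
    y = [0] * n_out
    for front in wavefronts(points, lam):
        # Points in one front are independent; hardware could run them at once.
        for i, j in front:
            y[i] += h[j] * x[i + k - 1 - j]
    return y


if __name__ == "__main__":
    h, x = [1, 2, 3], [1, 0, 2, 1, 4]
    print(fir_sequential(h, x))
    print(fir_scheduled(h, x))
```

With the schedule vector (0, 1), the number of time steps equals the number of filter taps, and each step contains one independent operation per output sample. A different valid vector, such as (1, 1), trades more time steps for fewer simultaneous operations.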