This paper presents a novel architecture for matrix multiplication, optimized for integration as a coprocessor unit with embedded processors in modern FPGAs. In contrast with previous proposals, which accelerate only the matrix multiplication computation itself, the proposed coprocessor is designed to exploit an efficient communication protocol for data exchange with the host processor, which significantly reduces the overall computation time. The complete system, consisting of a 32-bit RISC processor augmented by the proposed coprocessor unit, has been implemented in hardware. The system can readily be used to accelerate the multiplication of matrices of virtually any size. Simulation tests and measurements demonstrate that the system requires fewer than half the clock cycles needed by competing solutions.
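To illustrate how a fixed-size multiplication unit can serve matrices of arbitrary size, the following is a minimal sketch of block (tiled) matrix multiplication in Python. It is not the architecture or the communication protocol described in the paper, neither of which is specified in this abstract. The function names `tile_multiply`, `get_tile`, and `blocked_matmul`, the tile size `t`, and the zero-padding of edge tiles are all assumptions introduced for illustration; `tile_multiply` merely stands in for a hypothetical coprocessor call, and the returned count is only a rough proxy for host-coprocessor exchanges.

```python
def tile_multiply(a_tile, b_tile):
    """Hypothetical stand-in for the coprocessor: multiply two t x t tiles."""
    t = len(a_tile)
    return [[sum(a_tile[i][k] * b_tile[k][j] for k in range(t))
             for j in range(t)]
            for i in range(t)]


def get_tile(m, row, col, t):
    """Copy the t x t tile of m starting at (row, col), zero-padding past the edges."""
    rows, cols = len(m), len(m[0])
    return [[m[r][c] if r < rows and c < cols else 0
             for c in range(col, col + t)]
            for r in range(row, row + t)]


def blocked_matmul(a, b, t=4):
    """Multiply a (n x k) by b (k x p) using only t x t tile products.

    Returns the product and the number of tile products issued, which is
    an assumed proxy for the number of host-coprocessor exchanges.
    """
    n, k, p = len(a), len(a[0]), len(b[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    c = [[0] * p for _ in range(n)]
    tile_calls = 0
    for i in range(0, n, t):
        for j in range(0, p, t):
            for kk in range(0, k, t):
                part = tile_multiply(get_tile(a, i, kk, t),
                                     get_tile(b, kk, j, t))
                tile_calls += 1
                # Accumulate only the entries that fall inside the result.
                for r in range(min(t, n - i)):
                    for s in range(min(t, p - j)):
                        c[i + r][j + s] += part[r][s]
    return c, tile_calls


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
product, calls = blocked_matmul(a, b, t=2)
print(product, calls)
```

In a sketch of this kind, the number of tile products grows with the product of the three tiled dimensions, so the cost of moving each tile between host and coprocessor can dominate the arithmetic. This is consistent with the abstract's emphasis on the data-exchange protocol rather than on the multiplication alone.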